This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/lib/Target/X86/
  X86TargetTransformInfo.cpp
llvm/lib/Transforms/Vectorize/
  SLPVectorizer.cpp
llvm/test/Analysis/CostModel/X86/
  reduction.ll
  shuffle-single-src.ll
llvm/test/Transforms/SLPVectorizer/AArch64/
  PR38339.ll
  horizontal.ll
llvm/test/Transforms/SLPVectorizer/X86/
  PR39774.ll
  bad_types.ll
  load-merge-inseltpoison.ll
  load-merge.ll
  minimum-sizes.ll
  pr35497.ll
  pr47629-inseltpoison.ll
  pr47629.ll
  remark_extract_broadcast.ll
  vec_list_bias-inseltpoison.ll
  vec_list_bias.ll

Differential D100486

[COST]Improve cost model for shuffles in SLP.
ClosedPublic

Authored by ABataev on Apr 14 2021, 8:19 AM.


Details

Reviewers

RKSimon
spatel
CarolineConcatto
greened
lebedev.ri

Commits

rG75e1cf4a6a87: [COST]Improve cost model for shuffles in SLP.
rG29a470e3804c: [COST]Improve cost model for shuffles in SLP.

Summary

Introduces shuffle masks where they were not previously passed, and improves the target-dependent cost models so that they do not return incorrect costs once masks are provided.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. Apr 14 2021, 8:19 AM

Herald added subscribers: pengfei, hiraditya. Apr 14 2021, 8:19 AM

ABataev requested review of this revision. Apr 14 2021, 8:19 AM

Herald added a project: Restricted Project. Apr 14 2021, 8:19 AM

Harbormaster completed remote builds in B98708: Diff 337463. Apr 14 2021, 9:16 AM

Adding some AArch64 reviewers

Can you split off the target specific cost model changes? This makes it easier to track down potential regressions.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1408 ↗	(On Diff #337463)	This should be moved up I think, before the scalable vector handling. It would also be good to have cost-model tests for those shuffles.

sdesmalen added a subscriber: sdesmalen. Apr 15 2021, 3:46 AM

sdesmalen added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3823–3824	Can the finding of a more specific ShuffleKind be done by getShuffleCost when a Mask is given? It seems a bit inconvenient to have to do that manually before calling this function.

RKSimon added inline comments. Apr 15 2021, 4:39 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3823–3824	I made a similar comment on D100495 - we could do with a generic 'ShuffleKind' decoder helper function (e.g. in Analysis/VectorUtils.h) that everybody can use.
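A generic 'ShuffleKind' decoder of the kind suggested above might look roughly like the following minimal C++ sketch. It is an illustration under stated assumptions, not an LLVM API: the `Kind` enum only mirrors a few `TTI::ShuffleKind` names, `classifyMask` is a hypothetical helper, masks are assumed to index two sources of `NumSrcElts` elements each, and negative values stand for undef lanes.

```cpp
#include <cassert>
#include <vector>

// Illustrative subset of shuffle kinds; the names mirror TTI::ShuffleKind,
// but this enum is a hypothetical stand-in, not the LLVM type.
enum class Kind { Identity, Broadcast, Reverse, Select, PermuteSingleSrc, PermuteTwoSrc };

// Hypothetical decoder: classify a mask over two sources with NumSrcElts
// elements each. Negative mask values are undef lanes and match anything.
Kind classifyMask(const std::vector<int> &Mask, int NumSrcElts) {
  const int N = static_cast<int>(Mask.size());
  const bool SameWidth = N == NumSrcElts;
  bool UsesSrc0 = false, UsesSrc1 = false;
  for (int M : Mask) {
    if (M < 0)
      continue;
    if (M < NumSrcElts)
      UsesSrc0 = true;
    else
      UsesSrc1 = true;
  }

  if (UsesSrc0 && UsesSrc1) {
    // Select: lane I takes element I from either source.
    bool IsSelect = SameWidth;
    for (int I = 0; I < N && IsSelect; ++I)
      if (Mask[I] >= 0 && Mask[I] % NumSrcElts != I)
        IsSelect = false;
    return IsSelect ? Kind::Select : Kind::PermuteTwoSrc;
  }

  // Single source (either one): compare source-local element indices.
  bool Identity = SameWidth, Reverse = SameWidth, Broadcast = true;
  for (int I = 0; I < N; ++I) {
    if (Mask[I] < 0)
      continue;
    const int Idx = Mask[I] % NumSrcElts;
    if (Idx != I)
      Identity = false;
    if (Idx != N - 1 - I)
      Reverse = false;
    if (Idx != 0)
      Broadcast = false; // broadcast of element 0, as in SK_Broadcast
  }
  if (Identity)
    return Kind::Identity;
  if (Reverse)
    return Kind::Reverse;
  if (Broadcast)
    return Kind::Broadcast;
  return Kind::PermuteSingleSrc;
}
```

For example, `classifyMask({0, 5, 2, 7}, 4)` yields `Kind::Select` and `classifyMask({1, 0, 3, 2}, 4)` yields `Kind::PermuteSingleSrc`. A production helper would also need to recognize patterns such as subvector extraction/insertion and splices, which this sketch ignores.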

ABataev added inline comments. Apr 16 2021, 6:30 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3823–3824	In this case, we need to update the `getShuffleCost` function for all the targets to translate the mask. Is it OK if I do it in this patch?

sdesmalen added inline comments. Apr 20 2021, 1:21 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3823–3824	There are already a few different changes in this patch (AArch64 cost-model, X86 cost-model and changes to the SLPVectorizer), so I think it makes more sense to do this in a separate patch.

Matt added a subscriber: Matt. Apr 20 2021, 6:51 AM

ABataev added inline comments. Apr 20 2021, 9:24 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1408 ↗	(On Diff #337463)	There is already a test affected by the cost changes: getIntrinsicInstrCost-vector-reverse.ll.

Rebase

Harbormaster completed remote builds in B101236: Diff 340937. Apr 27 2021, 4:10 PM

Rebase + fixes

Harbormaster completed remote builds in B102804: Diff 343123. May 5 2021, 12:25 PM

ABataev mentioned this in D100684: [X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again. May 12 2021, 7:54 AM

Rebase and fixes

Harbormaster completed remote builds in B104588: Diff 345554. May 14 2021, 3:06 PM

Rebase

Just because we've legalized to a certain vector type, is that a sufficient guarantee that we have appropriate shuffle operations on that legal vector type?

In reply to @lebedev.ri (D100486#2773883):

I think so; at least, the existing code relies on this. This code just improves the existing situation. Currently, we conservatively calculate the cost of a shuffle as a permutation of all source subvectors. But in the presence of a mask we can estimate this cost more precisely, considering only the subvectors actually used in the permutation.
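The difference between the conservative estimate and a mask-aware one can be sketched as follows. This is a hypothetical C++ illustration, not the patch's implementation: `sourceRegsPerDestReg` splits a mask into legal-register-sized pieces and records which source registers each destination register reads, and `countShuffles` applies an assumed counting rule (one single-source permute for a piece with one input, K - 1 two-source permutes for a piece with K inputs). Pieces that read nothing, such as all-undef lanes, are not counted.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: for each destination register (EltsPerReg lanes),
// collect the indices of the source registers its lanes read from.
// Negative mask values are undef lanes and read nothing.
std::vector<std::set<int>> sourceRegsPerDestReg(const std::vector<int> &Mask,
                                                int EltsPerReg) {
  const int N = static_cast<int>(Mask.size());
  const int NumDestRegs = (N + EltsPerReg - 1) / EltsPerReg;
  std::vector<std::set<int>> Used(NumDestRegs);
  for (int I = 0; I < N; ++I)
    if (Mask[I] >= 0)
      Used[I / EltsPerReg].insert(Mask[I] / EltsPerReg);
  return Used;
}

struct ShuffleCount {
  int SingleSrc = 0; // permutes within one legal register
  int TwoSrc = 0;    // two-input permutes
};

// Assumed counting rule: a piece with one input needs one single-source
// permute; a piece with K > 1 inputs needs K - 1 two-source permutes.
ShuffleCount countShuffles(const std::vector<std::set<int>> &Used) {
  ShuffleCount C;
  for (const std::set<int> &Srcs : Used) {
    const int K = static_cast<int>(Srcs.size());
    if (K == 1)
      ++C.SingleSrc;
    else if (K > 1)
      C.TwoSrc += K - 1;
  }
  return C;
}
```

For the `<16 x i16>` mask with 8 lanes per SSE2 register discussed in this review, each destination register reads exactly one source register, so only two single-source permutes are counted rather than a permutation across all sources.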

Harbormaster completed remote builds in B105636: Diff 347034. May 21 2021, 8:48 AM

Ping

Harbormaster completed remote builds in B106274: Diff 347934. May 26 2021, 6:20 AM

Rebase

Harbormaster completed remote builds in B107504: Diff 349623. Jun 3 2021, 1:09 PM

Rebase + fixed gathering cost calculation.

Harbormaster completed remote builds in B108840: Diff 351488. Jun 11 2021, 10:55 AM

Adding @lebedev.ri who's worked on improving x86 shuffle/buildvector costmodels recently.

No, I didn't.
I think that while this may be somewhat correct, it is not really correct. For example, before AVX there are no non-32-bit shuffles, only unpacks.

In reply to @lebedev.ri (D100486#2821347):

I checked the cost model calculations, and they look correct. For example, the cost of:

; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>

This is a permutation of 2 registers. The first destination register (lanes 0-7, which read the upper source register) is <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8>; since 13 appears twice it is not a pure reverse, so it is costed as a general SK_PermuteSingleSrc:

SSE2
{TTI::SK_PermuteSingleSrc, MVT::v8i16, 5}, // 2*pshuflw + 2*pshufhw + pshufd/unpck

and the second destination register, <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>, is an SK_Reverse:

{TTI::SK_Reverse, MVT::v8i16, 3}, // pshuflw + pshufhw + pshufd

The sum is 5 + 3 = 8.
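The arithmetic above can be reproduced with a small C++ sketch. Only the two costs come from the quoted SSE2 `v8i16` table entries (5 for `SK_PermuteSingleSrc`, 3 for `SK_Reverse`); the helper names are hypothetical, and the sketch assumes the mask length is a multiple of the register width and that each 8-lane piece reads from a single source register, which holds for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Kind { Reverse, PermuteSingleSrc };

// The two SSE2 v8i16 costs quoted above; other kinds are out of scope.
int sse2V8i16Cost(Kind K) { return K == Kind::Reverse ? 3 : 5; }

// Classify one register-sized piece using lane-local indices (-1 = undef).
Kind classifyPiece(const std::vector<int> &Piece) {
  const int N = static_cast<int>(Piece.size());
  for (int I = 0; I < N; ++I)
    if (Piece[I] >= 0 && Piece[I] != N - 1 - I)
      return Kind::PermuteSingleSrc;
  return Kind::Reverse;
}

// Sum per-register costs for a single-source mask split into EltsPerReg-lane
// pieces. Assumes the mask length is a multiple of EltsPerReg and that each
// piece reads from exactly one source register (checked by assert).
int maskAwareCost(const std::vector<int> &Mask, int EltsPerReg) {
  const std::size_t Width = static_cast<std::size_t>(EltsPerReg);
  assert(Mask.size() % Width == 0 && "mask must fill whole registers");
  int Total = 0;
  for (std::size_t Base = 0; Base < Mask.size(); Base += Width) {
    std::vector<int> Piece;
    int SrcReg = -1;
    for (std::size_t L = 0; L < Width; ++L) {
      const int M = Mask[Base + L];
      if (M >= 0) {
        if (SrcReg < 0)
          SrcReg = M / EltsPerReg;
        assert(M / EltsPerReg == SrcReg && "piece must read one register");
      }
      Piece.push_back(M < 0 ? -1 : M % EltsPerReg);
    }
    if (SrcReg < 0)
      continue; // all-undef piece: nothing to shuffle
    Total += sse2V8i16Cost(classifyPiece(Piece));
  }
  return Total;
}
```

The sketch charges nothing for the first destination register reading the second source register; whether such a cross-register move deserves a copy cost is the question raised later in this review about reduction.ll.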

Rebase

Harbormaster completed remote builds in B110278: Diff 353473. Jun 21 2021, 2:46 PM

Added correct calculation of the cost for shuffle reuses.

ABataev added inline comments. Jun 24 2021, 12:57 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3641–3645	We incorrectly calculate the cost here (using the wrong vector type). With this fix alone, without the rest of this patch, at least 4 X86 tests are no longer vectorized.
llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll
6–17	These regressions are caused by the incomplete cost model for AArch64: there is no cost for the PermuteSingleSrc shuffle kind for <VF x i16> vectors.
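Why a missing table entry tends to show up as a vectorization regression can be illustrated with a hedged sketch. The table contents, the string keys, and the fallback (one extract plus one insert per element, at cost 1 each) are all assumptions chosen for illustration; they are not AArch64's real numbers or LLVM's exact fallback logic.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical per-target cost table keyed by (shuffle kind, legal type).
using CostKey = std::pair<std::string, std::string>;
using CostTable = std::map<CostKey, int>;

std::optional<int> lookupCost(const CostTable &Table, const std::string &Kind,
                              const std::string &Ty) {
  auto It = Table.find({Kind, Ty});
  if (It == Table.end())
    return std::nullopt;
  return It->second;
}

// With no entry, fall back to an assumed scalarization estimate:
// extract and re-insert every element (cost 1 each).
int shuffleCost(const CostTable &Table, const std::string &Kind,
                const std::string &Ty, int NumElts) {
  if (std::optional<int> C = lookupCost(Table, Kind, Ty))
    return *C;
  return 2 * NumElts;
}
```

Under these assumed numbers, a PermuteSingleSrc shuffle of 8 x i16 costs 16 instead of a small table value, which can be enough to make SLP reject an otherwise profitable tree.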

Harbormaster completed remote builds in B110887: Diff 354335. Jun 24 2021, 1:26 PM

Rebase

Harbormaster completed remote builds in B112610: Diff 356710. Jul 6 2021, 7:46 AM

Rebase

Harbormaster completed remote builds in B113208: Diff 357519. Jul 9 2021, 8:57 AM

Thanks for looking at this - this is well overdue!

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
1192	We already have something similar in DAGTypeLegalizer::SplitVecRes_VECTOR_SHUFFLE - do you think we could have a single version of the code some place?

ABataev mentioned this in D106060: [SLP]Improve calculations of the cost for reused/reordered scalars. Jul 15 2021, 5:36 AM

ABataev added inline comments. Jul 15 2021, 5:39 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
1192	I will check whether I can merge them somehow.

ABataev mentioned this in rGda3dbfcacf9a: [SLP]Improve calculations of the cost for reused/reordered scalars. Jul 16 2021, 1:42 PM

Rebase + address comments.

Currently it hangs the compiler on the tests test/CodeGen/Thumb2/mve-vst4.ll and llvm/test/CodeGen/Thumb2/mve-vst3.ll; I need to investigate.

Herald added a subscriber: dmgreen. Jul 27 2021, 1:22 PM

Harbormaster completed remote builds in B116530: Diff 362162. Jul 27 2021, 9:18 PM

I hadn't been expecting you'd have to change the legalization implementation, at least not in an initial patch - can we not keep closer to the original SplitVecRes_VECTOR_SHUFFLE implementation?

In reply to @RKSimon (D100486#2910065):

Do you mean using extracts/inserts if we have more than 2 input vectors?

Rebase + address comments.

Harbormaster completed remote builds in B116794: Diff 362526. Jul 28 2021, 4:28 PM

Rebase

Harbormaster completed remote builds in B120986: Diff 368380. Aug 24 2021, 10:34 AM

Rebase

Harbormaster completed remote builds in B124670: Diff 373597. Sep 20 2021, 8:50 AM

lebedev.ri mentioned this in D110704: [X86][Costmodel] Load/store i8 Stride=2 VF=4 interleaving costs. Sep 29 2021, 6:57 AM

Rebase

Harbormaster completed remote builds in B126829: Diff 376894. Oct 4 2021, 7:39 AM

Ping

Rebase

Harbormaster completed remote builds in B130772: Diff 382405. Oct 26 2021, 12:08 PM

Rebase

Harbormaster completed remote builds in B133822: Diff 386667. Nov 11 2021, 3:28 PM

vporpo added a subscriber: vporpo. Nov 11 2021, 7:57 PM

Anything else here?

Rebase

Harbormaster completed remote builds in B134955: Diff 388277. Nov 18 2021, 12:04 PM

Ping!

Rebase

Harbormaster completed remote builds in B136470: Diff 390388. Nov 29 2021, 11:12 AM

Ping!

Rebase

Harbormaster completed remote builds in B138194: Diff 392814. Dec 8 2021, 10:16 AM

Sorry for the slow response on this one - please can you split the DAG legalization change out as the first patch and the TTI change as a followup?

llvm/include/llvm/Analysis/VectorUtils.h
401 ↗	(On Diff #392814)	Description comment?
402 ↗	(On Diff #392814)	Should this be splitShuffleMasks?
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2140 ↗	(On Diff #392814)	I'm rather surprised this didn't reduce further (same for the TTI) - maybe there's more functionality that can be moved into the computeShuffleMasks helper?

ABataev mentioned this in D115653: [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer. Dec 13 2021, 10:42 AM

Rebase

Harbormaster completed remote builds in B140891: Diff 396522. Dec 29 2021, 6:59 AM

ABataev mentioned this in rG2f49163b3365: [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type…. Apr 20 2022, 5:50 AM

ABataev mentioned this in rG2cca53c8155f: [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type…. Apr 20 2022, 9:38 AM

Rebase

Herald added a project: Restricted Project. Apr 20 2022, 1:58 PM

RKSimon added inline comments. Apr 20 2022, 2:23 PM

llvm/test/Analysis/CostModel/X86/reduction.ll
66	Something might still be going wrong here - the SSE max legal type is v4i32, so this cost should be 0 as it's referencing a single existing vector

ABataev added inline comments. Apr 20 2022, 2:35 PM

llvm/test/Analysis/CostModel/X86/reduction.ll
66	I'm being somewhat pessimistic here: if the destination register is not the same as the source register, I consider it a register copy and add a TCC_Basic cost.
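The two positions in this exchange can be contrasted with a hypothetical C++ sketch. It assumes the mask splits into register-sized pieces that are each an in-order copy of one source register; `identityPiecesCost` is an illustrative name, and `TCC_Basic` is set to 1 to mirror `TargetTransformInfo::TCC_Basic`. The optimistic variant treats such pieces as free, while the pessimistic variant described in the reply charges one basic copy when the destination register index differs from the source register index.

```cpp
#include <cassert>
#include <vector>

constexpr int TCC_Basic = 1; // mirrors TargetTransformInfo::TCC_Basic

// Hypothetical: cost of a mask whose register-sized pieces are each an
// in-order copy of a single source register (asserted below). Pieces that
// are entirely undef cost nothing.
int identityPiecesCost(const std::vector<int> &Mask, int EltsPerReg,
                       bool Pessimistic) {
  const int N = static_cast<int>(Mask.size());
  int Cost = 0;
  for (int Dest = 0; Dest * EltsPerReg < N; ++Dest) {
    int Src = -1;
    for (int L = 0; L < EltsPerReg && Dest * EltsPerReg + L < N; ++L) {
      const int M = Mask[Dest * EltsPerReg + L];
      if (M < 0)
        continue;
      assert(M % EltsPerReg == L && "piece must be an in-order copy");
      if (Src < 0)
        Src = M / EltsPerReg;
      assert(M / EltsPerReg == Src && "piece must read one source register");
    }
    // Optimistic: reusing an existing register is free.
    // Pessimistic: a different register index is charged as a copy.
    if (Pessimistic && Src >= 0 && Src != Dest)
      Cost += TCC_Basic;
  }
  return Cost;
}
```

For example, moving the upper `v4i32` half of an 8 x i32 value into the lower half (mask `{4, 5, 6, 7, -1, -1, -1, -1}` with 4 lanes per register) costs 0 optimistically and 1 pessimistically.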

Harbormaster completed remote builds in B160519: Diff 424015. Apr 20 2022, 3:03 PM

dmgreen mentioned this in D123414: [AArch64] Break up larger shuffle-masks into legal sizes in getShuffleCost. Apr 26 2022, 3:23 AM

LGTM

This revision is now accepted and ready to land. Apr 26 2022, 5:09 AM

Closed by commit rG29a470e3804c: [COST]Improve cost model for shuffles in SLP. (authored by ABataev). Apr 27 2022, 10:58 AM

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG29a470e3804c: [COST]Improve cost model for shuffles in SLP..

Just a heads up that with this change building llvm-test-suite/SingleSource/UnitTests/matrix-types-spec.cpp crashes on X86. I'll see if I can extract a reproducer.

In reply to @fhahn (D100486#3479989):

Hi, most probably the crash is caused by the opaque pointers. This patch does not change the vectorizer behavior, just adjusts the cost of the vector shuffles ops.

In reply to @ABataev (D100486#3479994):

It crashes when calculating the cost, with:

Assertion failed: (idx < size()), function operator[], file SmallVector.h, line 273.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments:  clang++ -DNDEBUG -O3 -O3 -DNDEBUG -isysroot ...
1.	<eof> parser at end of file
2.	Optimizer
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  clang-15                 0x000000010a03f8c7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 39
1  clang-15                 0x000000010a03e6f8 llvm::sys::RunSignalHandlers() + 248
2  clang-15                 0x000000010a03ed20 llvm::sys::CleanupOnSignal(unsigned long) + 208
3  clang-15                 0x0000000109f6417a (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) + 106
4  clang-15                 0x0000000109f6435e CrashRecoverySignalHandler(int) + 110
5  libsystem_platform.dylib 0x00007ff808cfddfd _sigtramp + 29
6  libsystem_platform.dylib 000000000000000000 _sigtramp + 18446603370433094176
7  libsystem_c.dylib        0x00007ff808c33d24 abort + 123
8  libsystem_c.dylib        0x00007ff808c330cb err + 0
9  clang-15                 0x000000010d263f33 llvm::processShuffleMasks(llvm::ArrayRef<int>, unsigned int, unsigned int, unsigned int, llvm::function_ref<void ()>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>) (.cold.9) + 35
10 clang-15                 0x00000001091758c4 llvm::processShuffleMasks(llvm::ArrayRef<int>, unsigned int, unsigned int, unsigned int, llvm::function_ref<void ()>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>, llvm::function_ref<void (llvm::ArrayRef<int>, unsigned int, unsigned int)>) + 2244
11 clang-15                 0x000000010886d07b llvm::X86TTIImpl::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm::ArrayRef<int>, int, llvm::VectorType*, llvm::ArrayRef<llvm::Value const*>) + 1755
12 clang-15                 0x00000001088666a7 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getUserCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) + 4551
13 clang-15                 0x000000010913f4b2 llvm::TargetTransformInfo::getUserCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) const + 18
14 clang-15                 0x0000000109d40056 llvm::TargetTransformInfo::getUserCost(llvm::User const*, llvm::TargetTransformInfo::TargetCostKind) const + 214
15 clang-15                 0x0000000108f97ed5 (anonymous namespace)::CallAnalyzer::visitInstruction(llvm::Instruction&) + 37

In reply to @fhahn (D100486#3480011):

Ok, the reproducer should help.

In reply to @fhahn (D100486#3480011):

Could you just share your compile options for the source file compilation?

The reproducer below crashes with bin/opt -passes="print<cost-model>":

target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin"

define void @test() {
entry:
  %matins.2.2 = shufflevector <9 x double> undef, <9 x double> undef, <9 x i32> <i32 0, i32 3, i32 6, i32 1, i32 4, i32 7, i32 2, i32 5, i32 8>
  ret void
}
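The thread does not explain what went wrong for `<9 x double>`, and the eventual fix is not shown here. Purely as an assumed illustration of one general pitfall: if a 9-element mask is split into 2-element pieces (as for SSE2 `v2f64` registers), floor division gives 4 pieces while lane 8 needs a fifth. The hypothetical `splitMask` below rounds up and pads the last piece with undef (-1); it does not claim to reproduce LLVM's legalization, which may widen such types instead.

```cpp
#include <cassert>
#include <vector>

// Registers needed for NumElts lanes, rounding up so a partial last
// register is still counted (floor division would drop it).
int numRegs(int NumElts, int EltsPerReg) {
  return (NumElts + EltsPerReg - 1) / EltsPerReg;
}

// Hypothetical: group mask lanes by destination register; lanes past the
// end of the mask in the last register are padded with -1 (undef).
std::vector<std::vector<int>> splitMask(const std::vector<int> &Mask,
                                        int EltsPerReg) {
  const int N = static_cast<int>(Mask.size());
  std::vector<std::vector<int>> Pieces(numRegs(N, EltsPerReg),
                                       std::vector<int>(EltsPerReg, -1));
  for (int I = 0; I < N; ++I)
    Pieces[I / EltsPerReg][I % EltsPerReg] = Mask[I];
  return Pieces;
}
```

With the reproducer's mask `<0, 3, 6, 1, 4, 7, 2, 5, 8>` and 2 lanes per register, this yields 5 pieces, the last being `{8, -1}`.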

In reply to @fhahn (D100486#3480073):

Thanks, will revert the patch to fix the bug.

ABataev added a reverting change: rG9861ca0c23a6: Revert "[COST]Improve cost model for shuffles in SLP.". Apr 28 2022, 8:12 AM

ABataev added a commit: rG75e1cf4a6a87: [COST]Improve cost model for shuffles in SLP.. Apr 28 2022, 10:06 AM

Hi Alexey. Here is another crash reproducer:

; bin/opt -mcpu=corei7-avx -passes="print<cost-model>" -S test.ll

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define dso_local <12 x i64> @foo(<12 x i64> noundef %src) {
entry:
  %shuffle = shufflevector <12 x i64> %src, <12 x i64> poison, <12 x i32> <i32 0, i32 3, i32 6, i32 9, i32 1, i32 4, i32 7, i32 10, i32 2, i32 5, i32 8, i32 11>
  ret <12 x i64> %shuffle
}

In reply to @vdmitrie (D100486#3481545):

Fixed in 371412e065a63107d5d79330da6757ff693d91cc

jgorbe added a subscriber: jgorbe. May 5 2022, 11:14 AM

We have root-caused a 20% regression in Eigen complex to this patch, and it is blocking our compiler release.
I need some time to do additional testing, given there are at least a few patches on top of this. So far I see a crash fix, two NFCs and two additional changes that may also affect performance (https://reviews.llvm.org/D114171 and https://reviews.llvm.org/D115750).
Can you please hold off on any additional changes while we test further, or include them here for review so I can check if they resolve the regressions seen so far?

In reply to @asbirlea (D100486#3494581):

Most probably, some overoptimistic cost model estimations turned on some extra vectorization. The reproducer will help, if you can provide it.

In reply to @ABataev (D100486#3494746):

The reproducer is open source: https://gitlab.com/libeigen/eigen
Regressions are on haswell platform, xfdo configuration; the compiler used is bootstrapped against itself; revision tested includes this patch and all follow-up changes to the SLPVectorizer including D114171 and excluding D115750.
A few sample numbers:
BM_MatrixVectorMultiply_Complex_EigenRowMajorDynamic_float_16x64_ 16%
BM_MatrixVectorMultiply_Complex_EigenRowMajorDynamic_float_64x64_ 17%
BM_MatrixVectorMultiply_Complex_EigenRowMajorFixed_float_64x64_ 15%
BM_MatrixVectorMultiply_Complex_EigenRowMajorFixed_float_16x64_ 14%

In reply to @asbirlea (D100486#3495157):

Could you at least extract the LLVM IR for one of these functions before and after the changes, so I could compare them?

In reply to @ABataev (D100486#3495174):

Here are the only differences I see for Dynamic, 64x64. "Base" is before the patch; "experiment" includes this patch (revert + recommit), https://reviews.llvm.org/rG371412e065a63107d5d79330da6757ff693d91cc, https://reviews.llvm.org/D114171, and the NFCs.
The change appears to be related to the presence of poison:

dumpslp_base_eigencomplex (13 KB attachment)

dumpslp_exp_eigencomplex (13 KB attachment)

Another 30-40% regression on a non-public benchmark was root-caused to https://reviews.llvm.org/D114171, so investigating that.

In reply to @asbirlea (D100486#3504277):

Hmm, I actually don't see a difference: only the order of evaluation changed and the lanes are switched; there is no real difference in the IR. Most probably, the regression is caused by something else; maybe some lowering passes were tuned for the previous order but not for the current one.

Following up on the regression I was seeing with https://reviews.llvm.org/D114171, the only meaningful IR change seems to be:
Before the patch (after SLP):

%65 = shufflevector <2 x float> %9, <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%66 = insertelement <4 x float> %65, float %16, i32 2  
%67 = insertelement <4 x float> %66, float %15, i32 3
%68 = call <4 x float> @llvm.fabs.v4f32(<4 x float> %67)
%69 = fcmp oeq <4 x float> %68, <float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000>
%70 = freeze <4 x i1> %69 
%71 = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> %70)

After the patch (after SLP):

%65 = shufflevector <2 x float> %9, <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>  
%66 = shufflevector <4 x float> poison, <4 x float> %65, <4 x i32> <i32 4, i32 5, i32 2, i32 3> 
%67 = insertelement <4 x float> %66, float %16, i32 2  
%68 = insertelement <4 x float> %67, float %15, i32 3
%69 = call <4 x float> @llvm.fabs.v4f32(<4 x float> %68)
%70 = fcmp oeq <4 x float> %69, <float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000>
%71 = freeze <4 x i1> %70  
%72 = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> %71)

It's not clear to me why the order of 0,1 in the shufflevector changed, or what the purpose is of the second shufflevector, which puts poison into positions that are overwritten anyway; the operations that follow do not seem to be affected by these changes.
Just providing an update, and not digging further into it at this time.
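The two IR snippets above can be compared by simulating them lane by lane. The following is a minimal standalone C++ sketch, not LLVM code: it models `shufflevector` on 4 x float operands (mask values 0-3 select from the first operand, 4-7 from the second, -1 marks an undef lane), uses NaN as a placeholder for poison lanes, and treats the 2 x float `%9` as a 4 x float whose upper lanes are never read by the masks involved. All function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>

using V4 = std::array<float, 4>;

// Placeholder for undef/poison lanes in this model; every such lane is
// overwritten by an insertelement before it reaches the compare.
const float kPoison = std::numeric_limits<float>::quiet_NaN();

// Models `shufflevector <4 x float> A, <4 x float> B, <4 x i32> Mask`:
// 0-3 select from A, 4-7 select from B, -1 yields a poison lane.
V4 shuffle(const V4 &A, const V4 &B, const std::array<int, 4> &Mask) {
  V4 R{};
  for (int I = 0; I < 4; ++I) {
    const int M = Mask[I];
    R[I] = M < 0 ? kPoison : (M < 4 ? A[M] : B[M - 4]);
  }
  return R;
}

// %9 is a <2 x float>; here it is widened with poison upper lanes.
V4 beforePatch(float X0, float X1, float S16, float S15) {
  const V4 Nine = {X0, X1, kPoison, kPoison};
  const V4 Poison = {kPoison, kPoison, kPoison, kPoison};
  V4 V = shuffle(Nine, Poison, {0, 1, -1, -1}); // %65
  V[2] = S16;                                   // %66
  V[3] = S15;                                   // %67
  return V;
}

V4 afterPatch(float X0, float X1, float S16, float S15) {
  const V4 Nine = {X0, X1, kPoison, kPoison};
  const V4 Poison = {kPoison, kPoison, kPoison, kPoison};
  const V4 V65 = shuffle(Nine, Poison, {1, 0, -1, -1}); // lanes 0/1 swapped
  V4 V = shuffle(Poison, V65, {4, 5, 2, 3});             // %66: lanes 2/3 poison
  V[2] = S16;                                            // %67 overwrites lane 2
  V[3] = S15;                                            // %68 overwrites lane 3
  return V;
}

// fabs + fcmp oeq +inf + llvm.vector.reduce.or: "is any lane +/-infinity?".
// The freeze is a no-op here because no poison lane reaches the compare.
bool anyLaneIsInf(const V4 &V) {
  bool Any = false;
  for (float F : V)
    Any = Any || std::fabs(F) == std::numeric_limits<float>::infinity();
  return Any;
}
```

In this model the after-patch sequence produces the same lanes with 0 and 1 swapped. Because fabs/fcmp are lane-wise and an or-reduction does not depend on lane order, the result is the same for both orders. Whether the extra shuffle affects later lowering is a separate question that this sketch does not address.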

In D100486#3517448, @asbirlea wrote:
Following up on the regression I was seeing with https://reviews.llvm.org/D114171, the only meaningful IR change seems to be:
…
Just providing an update, and not digging further into it at this time.

I'm still working on the reduction improvements, and I hope to fix all of these regressions.

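Revision Contents

llvm/lib/Target/X86/X86TargetTransformInfo.cpp (72 lines)
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (51 lines)
llvm/test/Analysis/CostModel/X86/reduction.ll (168 lines)
llvm/test/Analysis/CostModel/X86/shuffle-single-src.ll (100 lines)
llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll (36 lines)
llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll (2 lines)
llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll (2 lines)
llvm/test/Transforms/SLPVectorizer/X86/bad_types.ll (9 lines)
llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll (14 lines)
llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll (14 lines)
llvm/test/Transforms/SLPVectorizer/X86/minimum-sizes.ll (51 lines)
llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll (13 lines)
llvm/test/Transforms/SLPVectorizer/X86/pr47629-inseltpoison.ll (86 lines)
llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll (86 lines)
llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll (2 lines)
llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll (19 lines)
llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll (19 lines)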

Diff 356710

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

[33 unchanged lines not shown]
/// Nehalem 7-14 7-18 3		/// Nehalem 7-14 7-18 3
/// Haswell 10-13 11 5		/// Haswell 10-13 11 5
/// TODO: Develop and implement the target dependent cost model and		/// TODO: Develop and implement the target dependent cost model and
/// specialize cost numbers for different Cost Model Targets such as throughput,		/// specialize cost numbers for different Cost Model Targets such as throughput,
/// code size, latency and uop count.		/// code size, latency and uop count.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "X86TargetTransformInfo.h"		#include "X86TargetTransformInfo.h"
		#include "llvm/ADT/SmallBitVector.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/CodeGen/BasicTTIImpl.h"		#include "llvm/CodeGen/BasicTTIImpl.h"
#include "llvm/CodeGen/CostTable.h"		#include "llvm/CodeGen/CostTable.h"
#include "llvm/CodeGen/TargetLowering.h"		#include "llvm/CodeGen/TargetLowering.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"

using namespace llvm;		using namespace llvm;
[1,131 unchanged lines not shown]  context: if (LegalVT.isVector() &&
// Number of source vectors after legalization:		// Number of source vectors after legalization:
unsigned NumOfSrcs = (VecTySize + LegalVTSize - 1) / LegalVTSize;		unsigned NumOfSrcs = (VecTySize + LegalVTSize - 1) / LegalVTSize;
// Number of destination vectors after legalization:		// Number of destination vectors after legalization:
InstructionCost NumOfDests = LT.first;		InstructionCost NumOfDests = LT.first;

auto *SingleOpTy = FixedVectorType::get(BaseTp->getElementType(),		auto *SingleOpTy = FixedVectorType::get(BaseTp->getElementType(),
LegalVT.getVectorNumElements());		LegalVT.getVectorNumElements());

		if (!Mask.empty() && NumOfDests.isValid()) {
		// Try to perform better estimation of the permutation.
		// 1. Split the source/destination vectors into real registers.
		RKSimon (Not Done): We already have something similar in DAGTypeLegalizer::SplitVecRes_VECTOR_SHUFFLE - do you think we could have a single version of the code some place?
		ABataev (Author, Done): Will check whether I can merge them somehow.
		// 2. Do the mask analysis to identify which real registers are
		// permuted. If more than 1 source registers are used for the
		// destination register building, the cost for this destination register
		// is (Number_of_source_register - 1) * Cost_PermuteTwoSrc. If only one
		// source register is used, build mask and calculate the cost as a cost
		// of PermuteSingleSrc.
		// Also, for the single register permute we try to identify if the
		// destination register is just a copy of the source register or the
		// copy of the previous destination register (the cost is
		// TTI::TCC_Basic). If the source register is just reused, the cost for
		// this operation is 0.
		InstructionCost Cost = 0;
		int Sz = Mask.size();
		unsigned E = *NumOfDests.getValue();
		unsigned SzDest = Sz / E;
		unsigned SzSrc = Sz / NumOfSrcs;
		unsigned PrevSrcReg = 0;
		SmallVector<int> PrevRegMask;
		for (unsigned I = 0; I < E; ++I) {
		SmallVector<int> RegMask(SzDest, UndefMaskElem);
		SmallBitVector UsedSrcRegs(NumOfSrcs, false);
		// Check that the values in dest registers are in the one src
		// register.
		for (unsigned K = 0; K < SzDest; ++K) {
		int Idx = I * SzDest + K;
		if (Idx == Sz)
		break;
		if (Mask[Idx] >= Sz || Mask[Idx] == UndefMaskElem)
		continue;
		int SrcRegIdx = Mask[Idx] / SzSrc;
		// Add a cost of PermuteTwoSrc for each new source register permute,
		// if we have more than one source registers.
		UsedSrcRegs.set(SrcRegIdx);
		RegMask[K] = Mask[Idx] % SzSrc;
		}
		int NumSrcRegs = UsedSrcRegs.count();
		// NumSrcRegs might be 0 if the mask is just undef.
		if (NumSrcRegs == 1) {
		// Analysis of the single source register permutation.
		unsigned SrcReg = UsedSrcRegs.find_first();
		if (!ShuffleVectorInst::isIdentityMask(RegMask)) {
		// Check if the previous register can be just copied to the next
		// one.
		if (PrevRegMask.empty() || PrevSrcReg != SrcReg ||
		PrevRegMask != RegMask)
		Cost += getShuffleCost(TTI::SK_PermuteSingleSrc, SingleOpTy,
		RegMask, 0, nullptr);
		else
		// Just a copy of previous destination register.
		Cost += TTI::TCC_Basic;
		} else if (SrcReg != I && any_of(RegMask, [](int I) {
		return I != UndefMaskElem;
		})) {
		// Just a copy of the source register.
		Cost += TTI::TCC_Basic;
		}
		PrevSrcReg = SrcReg;
		PrevRegMask.swap(RegMask);
		} else if (NumSrcRegs > 1) {
		// TODO: Add analysis of regmask for better cost estimation.
		Cost +=
		(NumSrcRegs - 1) * getShuffleCost(TTI::SK_PermuteTwoSrc,
		SingleOpTy, None, 0, nullptr);
		}
		}
		return Cost;
		}

InstructionCost NumOfShuffles = (NumOfSrcs - 1) * NumOfDests;		InstructionCost NumOfShuffles = (NumOfSrcs - 1) * NumOfDests;
return NumOfShuffles * getShuffleCost(TTI::SK_PermuteTwoSrc, SingleOpTy,		return NumOfShuffles * getShuffleCost(TTI::SK_PermuteTwoSrc, SingleOpTy,
None, 0, nullptr);		None, 0, nullptr);
}		}

return BaseT::getShuffleCost(Kind, BaseTp, Mask, Index, SubTp);		return BaseT::getShuffleCost(Kind, BaseTp, Mask, Index, SubTp);
}		}

[3,962 unchanged lines not shown]
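The mask analysis described in the comment of the X86 hunk can be reproduced outside LLVM. What follows is a minimal standalone C++ sketch, not the patch itself. It replaces `getShuffleCost()` and `TTI::TCC_Basic` with hypothetical unit costs, uses a simplified identity check, and assumes the mask length is divisible by both register counts. All names in the sketch are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-ins: the patch asks getShuffleCost() for each permute
// and uses TTI::TCC_Basic for a register copy.
constexpr int kUndef = -1;               // plays the role of UndefMaskElem
constexpr int kPermuteSingleSrcCost = 1; // assumed unit cost
constexpr int kPermuteTwoSrcCost = 1;    // assumed unit cost
constexpr int kCopyCost = 1;             // assumed, like TTI::TCC_Basic

// Simplified single-operand identity check: every defined lane selects itself.
bool isIdentity(const std::vector<int> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I] != kUndef && Mask[I] != static_cast<int>(I))
      return false;
  return true;
}

// Splits a shuffle mask over NumSrcRegs source and NumDestRegs destination
// registers and sums a per-destination-register cost.
int estimateSplitShuffleCost(const std::vector<int> &Mask, int NumSrcRegs,
                             int NumDestRegs) {
  const int Sz = static_cast<int>(Mask.size());
  const int SzDest = Sz / NumDestRegs;
  const int SzSrc = Sz / NumSrcRegs;
  int Cost = 0;
  int PrevSrcReg = 0;
  std::vector<int> PrevRegMask;
  for (int I = 0; I < NumDestRegs; ++I) {
    std::vector<int> RegMask(SzDest, kUndef);
    std::set<int> UsedSrcRegs;
    for (int K = 0; K < SzDest; ++K) {
      const int Elt = Mask[I * SzDest + K];
      if (Elt == kUndef || Elt >= Sz) // lanes of a second operand are ignored
        continue;
      UsedSrcRegs.insert(Elt / SzSrc);
      RegMask[K] = Elt % SzSrc;
    }
    const int NumUsed = static_cast<int>(UsedSrcRegs.size());
    if (NumUsed == 1) {
      const int SrcReg = *UsedSrcRegs.begin();
      if (!isIdentity(RegMask)) {
        const bool RepeatsPrev = !PrevRegMask.empty() &&
                                 PrevSrcReg == SrcReg && PrevRegMask == RegMask;
        Cost += RepeatsPrev ? kCopyCost : kPermuteSingleSrcCost;
      } else if (SrcReg != I) {
        Cost += kCopyCost; // identity lanes, but taken from another register
      }
      PrevSrcReg = SrcReg;
      PrevRegMask = RegMask;
    } else if (NumUsed > 1) {
      Cost += (NumUsed - 1) * kPermuteTwoSrcCost;
    }
    // NumUsed == 0: the destination register is entirely undef and free.
  }
  return Cost;
}
```

With these unit costs, consider the three SSE reduction masks for `<8 x i32>` in reduction.ll, which use two v4i32 source registers and two v4i32 destination registers. For example, `estimateSplitShuffleCost({4, 5, 6, 7, -1, -1, -1, -1}, 2, 2)` returns 1, and the other two masks also come out at 1. This happens to match the updated SSE expectations. In the first case, the upper source register is reused unpermuted, yet the sketch, like the patch, charges one copy because the data lands in a different destination register.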

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

[1,573 unchanged lines not shown]  context: private:
/// Vectorize a single entry in the tree.		/// Vectorize a single entry in the tree.
Value *vectorizeTree(TreeEntry *E);		Value *vectorizeTree(TreeEntry *E);

/// Vectorize a single entry in the tree, starting in \p VL.		/// Vectorize a single entry in the tree, starting in \p VL.
Value *vectorizeTree(ArrayRef<Value *> VL);		Value *vectorizeTree(ArrayRef<Value *> VL);

/// \returns the scalarization cost for this type. Scalarization in this		/// \returns the scalarization cost for this type. Scalarization in this
/// context means the creation of vectors from a group of scalars.		/// context means the creation of vectors from a group of scalars.
InstructionCost		InstructionCost getGatherCost(FixedVectorType *Ty, ArrayRef<int> Mask) const;
getGatherCost(FixedVectorType *Ty,
const DenseSet<unsigned> &ShuffledIndices) const;

/// Checks if the gathered \p VL can be represented as shuffle(s) of previous		/// Checks if the gathered \p VL can be represented as shuffle(s) of previous
/// tree entries.		/// tree entries.
/// \returns ShuffleKind, if gathered values can be represented as shuffles of		/// \returns ShuffleKind, if gathered values can be represented as shuffles of
/// previous tree entries. \p Mask is filled with the shuffle mask.		/// previous tree entries. \p Mask is filled with the shuffle mask.
Optional<TargetTransformInfo::ShuffleKind>		Optional<TargetTransformInfo::ShuffleKind>
isGatherShuffledEntry(const TreeEntry *TE, SmallVectorImpl<int> &Mask,		isGatherShuffledEntry(const TreeEntry *TE, SmallVectorImpl<int> &Mask,
SmallVectorImpl<const TreeEntry *> &Entries);		SmallVectorImpl<const TreeEntry *> &Entries);
[1,982 unchanged lines not shown]  context: computeExtractCost(ArrayRef<Value *> VL, FixedVectorType *VecTy,
bool AllConsecutive = true;		bool AllConsecutive = true;
unsigned EltsPerVector = VecTy->getNumElements() / NumOfParts;		unsigned EltsPerVector = VecTy->getNumElements() / NumOfParts;
unsigned Idx = -1;		unsigned Idx = -1;
InstructionCost Cost = 0;		InstructionCost Cost = 0;

// Process extracts in blocks of EltsPerVector to check if the source vector		// Process extracts in blocks of EltsPerVector to check if the source vector
// operand can be re-used directly. If not, add the cost of creating a shuffle		// operand can be re-used directly. If not, add the cost of creating a shuffle
// to extract the values into a vector register.		// to extract the values into a vector register.
		SmallVector<int> RegMask(EltsPerVector, UndefMaskElem);
for (auto *V : VL) {		for (auto *V : VL) {
++Idx;		++Idx;

// Reached the start of a new vector registers.		// Reached the start of a new vector registers.
if (Idx % EltsPerVector == 0) {		if (Idx % EltsPerVector == 0) {
AllConsecutive = true;		AllConsecutive = true;
continue;		continue;
}		}

// Check all extracts for a vector register on the target directly		// Check all extracts for a vector register on the target directly
// extract values in order.		// extract values in order.
unsigned CurrentIdx = *getExtractIndex(cast<Instruction>(V));		unsigned CurrentIdx = *getExtractIndex(cast<Instruction>(V));
unsigned PrevIdx = *getExtractIndex(cast<Instruction>(VL[Idx - 1]));		unsigned PrevIdx = *getExtractIndex(cast<Instruction>(VL[Idx - 1]));
AllConsecutive &= PrevIdx + 1 == CurrentIdx &&		AllConsecutive &= PrevIdx + 1 == CurrentIdx &&
CurrentIdx % EltsPerVector == Idx % EltsPerVector;		CurrentIdx % EltsPerVector == Idx % EltsPerVector;
		RegMask[Idx % EltsPerVector] = CurrentIdx % EltsPerVector;

if (AllConsecutive)		if (AllConsecutive)
continue;		continue;

// Skip all indices, except for the last index per vector block.		// Skip all indices, except for the last index per vector block.
if ((Idx + 1) % EltsPerVector != 0 && Idx + 1 != VL.size())		if ((Idx + 1) % EltsPerVector != 0 && Idx + 1 != VL.size())
continue;		continue;

// If we have a series of extracts which are not consecutive and hence		// If we have a series of extracts which are not consecutive and hence
// cannot re-use the source vector register directly, compute the shuffle		// cannot re-use the source vector register directly, compute the shuffle
// cost to extract the a vector with EltsPerVector elements.		// cost to extract the a vector with EltsPerVector elements.
Cost += TTI.getShuffleCost(		Cost += TTI.getShuffleCost(
TargetTransformInfo::SK_PermuteSingleSrc,		TargetTransformInfo::SK_PermuteSingleSrc,
FixedVectorType::get(VecTy->getElementType(), EltsPerVector));		FixedVectorType::get(VecTy->getElementType(), EltsPerVector), RegMask);
		RegMask.assign(EltsPerVector, UndefMaskElem);
}		}
return Cost;		return Cost;
}		}

InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,		InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
ArrayRef<Value *> VectorizedVals) {		ArrayRef<Value *> VectorizedVals) {
ArrayRef<Value*> VL = E->Scalars;		ArrayRef<Value*> VL = E->Scalars;

[12 unchanged lines not shown]  context: InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
if (MinBWs.count(VL[0]))		if (MinBWs.count(VL[0]))
VecTy = FixedVectorType::get(		VecTy = FixedVectorType::get(
IntegerType::get(F->getContext(), MinBWs[VL[0]].first), VL.size());		IntegerType::get(F->getContext(), MinBWs[VL[0]].first), VL.size());

unsigned ReuseShuffleNumbers = E->ReuseShuffleIndices.size();		unsigned ReuseShuffleNumbers = E->ReuseShuffleIndices.size();
bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();		bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();
InstructionCost ReuseShuffleCost = 0;		InstructionCost ReuseShuffleCost = 0;
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
		auto *FinalVecTy =
		FixedVectorType::get(VecTy->getElementType(), ReuseShuffleNumbers);
ReuseShuffleCost =		ReuseShuffleCost =
TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, VecTy,		TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
E->ReuseShuffleIndices);		FinalVecTy, E->ReuseShuffleIndices);
		ABataev (Author, Done): We incorrectly calculate the cost here (using the wrong vector type); with this fix alone, without the rest of this patch, at least 4 X86 tests are no longer vectorized.
}		}
// FIXME: it tries to fix a problem with MSVC buildbots.		// FIXME: it tries to fix a problem with MSVC buildbots.
TargetTransformInfo &TTIRef = *TTI;		TargetTransformInfo &TTIRef = *TTI;
auto &&AdjustExtractsCost = [this, &TTIRef, CostKind, VL, VecTy,		auto &&AdjustExtractsCost = [this, &TTIRef, CostKind, VL, VecTy,
VectorizedVals](InstructionCost &Cost,		VectorizedVals](InstructionCost &Cost,
bool IsGather) {		bool IsGather) {
DenseMap<Value *, int> ExtractVectorsTys;		DenseMap<Value *, int> ExtractVectorsTys;
for (auto *V : VL) {		for (auto *V : VL) {
[88 unchanged lines not shown]  context: if (Shuffle.hasValue()) {
GatherCost = TTI->getShuffleCost(*Shuffle, VecTy, Mask);		GatherCost = TTI->getShuffleCost(*Shuffle, VecTy, Mask);
}		}
return ReuseShuffleCost + GatherCost;		return ReuseShuffleCost + GatherCost;
}		}
if (isSplat(VL)) {		if (isSplat(VL)) {
// Found the broadcasting of the single scalar, calculate the cost as the		// Found the broadcasting of the single scalar, calculate the cost as the
// broadcast.		// broadcast.
return ReuseShuffleCost +		return ReuseShuffleCost +
TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy, None,		TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy);
0);
}		}
if (E->getOpcode() == Instruction::ExtractElement && allSameType(VL) &&		if (E->getOpcode() == Instruction::ExtractElement && allSameType(VL) &&
allSameBlock(VL) &&		allSameBlock(VL) &&
!isa<ScalableVectorType>(		!isa<ScalableVectorType>(
cast<ExtractElementInst>(E->getMainOp())->getVectorOperandType())) {		cast<ExtractElementInst>(E->getMainOp())->getVectorOperandType())) {
// Check that gather of extractelements can be represented as just a		// Check that gather of extractelements can be represented as just a
// shuffle of a single/two vectors the scalars are extracted from.		// shuffle of a single/two vectors the scalars are extracted from.
SmallVector<int> Mask;		SmallVector<int> Mask;
[56 unchanged lines not shown]  context: case Instruction::ExtractElement: {
}		}
CommonCost = ReuseShuffleCost;		CommonCost = ReuseShuffleCost;
} else if (!E->ReorderIndices.empty()) {		} else if (!E->ReorderIndices.empty()) {
SmallVector<int> NewMask;		SmallVector<int> NewMask;
inversePermutation(E->ReorderIndices, NewMask);		inversePermutation(E->ReorderIndices, NewMask);
CommonCost = TTI->getShuffleCost(		CommonCost = TTI->getShuffleCost(
TargetTransformInfo::SK_PermuteSingleSrc, VecTy, NewMask);		TargetTransformInfo::SK_PermuteSingleSrc, VecTy, NewMask);
}		}
if (ShuffleOrOp == Instruction::ExtractValue) {		if (ShuffleOrOp == Instruction::ExtractValue) {
for (unsigned I = 0, E = VL.size(); I < E; ++I) {		for (unsigned I = 0, E = VL.size(); I < E; ++I) {
		sdesmalen (Not Done): Can the finding of a more specific ShuffleKind be done by getShuffleCost when a Mask is given? It seems a bit inconvenient to have to do that manually before calling this function.
		RKSimon (Not Done): I made a similar comment on D100495 - we could do with a generic 'ShuffleKind' decoder helper function (e.g. in Analysis\VectorUtils.h) that everybody can use.
		ABataev (Author, Done): In that case, we need to update the `getShuffleCost` function for all targets to translate the mask. Is it OK if I do it in this patch?
		sdesmalen (Not Done): There are already a few different changes in this patch (AArch64 cost-model, X86 cost-model and changes to the SLPVectorizer), so I think it makes more sense to do this in a separate patch.
auto *EI = cast<Instruction>(VL[I]);		auto *EI = cast<Instruction>(VL[I]);
// Take credit for instruction that will become dead.		// Take credit for instruction that will become dead.
if (EI->hasOneUse()) {		if (EI->hasOneUse()) {
Instruction *Ext = EI->user_back();		Instruction *Ext = EI->user_back();
if ((isa<SExtInst>(Ext) || isa<ZExtInst>(Ext)) &&		if ((isa<SExtInst>(Ext) || isa<ZExtInst>(Ext)) &&
all_of(Ext->users(),		all_of(Ext->users(),
[](User *U) { return isa<GetElementPtrInst>(U); })) {		[](User *U) { return isa<GetElementPtrInst>(U); })) {
// Use getExtractWithExtendCost() to calculate the cost of		// Use getExtractWithExtendCost() to calculate the cost of
[345 unchanged lines not shown]  context: case Instruction::ShuffleVector: {

SmallVector<int> Mask(E->Scalars.size());		SmallVector<int> Mask(E->Scalars.size());
for (unsigned I = 0, End = E->Scalars.size(); I < End; ++I) {		for (unsigned I = 0, End = E->Scalars.size(); I < End; ++I) {
auto *OpInst = cast<Instruction>(E->Scalars[I]);		auto *OpInst = cast<Instruction>(E->Scalars[I]);
assert(E->isOpcodeOrAlt(OpInst) && "Unexpected main/alternate opcode");		assert(E->isOpcodeOrAlt(OpInst) && "Unexpected main/alternate opcode");
Mask[I] = I + (OpInst->getOpcode() == E->getAltOpcode() ? End : 0);		Mask[I] = I + (OpInst->getOpcode() == E->getAltOpcode() ? End : 0);
}		}
VecCost +=		VecCost +=
TTI->getShuffleCost(TargetTransformInfo::SK_Select, VecTy, Mask, 0);		TTI->getShuffleCost(TargetTransformInfo::SK_Select, VecTy, Mask);
LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));		LLVM_DEBUG(dumpTreeCosts(E, ReuseShuffleCost, VecCost, ScalarCost));
return ReuseShuffleCost + VecCost - ScalarCost;		return ReuseShuffleCost + VecCost - ScalarCost;
}		}
default:		default:
llvm_unreachable("Unknown instruction");		llvm_unreachable("Unknown instruction");
}		}
}		}

[511 unchanged lines not shown]  context: BoUpSLP::isGatherShuffledEntry(const TreeEntry *TE, SmallVectorImpl<int> &Mask,
case 2:		case 2:
return TargetTransformInfo::SK_PermuteTwoSrc;		return TargetTransformInfo::SK_PermuteTwoSrc;
default:		default:
break;		break;
}		}
return None;		return None;
}		}

InstructionCost		InstructionCost BoUpSLP::getGatherCost(FixedVectorType *Ty,
BoUpSLP::getGatherCost(FixedVectorType *Ty,		ArrayRef<int> Mask) const {
const DenseSet<unsigned> &ShuffledIndices) const {		int NumElts = Ty->getNumElements();
unsigned NumElts = Ty->getNumElements();		APInt DemandedElts(NumElts, 0);
APInt DemandedElts = APInt::getNullValue(NumElts);		for_each(Mask, [&DemandedElts](int Idx) {
for (unsigned I = 0; I < NumElts; ++I)		if (Idx > 0)
if (!ShuffledIndices.count(I))		DemandedElts.setBit(Idx);
DemandedElts.setBit(I);		});
InstructionCost Cost =		InstructionCost Cost =
TTI->getScalarizationOverhead(Ty, DemandedElts, /*Insert*/ true,		TTI->getScalarizationOverhead(Ty, DemandedElts, /*Insert*/ true,
/*Extract*/ false);		/*Extract*/ false);
if (!ShuffledIndices.empty())		if (!ShuffleVectorInst::isIdentityMask(Mask))
Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, Ty);		Cost +=
		TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, Ty, Mask);
return Cost;		return Cost;
}		}

InstructionCost BoUpSLP::getGatherCost(ArrayRef<Value *> VL) const {		InstructionCost BoUpSLP::getGatherCost(ArrayRef<Value *> VL) const {
// Find the type of the operands in VL.		// Find the type of the operands in VL.
Type *ScalarTy = VL[0]->getType();		Type *ScalarTy = VL[0]->getType();
if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))		if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))
ScalarTy = SI->getValueOperand()->getType();		ScalarTy = SI->getValueOperand()->getType();
auto *VecTy = FixedVectorType::get(ScalarTy, VL.size());		auto *VecTy = FixedVectorType::get(ScalarTy, VL.size());
// Find the cost of inserting/extracting values from the vector.		// Find the cost of inserting/extracting values from the vector.
// Check if the same elements are inserted several times and count them as		// Check if the same elements are inserted several times and count them as
// shuffle candidates.		// shuffle candidates.
DenseSet<unsigned> ShuffledElements;		DenseMap<Value *, unsigned> UniqueElements;
DenseSet<Value *> UniqueElements;		SmallVector<int> Mask(VL.size(), UndefMaskElem);
// Iterate in reverse order to consider insert elements with the high cost.		// Iterate in reverse order to consider insert elements with the high cost.
for (unsigned I = VL.size(); I > 0; --I) {		for (unsigned I = VL.size(); I > 0; --I) {
unsigned Idx = I - 1;		unsigned Idx = I - 1;
if (isConstant(VL[Idx]))		if (isConstant(VL[Idx]))
continue;		continue;
if (!UniqueElements.insert(VL[Idx]).second)		auto Res = UniqueElements.try_emplace(VL[Idx], Idx);
ShuffledElements.insert(Idx);		Mask[Idx] = Res.first->second;
}		}
return getGatherCost(VecTy, ShuffledElements);		return getGatherCost(VecTy, Mask);
}		}

// Perform operand reordering on the instructions in VL and return the reordered		// Perform operand reordering on the instructions in VL and return the reordered
// operands in Left and Right.		// operands in Left and Right.
void BoUpSLP::reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,		void BoUpSLP::reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,
SmallVectorImpl<Value *> &Left,		SmallVectorImpl<Value *> &Left,
SmallVectorImpl<Value *> &Right,		SmallVectorImpl<Value *> &Right,
const DataLayout &DL,		const DataLayout &DL,
[3,927 unchanged lines not shown]
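The reworked `getGatherCost(ArrayRef<Value *> VL)` replaces the set of shuffled indices with a mask. Each non-constant lane records the lane its value is actually inserted into, and `getGatherCost(Ty, Mask)` adds a single-source permute only when that mask is not an identity. Below is a minimal standalone C++ sketch of just the mask construction, not the patch itself. Using integer ids in place of `Value *` and `std::nullopt` in place of constants are simplifying assumptions, and the cost queries are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <vector>

constexpr int kUndef = -1; // plays the role of UndefMaskElem

// Hypothetical model of a gather list: each lane is either a constant
// (std::nullopt) or an integer id standing in for a Value *.
using GatherList = std::vector<std::optional<int>>;

// Mirrors the loop in the new getGatherCost(VL): walk lanes from last to
// first, so a value's highest lane is the one every duplicate points to.
std::vector<int> buildGatherMask(const GatherList &VL) {
  std::vector<int> Mask(VL.size(), kUndef);
  std::unordered_map<int, int> LaneOf; // value id -> lane that is inserted
  for (std::size_t I = VL.size(); I > 0; --I) {
    const std::size_t Idx = I - 1;
    if (!VL[Idx])
      continue; // constants are left as undef lanes in the mask
    auto Res = LaneOf.try_emplace(*VL[Idx], static_cast<int>(Idx));
    Mask[Idx] = Res.first->second;
  }
  return Mask;
}

// Simplified identity check; a non-identity mask means some value is
// duplicated, and the patch then adds a single-source permute cost.
bool isIdentity(const std::vector<int> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I] != kUndef && Mask[I] != static_cast<int>(I))
      return false;
  return true;
}
```

For a gather of `{a, b, a, c}` (ids 1, 2, 1, 3), the mask is `{2, 1, 2, 3}`. Value `a` is inserted once, into lane 2, and lane 0 is filled by the permute. Walking in reverse keeps the higher lane, which matches the patch's comment about considering the insert elements with the higher cost.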

llvm/test/Analysis/CostModel/X86/reduction.ll

[57 unchanged lines not shown]
%bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7		%bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7

%r = extractelement <4 x float> %bin.rdx8, i32 0		%r = extractelement <4 x float> %bin.rdx8, i32 0
ret float %r		ret float %r
}		}

define fastcc i32 @reduction_cost_int(<8 x i32> %rdx) {		define fastcc i32 @reduction_cost_int(<8 x i32> %rdx) {
; SSE-LABEL: 'reduction_cost_int'		; SSE-LABEL: 'reduction_cost_int'
; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
		RKSimon (Not Done): Something might still be going wrong here: the SSE max legal type is v4i32, so this cost should be 0, as it's referencing a single existing vector.
		ABataev (Author, Done): I'm somewhat pessimistic here: if the destination register is not the same as the source register, I consider it a register copy and add a cost of TCC_Basic.
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <8 x i32> %rdx, %rdx.shuf		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <8 x i32> %rdx, %rdx.shuf
; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.2 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.2 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx.2 = add <8 x i32> %bin.rdx, %rdx.shuf.2		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx.2 = add <8 x i32> %bin.rdx, %rdx.shuf.2
; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.3 = shufflevector <8 x i32> %bin.rdx.2, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.3 = shufflevector <8 x i32> %bin.rdx.2, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx.3 = add <8 x i32> %bin.rdx.2, %rdx.shuf.3		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx.3 = add <8 x i32> %bin.rdx.2, %rdx.shuf.3
; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <8 x i32> %bin.rdx.3, i32 0		; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <8 x i32> %bin.rdx.3, i32 0
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r		; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r
;		;
; AVX1-LABEL: 'reduction_cost_int'		; AVX1-LABEL: 'reduction_cost_int'
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = add <8 x i32> %rdx, %rdx.shuf		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = add <8 x i32> %rdx, %rdx.shuf
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.2 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.2 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
[9 unchanged lines not shown]
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.2 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.2 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx.2 = add <8 x i32> %bin.rdx, %rdx.shuf.2		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx.2 = add <8 x i32> %bin.rdx, %rdx.shuf.2
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.3 = shufflevector <8 x i32> %bin.rdx.2, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.3 = shufflevector <8 x i32> %bin.rdx.2, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx.3 = add <8 x i32> %bin.rdx.2, %rdx.shuf.3		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx.3 = add <8 x i32> %bin.rdx.2, %rdx.shuf.3
; AVX2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <8 x i32> %bin.rdx.3, i32 0		; AVX2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <8 x i32> %bin.rdx.3, i32 0
; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r		; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r
;		;
; SLM-LABEL: 'reduction_cost_int'		; SLM-LABEL: 'reduction_cost_int'
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <8 x i32> %rdx, %rdx.shuf		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <8 x i32> %rdx, %rdx.shuf
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.2 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.2 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx.2 = add <8 x i32> %bin.rdx, %rdx.shuf.2		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx.2 = add <8 x i32> %bin.rdx, %rdx.shuf.2
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.3 = shufflevector <8 x i32> %bin.rdx.2, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.3 = shufflevector <8 x i32> %bin.rdx.2, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx.3 = add <8 x i32> %bin.rdx.2, %rdx.shuf.3		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx.3 = add <8 x i32> %bin.rdx.2, %rdx.shuf.3
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <8 x i32> %bin.rdx.3, i32 0		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <8 x i32> %bin.rdx.3, i32 0
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r
;		;
%rdx.shuf = shufflevector <8 x i32> %rdx, <8 x i32> undef,		%rdx.shuf = shufflevector <8 x i32> %rdx, <8 x i32> undef,
<8 x i32> <i32 4 , i32 5, i32 6, i32 7,		<8 x i32> <i32 4 , i32 5, i32 6, i32 7,
i32 undef, i32 undef, i32 undef, i32 undef>		i32 undef, i32 undef, i32 undef, i32 undef>
%bin.rdx = add <8 x i32> %rdx, %rdx.shuf		%bin.rdx = add <8 x i32> %rdx, %rdx.shuf
%bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7		%bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7

%r = extractelement <4 x float> %bin.rdx8, i32 0		%r = extractelement <4 x float> %bin.rdx8, i32 0
ret float %r		ret float %r
}		}
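Every function in this test follows the same ladder. Each step shuffles the upper half of the still-live lanes down, leaving the remaining lanes `undef`. It then adds the result to the previous value, and after the last step it extracts lane 0. The Python sketch below replays that ladder using the `<8 x ...>` masks from `reduction_cost_int` and `no_pairwise_reduction8float`. The helper names are hypothetical, and `None` stands in for `undef`. The sketch shows that after each step most lanes are `undef`. That is the information a cost model can only use if it is given the shuffle mask.

```python
from typing import List, Optional

Lane = Optional[int]  # None models an `undef` lane or mask element
UNDEF: Lane = None


def shufflevector(v1: List[Lane], v2: List[Lane], mask: List[Lane]) -> List[Lane]:
    """Mimic IR shufflevector: index i < len(v1) selects v1[i], a larger
    index selects v2[i - len(v1)], and an undef mask element gives undef."""
    n = len(v1)
    out: List[Lane] = []
    for m in mask:
        if m is UNDEF:
            out.append(UNDEF)
        elif m < n:
            out.append(v1[m])
        else:
            out.append(v2[m - n])
    return out


def add(a: List[Lane], b: List[Lane]) -> List[Lane]:
    """Lane-wise add; this model propagates undef if either lane is undef."""
    return [UNDEF if x is UNDEF or y is UNDEF else x + y for x, y in zip(a, b)]


def no_pairwise_reduce8(rdx: List[Lane]) -> Lane:
    """Replay the <8 x ...> shuffle/add ladder and extract lane 0."""
    u = UNDEF
    undef_vec: List[Lane] = [u] * 8
    shuf3 = shufflevector(rdx, undef_vec, [4, 5, 6, 7, u, u, u, u])
    bin4 = add(rdx, shuf3)   # lanes 4..7 are now undef
    shuf = shufflevector(bin4, undef_vec, [2, 3, u, u, u, u, u, u])
    bin2 = add(bin4, shuf)   # only lanes 0..1 are still defined
    shuf7 = shufflevector(bin2, undef_vec, [1, u, u, u, u, u, u, u])
    bin8 = add(bin2, shuf7)  # only lane 0 is defined
    return bin8[0]           # extractelement ..., i32 0


print(no_pairwise_reduce8([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```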

define fastcc double @no_pairwise_reduction4double(<4 x double> %rdx, double %f1) {		define fastcc double @no_pairwise_reduction4double(<4 x double> %rdx, double %f1) {
; SSE2-LABEL: 'no_pairwise_reduction4double'		; SSE2-LABEL: 'no_pairwise_reduction4double'
; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <4 x double> %rdx, %rdx.shuf		; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <4 x double> %rdx, %rdx.shuf
; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf7 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <4 x double> %bin.rdx, %rdx.shuf7		; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <4 x double> %bin.rdx, %rdx.shuf7
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r
;		;
; SSSE3-LABEL: 'no_pairwise_reduction4double'		; SSSE3-LABEL: 'no_pairwise_reduction4double'
; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <4 x double> %rdx, %rdx.shuf		; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <4 x double> %rdx, %rdx.shuf
; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf7 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <4 x double> %bin.rdx, %rdx.shuf7		; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <4 x double> %bin.rdx, %rdx.shuf7
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0		; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0
; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r
;		;
; SSE42-LABEL: 'no_pairwise_reduction4double'		; SSE42-LABEL: 'no_pairwise_reduction4double'
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <4 x double> %rdx, %rdx.shuf		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <4 x double> %rdx, %rdx.shuf
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf7 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = fadd <4 x double> %bin.rdx, %rdx.shuf7		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = fadd <4 x double> %bin.rdx, %rdx.shuf7
; SSE42-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0		; SSE42-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r		; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r
;		;
; AVX1-LABEL: 'no_pairwise_reduction4double'		; AVX1-LABEL: 'no_pairwise_reduction4double'
; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <4 x double> %rdx, %rdx.shuf		; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <4 x double> %rdx, %rdx.shuf
; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf7 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf7 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = fadd <4 x double> %bin.rdx, %rdx.shuf7		; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = fadd <4 x double> %bin.rdx, %rdx.shuf7
; AVX1-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0		; AVX1-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0
; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r		; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r
;		;
; AVX2-LABEL: 'no_pairwise_reduction4double'		; AVX2-LABEL: 'no_pairwise_reduction4double'
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx = fadd <4 x double> %rdx, %rdx.shuf		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx = fadd <4 x double> %rdx, %rdx.shuf
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = fadd <4 x double> %bin.rdx, %rdx.shuf7		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = fadd <4 x double> %bin.rdx, %rdx.shuf7
; AVX2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0		; AVX2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0
; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r		; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r
;		;
; SLM-LABEL: 'no_pairwise_reduction4double'		; SLM-LABEL: 'no_pairwise_reduction4double'
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <4 x double> %rdx, %rdx.shuf		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <4 x double> %rdx, %rdx.shuf
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf7 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <4 x double> %bin.rdx, %rdx.shuf7		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <4 x double> %bin.rdx, %rdx.shuf7
; SLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0		; SLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r
;		;
%rdx.shuf = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		%rdx.shuf = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
%bin.rdx = fadd <4 x double> %rdx, %rdx.shuf		%bin.rdx = fadd <4 x double> %rdx, %rdx.shuf
%rdx.shuf7 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		%rdx.shuf7 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
%bin.rdx8 = fadd <4 x double> %bin.rdx, %rdx.shuf7		%bin.rdx8 = fadd <4 x double> %bin.rdx, %rdx.shuf7

%r = extractelement <4 x double> %bin.rdx8, i32 0		%r = extractelement <4 x double> %bin.rdx8, i32 0
ret double %r		ret double %r
}		}
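In the SSE and SLM columns, each 256-bit shuffle drops to a cost of 1 once the mask is available. The Python sketch below gives one intuition for this, using a deliberately simplified model rather than the X86 TTI code. The vector is legalized into several 128-bit registers, and a destination register whose mask lanes are all `undef` needs no work. Assume a flat cost of 1 for each destination register that receives a defined lane. Under that assumption, the model gives 1 for both `<4 x double>` masks above (two doubles per register) and for the `<8 x float>`/`<8 x i32>` masks (four lanes per register), which matches the right-hand SSE and SLM numbers. The function names and the flat per-register cost are assumptions.

```python
from typing import List, Optional

Mask = List[Optional[int]]  # None models an `undef` mask element


def split_per_register(mask: Mask, lanes_per_reg: int) -> List[Mask]:
    """Cut a destination mask into one chunk per legalized register."""
    return [mask[i:i + lanes_per_reg] for i in range(0, len(mask), lanes_per_reg)]


def mask_aware_cost(mask: Mask, lanes_per_reg: int, per_reg_cost: int = 1) -> int:
    """Hypothetical model: charge per_reg_cost for each destination register
    that receives at least one defined lane; all-undef registers are free."""
    return sum(per_reg_cost
               for chunk in split_per_register(mask, lanes_per_reg)
               if any(m is not None for m in chunk))


u = None
# <4 x double> on SSE: two doubles per 128-bit register.
print(mask_aware_cost([2, 3, u, u], lanes_per_reg=2))  # 1 (of 2 registers)
print(mask_aware_cost([1, u, u, u], lanes_per_reg=2))  # 1
# <8 x float>/<8 x i32> on SSE: four lanes per 128-bit register.
print(mask_aware_cost([4, 5, 6, 7, u, u, u, u], lanes_per_reg=4))  # 1
```

A mask-blind count would charge every legalized register each time. The model only covers the split SSE/SLM case. It says nothing about AVX1/AVX2, where the whole vector fits in one register and the estimates in this diff are unchanged.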

define fastcc float @no_pairwise_reduction8float(<8 x float> %rdx, float %f1) {		define fastcc float @no_pairwise_reduction8float(<8 x float> %rdx, float %f1) {
; SSE2-LABEL: 'no_pairwise_reduction8float'		; SSE2-LABEL: 'no_pairwise_reduction8float'
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf3 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf3 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx4 = fadd <8 x float> %rdx, %rdx.shuf3		; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx4 = fadd <8 x float> %rdx, %rdx.shuf3
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf = shufflevector <8 x float> %bin.rdx4, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <8 x float> %bin.rdx4, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <8 x float> %bin.rdx4, %rdx.shuf		; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <8 x float> %bin.rdx4, %rdx.shuf
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf7 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <8 x float> %bin.rdx, %rdx.shuf7		; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <8 x float> %bin.rdx, %rdx.shuf7
; SSE2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %r = extractelement <8 x float> %bin.rdx8, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %r = extractelement <8 x float> %bin.rdx8, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r
;		;
; SSSE3-LABEL: 'no_pairwise_reduction8float'		; SSSE3-LABEL: 'no_pairwise_reduction8float'
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf3 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf3 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx4 = fadd <8 x float> %rdx, %rdx.shuf3		; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx4 = fadd <8 x float> %rdx, %rdx.shuf3
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf = shufflevector <8 x float> %bin.rdx4, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <8 x float> %bin.rdx4, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <8 x float> %bin.rdx4, %rdx.shuf		; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <8 x float> %bin.rdx4, %rdx.shuf
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf7 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <8 x float> %bin.rdx, %rdx.shuf7		; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <8 x float> %bin.rdx, %rdx.shuf7
; SSSE3-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %r = extractelement <8 x float> %bin.rdx8, i32 0		; SSSE3-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %r = extractelement <8 x float> %bin.rdx8, i32 0
; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r
;		;
; SSE42-LABEL: 'no_pairwise_reduction8float'		; SSE42-LABEL: 'no_pairwise_reduction8float'
; SSE42-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf3 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf3 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx4 = fadd <8 x float> %rdx, %rdx.shuf3		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx4 = fadd <8 x float> %rdx, %rdx.shuf3
; SSE42-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf = shufflevector <8 x float> %bin.rdx4, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <8 x float> %bin.rdx4, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <8 x float> %bin.rdx4, %rdx.shuf		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <8 x float> %bin.rdx4, %rdx.shuf
; SSE42-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf7 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = fadd <8 x float> %bin.rdx, %rdx.shuf7		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = fadd <8 x float> %bin.rdx, %rdx.shuf7
; SSE42-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <8 x float> %bin.rdx8, i32 0		; SSE42-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <8 x float> %bin.rdx8, i32 0
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r		; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r
;		;
; AVX1-LABEL: 'no_pairwise_reduction8float'		; AVX1-LABEL: 'no_pairwise_reduction8float'
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf3 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf3 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx4 = fadd <8 x float> %rdx, %rdx.shuf3		; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx4 = fadd <8 x float> %rdx, %rdx.shuf3
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf = shufflevector <8 x float> %bin.rdx4, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf = shufflevector <8 x float> %bin.rdx4, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <8 x float> %bin.rdx4, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <8 x float> %bin.rdx4, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx = fadd <8 x float> %bin.rdx4, %rdx.shuf		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx = fadd <8 x float> %bin.rdx4, %rdx.shuf
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = fadd <8 x float> %bin.rdx, %rdx.shuf7		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = fadd <8 x float> %bin.rdx, %rdx.shuf7
; AVX2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <8 x float> %bin.rdx8, i32 0		; AVX2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <8 x float> %bin.rdx8, i32 0
; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r		; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r
;		;
; SLM-LABEL: 'no_pairwise_reduction8float'		; SLM-LABEL: 'no_pairwise_reduction8float'
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf3 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf3 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx4 = fadd <8 x float> %rdx, %rdx.shuf3		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx4 = fadd <8 x float> %rdx, %rdx.shuf3
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf = shufflevector <8 x float> %bin.rdx4, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <8 x float> %bin.rdx4, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <8 x float> %bin.rdx4, %rdx.shuf		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <8 x float> %bin.rdx4, %rdx.shuf
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf7 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = fadd <8 x float> %bin.rdx, %rdx.shuf7		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = fadd <8 x float> %bin.rdx, %rdx.shuf7
; SLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <8 x float> %bin.rdx8, i32 0		; SLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <8 x float> %bin.rdx8, i32 0
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r
;		;
%rdx.shuf3 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7,i32 undef, i32 undef, i32 undef, i32 undef>		%rdx.shuf3 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7,i32 undef, i32 undef, i32 undef, i32 undef>
%bin.rdx4 = fadd <8 x float> %rdx, %rdx.shuf3		%bin.rdx4 = fadd <8 x float> %rdx, %rdx.shuf3
%rdx.shuf = shufflevector <8 x float> %bin.rdx4, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		%rdx.shuf = shufflevector <8 x float> %bin.rdx4, <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%bin.rdx = fadd <8 x float> %bin.rdx4, %rdx.shuf		%bin.rdx = fadd <8 x float> %bin.rdx4, %rdx.shuf
%bin.rdx8 = add <4 x i32> %bin.rdx, %rdx.shuf7		%bin.rdx8 = add <4 x i32> %bin.rdx, %rdx.shuf7

%r = extractelement <4 x i32> %bin.rdx8, i32 0		%r = extractelement <4 x i32> %bin.rdx8, i32 0
ret i32 %r		ret i32 %r
}		}
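Summing the per-instruction estimates for `no_pairwise_reduction8float` shows how far the whole ladder's cost moves. The values below are copied from the SSE2 and SLM check lines, with the left column taken as "before" and the right as "after". Only the three shuffles change. The adds and the final `extractelement` keep their estimates.

```python
# Per-instruction estimates in program order:
# shuf3, bin.rdx4, shuf, bin.rdx, shuf7, bin.rdx8, extractelement, ret
SSE2_BEFORE = [4, 4, 4, 4, 4, 4, 6, 0]
SSE2_AFTER = [1, 4, 1, 4, 1, 4, 6, 0]
SLM_BEFORE = [4, 2, 4, 2, 4, 2, 5, 0]
SLM_AFTER = [1, 2, 1, 2, 1, 2, 5, 0]


def summarize(before, after):
    """Return (total before, total after, saving) for one check prefix."""
    b, a = sum(before), sum(after)
    return b, a, b - a


for name, before, after in [("SSE2", SSE2_BEFORE, SSE2_AFTER),
                            ("SLM", SLM_BEFORE, SLM_AFTER)]:
    b, a, saved = summarize(before, after)
    print(f"{name}: {b} -> {a} (saves {saved})")
# SSE2: 30 -> 21 (saves 9)
# SLM: 23 -> 14 (saves 9)
```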

define fastcc i64 @no_pairwise_reduction4i64(<4 x i64> %rdx, i64 %f1) {		define fastcc i64 @no_pairwise_reduction4i64(<4 x i64> %rdx, i64 %f1) {
; SSE-LABEL: 'no_pairwise_reduction4i64'		; SSE-LABEL: 'no_pairwise_reduction4i64'
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <4 x i64> %rdx, %rdx.shuf		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <4 x i64> %rdx, %rdx.shuf
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf7 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = add <4 x i64> %bin.rdx, %rdx.shuf7		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = add <4 x i64> %bin.rdx, %rdx.shuf7
; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r = extractelement <4 x i64> %bin.rdx8, i32 0		; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r = extractelement <4 x i64> %bin.rdx8, i32 0
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r		; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r
;		;
; AVX1-LABEL: 'no_pairwise_reduction4i64'		; AVX1-LABEL: 'no_pairwise_reduction4i64'
; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = add <4 x i64> %rdx, %rdx.shuf		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = add <4 x i64> %rdx, %rdx.shuf
; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf7 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf7 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = add <4 x i64> %bin.rdx, %rdx.shuf7		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = add <4 x i64> %bin.rdx, %rdx.shuf7
; AVX1-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r = extractelement <4 x i64> %bin.rdx8, i32 0		; AVX1-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r = extractelement <4 x i64> %bin.rdx8, i32 0
; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r		; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r
;		;
; AVX2-LABEL: 'no_pairwise_reduction4i64'		; AVX2-LABEL: 'no_pairwise_reduction4i64'
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx = add <4 x i64> %rdx, %rdx.shuf		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx = add <4 x i64> %rdx, %rdx.shuf
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = add <4 x i64> %bin.rdx, %rdx.shuf7		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = add <4 x i64> %bin.rdx, %rdx.shuf7
; AVX2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r = extractelement <4 x i64> %bin.rdx8, i32 0		; AVX2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r = extractelement <4 x i64> %bin.rdx8, i32 0
; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r		; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r
;		;
; SLM-LABEL: 'no_pairwise_reduction4i64'		; SLM-LABEL: 'no_pairwise_reduction4i64'
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %bin.rdx = add <4 x i64> %rdx, %rdx.shuf		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %bin.rdx = add <4 x i64> %rdx, %rdx.shuf
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf7 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %bin.rdx8 = add <4 x i64> %bin.rdx, %rdx.shuf7		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %bin.rdx8 = add <4 x i64> %bin.rdx, %rdx.shuf7
; SLM-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %r = extractelement <4 x i64> %bin.rdx8, i32 0		; SLM-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %r = extractelement <4 x i64> %bin.rdx8, i32 0
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r
;		;
%rdx.shuf = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		%rdx.shuf = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
%bin.rdx = add <4 x i64> %rdx, %rdx.shuf		%bin.rdx = add <4 x i64> %rdx, %rdx.shuf
%rdx.shuf7 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		%rdx.shuf7 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
%bin.rdx8 = add <4 x i64> %bin.rdx, %rdx.shuf7		%bin.rdx8 = add <4 x i64> %bin.rdx, %rdx.shuf7
%bin.rdx8 = add <8 x i16> %bin.rdx, %rdx.shuf7		%bin.rdx8 = add <8 x i16> %bin.rdx, %rdx.shuf7

%r = extractelement <8 x i16> %bin.rdx8, i32 0		%r = extractelement <8 x i16> %bin.rdx8, i32 0
ret i16 %r		ret i16 %r
}		}

define fastcc i32 @no_pairwise_reduction8i32(<8 x i32> %rdx, i32 %f1) {		define fastcc i32 @no_pairwise_reduction8i32(<8 x i32> %rdx, i32 %f1) {
; SSE-LABEL: 'no_pairwise_reduction8i32'		; SSE-LABEL: 'no_pairwise_reduction8i32'
; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf3 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf3 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx4 = add <8 x i32> %rdx, %rdx.shuf3		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx4 = add <8 x i32> %rdx, %rdx.shuf3
; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf = shufflevector <8 x i32> %bin.rdx4, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <8 x i32> %bin.rdx4, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <8 x i32> %bin.rdx4, %rdx.shuf		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <8 x i32> %bin.rdx4, %rdx.shuf
; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf7 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = add <8 x i32> %bin.rdx, %rdx.shuf7		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = add <8 x i32> %bin.rdx, %rdx.shuf7
; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <8 x i32> %bin.rdx8, i32 0		; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <8 x i32> %bin.rdx8, i32 0
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r		; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r
;		;
; AVX1-LABEL: 'no_pairwise_reduction8i32'		; AVX1-LABEL: 'no_pairwise_reduction8i32'
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf3 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf3 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx4 = add <8 x i32> %rdx, %rdx.shuf3		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx4 = add <8 x i32> %rdx, %rdx.shuf3
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf = shufflevector <8 x i32> %bin.rdx4, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf = shufflevector <8 x i32> %bin.rdx4, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <8 x i32> %bin.rdx4, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <8 x i32> %bin.rdx4, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx = add <8 x i32> %bin.rdx4, %rdx.shuf		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx = add <8 x i32> %bin.rdx4, %rdx.shuf
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = add <8 x i32> %bin.rdx, %rdx.shuf7		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = add <8 x i32> %bin.rdx, %rdx.shuf7
; AVX2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <8 x i32> %bin.rdx8, i32 0		; AVX2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <8 x i32> %bin.rdx8, i32 0
; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r		; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r
;		;
; SLM-LABEL: 'no_pairwise_reduction8i32'		; SLM-LABEL: 'no_pairwise_reduction8i32'
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf3 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf3 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx4 = add <8 x i32> %rdx, %rdx.shuf3		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx4 = add <8 x i32> %rdx, %rdx.shuf3
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf = shufflevector <8 x i32> %bin.rdx4, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf = shufflevector <8 x i32> %bin.rdx4, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <8 x i32> %bin.rdx4, %rdx.shuf		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <8 x i32> %bin.rdx4, %rdx.shuf
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf7 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf7 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = add <8 x i32> %bin.rdx, %rdx.shuf7		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = add <8 x i32> %bin.rdx, %rdx.shuf7
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <8 x i32> %bin.rdx8, i32 0		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <8 x i32> %bin.rdx8, i32 0
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r
;		;
%rdx.shuf3 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7,i32 undef, i32 undef, i32 undef, i32 undef>		%rdx.shuf3 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7,i32 undef, i32 undef, i32 undef, i32 undef>
%bin.rdx4 = add <8 x i32> %rdx, %rdx.shuf3		%bin.rdx4 = add <8 x i32> %rdx, %rdx.shuf3
%rdx.shuf = shufflevector <8 x i32> %bin.rdx4, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		%rdx.shuf = shufflevector <8 x i32> %bin.rdx4, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%bin.rdx = add <8 x i32> %bin.rdx4, %rdx.shuf		%bin.rdx = add <8 x i32> %bin.rdx4, %rdx.shuf
%bin.rdx8 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1		%bin.rdx8 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1

%r = extractelement <4 x float> %bin.rdx8, i32 0		%r = extractelement <4 x float> %bin.rdx8, i32 0
ret float %r		ret float %r
}		}

define fastcc double @pairwise_reduction4double(<4 x double> %rdx, double %f1) {		define fastcc double @pairwise_reduction4double(<4 x double> %rdx, double %f1) {
; SSE2-LABEL: 'pairwise_reduction4double'		; SSE2-LABEL: 'pairwise_reduction4double'
; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.0.0 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.0.1 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <4 x double> %rdx.shuf.0.0, %rdx.shuf.0.1		; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <4 x double> %rdx.shuf.0.0, %rdx.shuf.0.1
; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.1.1 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <4 x double> %rdx.shuf.1.0, %rdx.shuf.1.1		; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <4 x double> %rdx.shuf.1.0, %rdx.shuf.1.1
; SSE2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r
;		;
; SSSE3-LABEL: 'pairwise_reduction4double'		; SSSE3-LABEL: 'pairwise_reduction4double'
; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.0.0 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.0.1 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <4 x double> %rdx.shuf.0.0, %rdx.shuf.0.1		; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <4 x double> %rdx.shuf.0.0, %rdx.shuf.0.1
; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.1.1 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <4 x double> %rdx.shuf.1.0, %rdx.shuf.1.1		; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <4 x double> %rdx.shuf.1.0, %rdx.shuf.1.1
; SSSE3-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0		; SSSE3-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0
; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r
;		;
; SSE42-LABEL: 'pairwise_reduction4double'		; SSE42-LABEL: 'pairwise_reduction4double'
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.0.0 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.0.1 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <4 x double> %rdx.shuf.0.0, %rdx.shuf.0.1		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <4 x double> %rdx.shuf.0.0, %rdx.shuf.0.1
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.1.1 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = fadd <4 x double> %rdx.shuf.1.0, %rdx.shuf.1.1		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = fadd <4 x double> %rdx.shuf.1.0, %rdx.shuf.1.1
; SSE42-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0		; SSE42-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r		; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r
;		;
; AVX1-LABEL: 'pairwise_reduction4double'		; AVX1-LABEL: 'pairwise_reduction4double'
; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <4 x double> %rdx.shuf.0.0, %rdx.shuf.0.1		; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <4 x double> %rdx.shuf.0.0, %rdx.shuf.0.1
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx = fadd <4 x double> %rdx.shuf.0.0, %rdx.shuf.0.1		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx = fadd <4 x double> %rdx.shuf.0.0, %rdx.shuf.0.1
; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = fadd <4 x double> %rdx.shuf.1.0, %rdx.shuf.1.1		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = fadd <4 x double> %rdx.shuf.1.0, %rdx.shuf.1.1
; AVX2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0		; AVX2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0
; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r		; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r
;		;
; SLM-LABEL: 'pairwise_reduction4double'		; SLM-LABEL: 'pairwise_reduction4double'
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.0.0 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.0.1 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <4 x double> %rdx.shuf.0.0, %rdx.shuf.0.1		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <4 x double> %rdx.shuf.0.0, %rdx.shuf.0.1
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.1.1 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <4 x double> %rdx.shuf.1.0, %rdx.shuf.1.1		; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <4 x double> %rdx.shuf.1.0, %rdx.shuf.1.1
; SLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0		; SLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = extractelement <4 x double> %bin.rdx8, i32 0
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret double %r
;		;
%rdx.shuf.0.0 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>		%rdx.shuf.0.0 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
%rdx.shuf.0.1 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>		%rdx.shuf.0.1 = shufflevector <4 x double> %rdx, <4 x double> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
%bin.rdx = fadd <4 x double> %rdx.shuf.0.0, %rdx.shuf.0.1		%bin.rdx = fadd <4 x double> %rdx.shuf.0.0, %rdx.shuf.0.1
%rdx.shuf.1.0 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>		%rdx.shuf.1.0 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
%rdx.shuf.1.1 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		%rdx.shuf.1.1 = shufflevector <4 x double> %bin.rdx, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
%bin.rdx8 = fadd <4 x double> %rdx.shuf.1.0, %rdx.shuf.1.1		%bin.rdx8 = fadd <4 x double> %rdx.shuf.1.0, %rdx.shuf.1.1

%r = extractelement <4 x double> %bin.rdx8, i32 0		%r = extractelement <4 x double> %bin.rdx8, i32 0
ret double %r		ret double %r
}		}

define fastcc float @pairwise_reduction8float(<8 x float> %rdx, float %f1) {		define fastcc float @pairwise_reduction8float(<8 x float> %rdx, float %f1) {
; SSE2-LABEL: 'pairwise_reduction8float'		; SSE2-LABEL: 'pairwise_reduction8float'
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.0 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.1 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <8 x float> %rdx.shuf.0.0, %rdx.shuf.0.1		; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <8 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.1.0 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.0 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.1.1 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <8 x float> %rdx.shuf.1.0, %rdx.shuf.1.1		; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <8 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.2.1 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.2.1 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx9 = fadd <8 x float> %rdx.shuf.2.0, %rdx.shuf.2.1		; SSE2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx9 = fadd <8 x float> %rdx.shuf.2.0, %rdx.shuf.2.1
; SSE2-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %r = extractelement <8 x float> %bin.rdx9, i32 0		; SSE2-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %r = extractelement <8 x float> %bin.rdx9, i32 0
; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r
;		;
; SSSE3-LABEL: 'pairwise_reduction8float'		; SSSE3-LABEL: 'pairwise_reduction8float'
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.0 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.1 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <8 x float> %rdx.shuf.0.0, %rdx.shuf.0.1		; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = fadd <8 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.1.0 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.0 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.1.1 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <8 x float> %rdx.shuf.1.0, %rdx.shuf.1.1		; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx8 = fadd <8 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.2.1 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.2.1 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx9 = fadd <8 x float> %rdx.shuf.2.0, %rdx.shuf.2.1		; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx9 = fadd <8 x float> %rdx.shuf.2.0, %rdx.shuf.2.1
; SSSE3-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %r = extractelement <8 x float> %bin.rdx9, i32 0		; SSSE3-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %r = extractelement <8 x float> %bin.rdx9, i32 0
; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r		; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r
;		;
; SSE42-LABEL: 'pairwise_reduction8float'		; SSE42-LABEL: 'pairwise_reduction8float'
; SSE42-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.0 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.1 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <8 x float> %rdx.shuf.0.0, %rdx.shuf.0.1		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <8 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
; SSE42-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.1.0 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.0 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.1.1 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = fadd <8 x float> %rdx.shuf.1.0, %rdx.shuf.1.1		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = fadd <8 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.2.1 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.2.1 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx9 = fadd <8 x float> %rdx.shuf.2.0, %rdx.shuf.2.1		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx9 = fadd <8 x float> %rdx.shuf.2.0, %rdx.shuf.2.1
; SSE42-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %r = extractelement <8 x float> %bin.rdx9, i32 0		; SSE42-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %r = extractelement <8 x float> %bin.rdx9, i32 0
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r		; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r
;		;
; AVX1-LABEL: 'pairwise_reduction8float'		; AVX1-LABEL: 'pairwise_reduction8float'
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.0 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.0 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.1 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.1 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <8 x float> %rdx.shuf.0.0, %rdx.shuf.0.1		; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <8 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = fadd <8 x float> %rdx.shuf.1.0, %rdx.shuf.1.1		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = fadd <8 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.2.1 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.2.1 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx9 = fadd <8 x float> %rdx.shuf.2.0, %rdx.shuf.2.1		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx9 = fadd <8 x float> %rdx.shuf.2.0, %rdx.shuf.2.1
; AVX2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r = extractelement <8 x float> %bin.rdx9, i32 0		; AVX2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r = extractelement <8 x float> %bin.rdx9, i32 0
; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r		; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r
;		;
; SLM-LABEL: 'pairwise_reduction8float'		; SLM-LABEL: 'pairwise_reduction8float'
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.0 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.1 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <8 x float> %rdx.shuf.0.0, %rdx.shuf.0.1		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = fadd <8 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.1.0 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.0 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.1.1 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <8 x float> %bin.rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = fadd <8 x float> %rdx.shuf.1.0, %rdx.shuf.1.1		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = fadd <8 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.2.1 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.2.1 = shufflevector <8 x float> %bin.rdx8, <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx9 = fadd <8 x float> %rdx.shuf.2.0, %rdx.shuf.2.1		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx9 = fadd <8 x float> %rdx.shuf.2.0, %rdx.shuf.2.1
; SLM-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %r = extractelement <8 x float> %bin.rdx9, i32 0		; SLM-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %r = extractelement <8 x float> %bin.rdx9, i32 0
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r
;		;
%rdx.shuf.0.0 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6,i32 undef, i32 undef, i32 undef, i32 undef>		%rdx.shuf.0.0 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6,i32 undef, i32 undef, i32 undef, i32 undef>
%rdx.shuf.0.1 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7,i32 undef, i32 undef, i32 undef, i32 undef>		%rdx.shuf.0.1 = shufflevector <8 x float> %rdx, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7,i32 undef, i32 undef, i32 undef, i32 undef>
%bin.rdx = fadd <8 x float> %rdx.shuf.0.0, %rdx.shuf.0.1		%bin.rdx = fadd <8 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
%rdx.shuf.1.0 = shufflevector <8 x float> %bin.rdx, <8 x float> undef,<8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		%rdx.shuf.1.0 = shufflevector <8 x float> %bin.rdx, <8 x float> undef,<8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%bin.rdx8 = add <4 x i32> %rdx.shuf.1.0, %rdx.shuf.1.1		%bin.rdx8 = add <4 x i32> %rdx.shuf.1.0, %rdx.shuf.1.1

%r = extractelement <4 x i32> %bin.rdx8, i32 0		%r = extractelement <4 x i32> %bin.rdx8, i32 0
ret i32 %r		ret i32 %r
}		}

define fastcc i64 @pairwise_reduction4i64(<4 x i64> %rdx, i64 %f1) {		define fastcc i64 @pairwise_reduction4i64(<4 x i64> %rdx, i64 %f1) {
; SSE-LABEL: 'pairwise_reduction4i64'		; SSE-LABEL: 'pairwise_reduction4i64'
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.0.0 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.0.1 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <4 x i64> %rdx.shuf.0.0, %rdx.shuf.0.1		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <4 x i64> %rdx.shuf.0.0, %rdx.shuf.0.1
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.1.1 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = add <4 x i64> %rdx.shuf.1.0, %rdx.shuf.1.1		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = add <4 x i64> %rdx.shuf.1.0, %rdx.shuf.1.1
; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <4 x i64> %bin.rdx8, i32 0		; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = extractelement <4 x i64> %bin.rdx8, i32 0
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r		; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r
;		;
; AVX1-LABEL: 'pairwise_reduction4i64'		; AVX1-LABEL: 'pairwise_reduction4i64'
; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = add <4 x i64> %rdx.shuf.0.0, %rdx.shuf.0.1		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = add <4 x i64> %rdx.shuf.0.0, %rdx.shuf.0.1
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx = add <4 x i64> %rdx.shuf.0.0, %rdx.shuf.0.1		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx = add <4 x i64> %rdx.shuf.0.0, %rdx.shuf.0.1
; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = add <4 x i64> %rdx.shuf.1.0, %rdx.shuf.1.1		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = add <4 x i64> %rdx.shuf.1.0, %rdx.shuf.1.1
; AVX2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %r = extractelement <4 x i64> %bin.rdx8, i32 0		; AVX2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %r = extractelement <4 x i64> %bin.rdx8, i32 0
; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r		; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r
;		;
; SLM-LABEL: 'pairwise_reduction4i64'		; SLM-LABEL: 'pairwise_reduction4i64'
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.0.0 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.0.1 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %bin.rdx = add <4 x i64> %rdx.shuf.0.0, %rdx.shuf.0.1		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %bin.rdx = add <4 x i64> %rdx.shuf.0.0, %rdx.shuf.0.1
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.1.0 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.1.1 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %bin.rdx8 = add <4 x i64> %rdx.shuf.1.0, %rdx.shuf.1.1		; SLM-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %bin.rdx8 = add <4 x i64> %rdx.shuf.1.0, %rdx.shuf.1.1
; SLM-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r = extractelement <4 x i64> %bin.rdx8, i32 0		; SLM-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r = extractelement <4 x i64> %bin.rdx8, i32 0
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r
;		;
%rdx.shuf.0.0 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>		%rdx.shuf.0.0 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
%rdx.shuf.0.1 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>		%rdx.shuf.0.1 = shufflevector <4 x i64> %rdx, <4 x i64> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
%bin.rdx = add <4 x i64> %rdx.shuf.0.0, %rdx.shuf.0.1		%bin.rdx = add <4 x i64> %rdx.shuf.0.0, %rdx.shuf.0.1
%rdx.shuf.1.0 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>		%rdx.shuf.1.0 = shufflevector <4 x i64> %bin.rdx, <4 x i64> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
%bin.rdx9 = add <8 x i16> %rdx.shuf.2.0, %rdx.shuf.2.1		%bin.rdx9 = add <8 x i16> %rdx.shuf.2.0, %rdx.shuf.2.1

%r = extractelement <8 x i16> %bin.rdx9, i32 0		%r = extractelement <8 x i16> %bin.rdx9, i32 0
ret i16 %r		ret i16 %r
}		}

define fastcc i32 @pairwise_reduction8i32(<8 x i32> %rdx, i32 %f1) {		define fastcc i32 @pairwise_reduction8i32(<8 x i32> %rdx, i32 %f1) {
; SSE-LABEL: 'pairwise_reduction8i32'		; SSE-LABEL: 'pairwise_reduction8i32'
; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.0 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.1 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <8 x i32> %rdx.shuf.0.0, %rdx.shuf.0.1		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <8 x i32> %rdx.shuf.0.0, %rdx.shuf.0.1
; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.1.0 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.0 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.1.1 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = add <8 x i32> %rdx.shuf.1.0, %rdx.shuf.1.1		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = add <8 x i32> %rdx.shuf.1.0, %rdx.shuf.1.1
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x i32> %bin.rdx8, <8 x i32> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x i32> %bin.rdx8, <8 x i32> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.2.1 = shufflevector <8 x i32> %bin.rdx8, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.2.1 = shufflevector <8 x i32> %bin.rdx8, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx9 = add <8 x i32> %rdx.shuf.2.0, %rdx.shuf.2.1		; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx9 = add <8 x i32> %rdx.shuf.2.0, %rdx.shuf.2.1
; SSE-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %r = extractelement <8 x i32> %bin.rdx9, i32 0		; SSE-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %r = extractelement <8 x i32> %bin.rdx9, i32 0
; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r		; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r
;		;
; AVX1-LABEL: 'pairwise_reduction8i32'		; AVX1-LABEL: 'pairwise_reduction8i32'
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.0 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.0 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.1 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.1 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = add <8 x i32> %rdx.shuf.0.0, %rdx.shuf.0.1		; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %bin.rdx = add <8 x i32> %rdx.shuf.0.0, %rdx.shuf.0.1
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = add <8 x i32> %rdx.shuf.1.0, %rdx.shuf.1.1		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx8 = add <8 x i32> %rdx.shuf.1.0, %rdx.shuf.1.1
; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x i32> %bin.rdx8, <8 x i32> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x i32> %bin.rdx8, <8 x i32> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.2.1 = shufflevector <8 x i32> %bin.rdx8, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.2.1 = shufflevector <8 x i32> %bin.rdx8, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx9 = add <8 x i32> %rdx.shuf.2.0, %rdx.shuf.2.1		; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %bin.rdx9 = add <8 x i32> %rdx.shuf.2.0, %rdx.shuf.2.1
; AVX2-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %r = extractelement <8 x i32> %bin.rdx9, i32 0		; AVX2-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %r = extractelement <8 x i32> %bin.rdx9, i32 0
; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r		; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r
;		;
; SLM-LABEL: 'pairwise_reduction8i32'		; SLM-LABEL: 'pairwise_reduction8i32'
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.0 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.0 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.0.1 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %rdx.shuf.0.1 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <8 x i32> %rdx.shuf.0.0, %rdx.shuf.0.1		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx = add <8 x i32> %rdx.shuf.0.0, %rdx.shuf.0.1
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.1.0 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.0 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.1.1 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.1.1 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = add <8 x i32> %rdx.shuf.1.0, %rdx.shuf.1.1		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx8 = add <8 x i32> %rdx.shuf.1.0, %rdx.shuf.1.1
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x i32> %bin.rdx8, <8 x i32> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %rdx.shuf.2.0 = shufflevector <8 x i32> %bin.rdx8, <8 x i32> undef, <8 x i32> <i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %rdx.shuf.2.1 = shufflevector <8 x i32> %bin.rdx8, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SLM-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %rdx.shuf.2.1 = shufflevector <8 x i32> %bin.rdx8, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx9 = add <8 x i32> %rdx.shuf.2.0, %rdx.shuf.2.1		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %bin.rdx9 = add <8 x i32> %rdx.shuf.2.0, %rdx.shuf.2.1
; SLM-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %r = extractelement <8 x i32> %bin.rdx9, i32 0		; SLM-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %r = extractelement <8 x i32> %bin.rdx9, i32 0
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %r
;		;
%rdx.shuf.0.0 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6,i32 undef, i32 undef, i32 undef, i32 undef>		%rdx.shuf.0.0 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6,i32 undef, i32 undef, i32 undef, i32 undef>
%rdx.shuf.0.1 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7,i32 undef, i32 undef, i32 undef, i32 undef>		%rdx.shuf.0.1 = shufflevector <8 x i32> %rdx, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7,i32 undef, i32 undef, i32 undef, i32 undef>
%bin.rdx = add <8 x i32> %rdx.shuf.0.0, %rdx.shuf.0.1		%bin.rdx = add <8 x i32> %rdx.shuf.0.0, %rdx.shuf.0.1
%rdx.shuf.1.0 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef,<8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		%rdx.shuf.1.0 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef,<8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>

llvm/test/Analysis/CostModel/X86/shuffle-single-src.ll

	;			;
	; Verify the cost model for 1 src shuffles			; Verify the cost model for 1 src shuffles
	;			;

	define void @test_vXf64(<2 x double> %src128, <4 x double> %src256, <8 x double> %src512, <16 x double> %src1024) {			define void @test_vXf64(<2 x double> %src128, <4 x double> %src256, <8 x double> %src512, <16 x double> %src1024) {
	; SSE-LABEL: 'test_vXf64'			; SSE-LABEL: 'test_vXf64'
	; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> <i32 1, i32 1>			; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> <i32 1, i32 1>
	; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; SSE-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V1024 = shufflevector <16 x double> %src1024, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V1024 = shufflevector <16 x double> %src1024, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; XOP-LABEL: 'test_vXf64'			; XOP-LABEL: 'test_vXf64'
	; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> <i32 1, i32 1>			; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> <i32 1, i32 1>
	; XOP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V1024 = shufflevector <16 x double> %src1024, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V1024 = shufflevector <16 x double> %src1024, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; XOP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX1-LABEL: 'test_vXf64'			; AVX1-LABEL: 'test_vXf64'
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> <i32 1, i32 1>			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> <i32 1, i32 1>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V1024 = shufflevector <16 x double> %src1024, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V1024 = shufflevector <16 x double> %src1024, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX2-LABEL: 'test_vXf64'			; AVX2-LABEL: 'test_vXf64'
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> <i32 1, i32 1>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> <i32 1, i32 1>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V1024 = shufflevector <16 x double> %src1024, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V1024 = shufflevector <16 x double> %src1024, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX512-LABEL: 'test_vXf64'			; AVX512-LABEL: 'test_vXf64'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> <i32 1, i32 1>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> <i32 1, i32 1>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1024 = shufflevector <16 x double> %src1024, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1024 = shufflevector <16 x double> %src1024, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	%V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> <i32 1, i32 1>			%V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> <i32 1, i32 1>
	%V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			%V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	%V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			%V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	%V1024 = shufflevector <16 x double> %src1024, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			%V1024 = shufflevector <16 x double> %src1024, <16 x double> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	ret void			ret void
	}			}

	define void @test_vXi64(<2 x i64> %src128, <4 x i64> %src256, <8 x i64> %src512) {			define void @test_vXi64(<2 x i64> %src128, <4 x i64> %src256, <8 x i64> %src512) {
	; SSE-LABEL: 'test_vXi64'			; SSE-LABEL: 'test_vXi64'
	; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x i64> %src128, <2 x i64> undef, <2 x i32> <i32 1, i32 1>			; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x i64> %src128, <2 x i64> undef, <2 x i32> <i32 1, i32 1>
	; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <4 x i64> %src256, <4 x i64> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <4 x i64> %src256, <4 x i64> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; SSE-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V512 = shufflevector <8 x i64> %src512, <8 x i64> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512 = shufflevector <8 x i64> %src512, <8 x i64> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; XOP-LABEL: 'test_vXi64'			; XOP-LABEL: 'test_vXi64'
	; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x i64> %src128, <2 x i64> undef, <2 x i32> <i32 1, i32 1>			; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x i64> %src128, <2 x i64> undef, <2 x i32> <i32 1, i32 1>
	; XOP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <4 x i64> %src256, <4 x i64> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <4 x i64> %src256, <4 x i64> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = shufflevector <8 x i64> %src512, <8 x i64> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512 = shufflevector <8 x i64> %src512, <8 x i64> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; XOP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX1-LABEL: 'test_vXi64'			; AVX1-LABEL: 'test_vXi64'
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x i64> %src128, <2 x i64> undef, <2 x i32> <i32 1, i32 1>			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x i64> %src128, <2 x i64> undef, <2 x i32> <i32 1, i32 1>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <4 x i64> %src256, <4 x i64> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <4 x i64> %src256, <4 x i64> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = shufflevector <8 x i64> %src512, <8 x i64> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512 = shufflevector <8 x i64> %src512, <8 x i64> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX2-LABEL: 'test_vXi64'			; AVX2-LABEL: 'test_vXi64'
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x i64> %src128, <2 x i64> undef, <2 x i32> <i32 1, i32 1>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x i64> %src128, <2 x i64> undef, <2 x i32> <i32 1, i32 1>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <4 x i64> %src256, <4 x i64> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <4 x i64> %src256, <4 x i64> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = shufflevector <8 x i64> %src512, <8 x i64> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V512 = shufflevector <8 x i64> %src512, <8 x i64> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX512-LABEL: 'test_vXi64'			; AVX512-LABEL: 'test_vXi64'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x i64> %src128, <2 x i64> undef, <2 x i32> <i32 1, i32 1>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <2 x i64> %src128, <2 x i64> undef, <2 x i32> <i32 1, i32 1>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <4 x i64> %src256, <4 x i64> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <4 x i64> %src256, <4 x i64> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <8 x i64> %src512, <8 x i64> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <8 x i64> %src512, <8 x i64> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	%V128 = shufflevector <2 x i64> %src128, <2 x i64> undef, <2 x i32> <i32 1, i32 1>			%V128 = shufflevector <2 x i64> %src128, <2 x i64> undef, <2 x i32> <i32 1, i32 1>
	%V256 = shufflevector <4 x i64> %src256, <4 x i64> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			%V256 = shufflevector <4 x i64> %src256, <4 x i64> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	%V512 = shufflevector <8 x i64> %src512, <8 x i64> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			%V512 = shufflevector <8 x i64> %src512, <8 x i64> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	ret void			ret void
	}			}

	define void @test_vXf32(<2 x float> %src64, <4 x float> %src128, <8 x float> %src256, <16 x float> %src512) {			define void @test_vXf32(<2 x float> %src64, <4 x float> %src128, <8 x float> %src256, <16 x float> %src512) {
	; SSE-LABEL: 'test_vXf32'			; SSE-LABEL: 'test_vXf32'
	; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x float> %src64, <2 x float> undef, <2 x i32> <i32 1, i32 1>			; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x float> %src64, <2 x float> undef, <2 x i32> <i32 1, i32 1>
	; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x float> %src128, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x float> %src128, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <8 x float> %src256, <8 x float> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <8 x float> %src256, <8 x float> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V512 = shufflevector <16 x float> %src512, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512 = shufflevector <16 x float> %src512, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; XOP-LABEL: 'test_vXf32'			; XOP-LABEL: 'test_vXf32'
	; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x float> %src64, <2 x float> undef, <2 x i32> <i32 1, i32 1>			; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x float> %src64, <2 x float> undef, <2 x i32> <i32 1, i32 1>
	; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x float> %src128, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x float> %src128, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <8 x float> %src256, <8 x float> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <8 x float> %src256, <8 x float> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512 = shufflevector <16 x float> %src512, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512 = shufflevector <16 x float> %src512, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; XOP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX1-LABEL: 'test_vXf32'			; AVX1-LABEL: 'test_vXf32'
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x float> %src64, <2 x float> undef, <2 x i32> <i32 1, i32 1>			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x float> %src64, <2 x float> undef, <2 x i32> <i32 1, i32 1>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x float> %src128, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x float> %src128, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <8 x float> %src256, <8 x float> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <8 x float> %src256, <8 x float> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512 = shufflevector <16 x float> %src512, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = shufflevector <16 x float> %src512, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX2-LABEL: 'test_vXf32'			; AVX2-LABEL: 'test_vXf32'
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x float> %src64, <2 x float> undef, <2 x i32> <i32 1, i32 1>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x float> %src64, <2 x float> undef, <2 x i32> <i32 1, i32 1>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x float> %src128, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x float> %src128, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <8 x float> %src256, <8 x float> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <8 x float> %src256, <8 x float> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = shufflevector <16 x float> %src512, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V512 = shufflevector <16 x float> %src512, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX512-LABEL: 'test_vXf32'			; AVX512-LABEL: 'test_vXf32'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x float> %src64, <2 x float> undef, <2 x i32> <i32 1, i32 1>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x float> %src64, <2 x float> undef, <2 x i32> <i32 1, i32 1>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x float> %src128, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x float> %src128, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <8 x float> %src256, <8 x float> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <8 x float> %src256, <8 x float> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <16 x float> %src512, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <16 x float> %src512, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	%V64 = shufflevector <2 x float> %src64, <2 x float> undef, <2 x i32> <i32 1, i32 1>			%V64 = shufflevector <2 x float> %src64, <2 x float> undef, <2 x i32> <i32 1, i32 1>
	%V128 = shufflevector <4 x float> %src128, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			%V128 = shufflevector <4 x float> %src128, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	%V256 = shufflevector <8 x float> %src256, <8 x float> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			%V256 = shufflevector <8 x float> %src256, <8 x float> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	%V512 = shufflevector <16 x float> %src512, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			%V512 = shufflevector <16 x float> %src512, <16 x float> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	ret void			ret void
	}			}

	define void @test_vXi32(<2 x i32> %src64, <4 x i32> %src128, <8 x i32> %src256, <16 x i32> %src512, <32 x i32> %src1024) {			define void @test_vXi32(<2 x i32> %src64, <4 x i32> %src128, <8 x i32> %src256, <16 x i32> %src512, <32 x i32> %src1024) {
	; SSE-LABEL: 'test_vXi32'			; SSE-LABEL: 'test_vXi32'
	; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x i32> %src64, <2 x i32> undef, <2 x i32> <i32 1, i32 1>			; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x i32> %src64, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
	; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x i32> %src128, <4 x i32> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x i32> %src128, <4 x i32> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <8 x i32> %src256, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 5, i32 3, i32 2, i32 1, i32 0>			; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <8 x i32> %src256, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 5, i32 3, i32 2, i32 1, i32 0>
	; SSE-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V512 = shufflevector <16 x i32> %src512, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 13, i32 10, i32 9, i32 8, i32 8, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = shufflevector <16 x i32> %src512, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 13, i32 10, i32 9, i32 8, i32 8, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE-NEXT: Cost Model: Found an estimated cost of 112 for instruction: %V1024 = shufflevector <32 x i32> %src1024, <32 x i32> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V1024 = shufflevector <32 x i32> %src1024, <32 x i32> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; XOP-LABEL: 'test_vXi32'			; XOP-LABEL: 'test_vXi32'
	; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x i32> %src64, <2 x i32> undef, <2 x i32> <i32 1, i32 1>			; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x i32> %src64, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
	; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x i32> %src128, <4 x i32> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x i32> %src128, <4 x i32> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <8 x i32> %src256, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 5, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <8 x i32> %src256, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 5, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512 = shufflevector <16 x i32> %src512, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 13, i32 10, i32 9, i32 8, i32 8, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = shufflevector <16 x i32> %src512, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 13, i32 10, i32 9, i32 8, i32 8, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V1024 = shufflevector <32 x i32> %src1024, <32 x i32> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V1024 = shufflevector <32 x i32> %src1024, <32 x i32> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; XOP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX1-LABEL: 'test_vXi32'			; AVX1-LABEL: 'test_vXi32'
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x i32> %src64, <2 x i32> undef, <2 x i32> <i32 1, i32 1>			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x i32> %src64, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x i32> %src128, <4 x i32> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x i32> %src128, <4 x i32> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <8 x i32> %src256, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 5, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <8 x i32> %src256, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 5, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512 = shufflevector <16 x i32> %src512, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 13, i32 10, i32 9, i32 8, i32 8, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512 = shufflevector <16 x i32> %src512, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 13, i32 10, i32 9, i32 8, i32 8, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V1024 = shufflevector <32 x i32> %src1024, <32 x i32> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V1024 = shufflevector <32 x i32> %src1024, <32 x i32> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX2-LABEL: 'test_vXi32'			; AVX2-LABEL: 'test_vXi32'
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x i32> %src64, <2 x i32> undef, <2 x i32> <i32 1, i32 1>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x i32> %src64, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x i32> %src128, <4 x i32> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x i32> %src128, <4 x i32> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <8 x i32> %src256, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 5, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <8 x i32> %src256, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 5, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = shufflevector <16 x i32> %src512, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 13, i32 10, i32 9, i32 8, i32 8, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512 = shufflevector <16 x i32> %src512, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 13, i32 10, i32 9, i32 8, i32 8, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V1024 = shufflevector <32 x i32> %src1024, <32 x i32> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V1024 = shufflevector <32 x i32> %src1024, <32 x i32> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX512-LABEL: 'test_vXi32'			; AVX512-LABEL: 'test_vXi32'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x i32> %src64, <2 x i32> undef, <2 x i32> <i32 1, i32 1>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <2 x i32> %src64, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x i32> %src128, <4 x i32> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <4 x i32> %src128, <4 x i32> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <8 x i32> %src256, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 5, i32 3, i32 2, i32 1, i32 0>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V256 = shufflevector <8 x i32> %src256, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 5, i32 3, i32 2, i32 1, i32 0>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <16 x i32> %src512, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 13, i32 10, i32 9, i32 8, i32 8, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V512 = shufflevector <16 x i32> %src512, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 13, i32 10, i32 9, i32 8, i32 8, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1024 = shufflevector <32 x i32> %src1024, <32 x i32> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V1024 = shufflevector <32 x i32> %src1024, <32 x i32> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	%V64 = shufflevector <2 x i32> %src64, <2 x i32> undef, <2 x i32> <i32 1, i32 1>			%V64 = shufflevector <2 x i32> %src64, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
	%V128 = shufflevector <4 x i32> %src128, <4 x i32> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>			%V128 = shufflevector <4 x i32> %src128, <4 x i32> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 0>
	%V256 = shufflevector <8 x i32> %src256, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 5, i32 3, i32 2, i32 1, i32 0>			%V256 = shufflevector <8 x i32> %src256, <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 5, i32 3, i32 2, i32 1, i32 0>
	%V512 = shufflevector <16 x i32> %src512, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 13, i32 10, i32 9, i32 8, i32 8, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			%V512 = shufflevector <16 x i32> %src512, <16 x i32> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 13, i32 10, i32 9, i32 8, i32 8, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	%V1024 = shufflevector <32 x i32> %src1024, <32 x i32> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			%V1024 = shufflevector <32 x i32> %src1024, <32 x i32> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	ret void			ret void
	}			}

	define void @test_vXi16(<2 x i16> %src32, <4 x i16> %src64, <8 x i16> %src128, <16 x i16> %src256, <32 x i16> %src512, <64 x i16> %src1024) {			define void @test_vXi16(<2 x i16> %src32, <4 x i16> %src64, <8 x i16> %src128, <16 x i16> %src256, <32 x i16> %src512, <64 x i16> %src1024) {
	; SSE2-LABEL: 'test_vXi16'			; SSE2-LABEL: 'test_vXi16'
	; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>			; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>			; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE2-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 448 for instruction: %V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE2-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; SSSE3-LABEL: 'test_vXi16'			; SSSE3-LABEL: 'test_vXi16'
	; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>			; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>
	; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>			; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>
	; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSSE3-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSSE3-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSSE3-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSSE3-NEXT: Cost Model: Found an estimated cost of 168 for instruction: %V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSSE3-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; SSE42-LABEL: 'test_vXi16'			; SSE42-LABEL: 'test_vXi16'
	; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>			; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>			; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE42-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 168 for instruction: %V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE42-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; XOP-LABEL: 'test_vXi16'			; XOP-LABEL: 'test_vXi16'
	; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>			; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>
	; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>			; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>
	; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 108 for instruction: %V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; XOP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX1-LABEL: 'test_vXi16'			; AVX1-LABEL: 'test_vXi16'
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 180 for instruction: %V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX2-LABEL: 'test_vXi16'			; AVX2-LABEL: 'test_vXi16'
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 84 for instruction: %V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX512F-LABEL: 'test_vXi16'			; AVX512F-LABEL: 'test_vXi16'
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 84 for instruction: %V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %V1024 = shufflevector <64 x i16> %src1024, <64 x i16> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX512F-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX512BW-LABEL: 'test_vXi16'			; AVX512BW-LABEL: 'test_vXi16'
	; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>			; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <2 x i16> %src32, <2 x i16> undef, <2 x i32> <i32 1, i32 1>
	; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>			; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <4 x i16> %src64, <4 x i16> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>
	; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512BW-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 6, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX512BW-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512BW-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 13, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX512BW-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512BW-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> <i32 31, i32 30, i32 20, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 11, i32 9, i32 8, i32 7, i32 11, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	}			}

	define void @test_vXi8(<2 x i8> %src16, <4 x i8> %src32, <8 x i8> %src64, <16 x i8> %src128, <32 x i8> %src256, <64 x i8> %src512) {			define void @test_vXi8(<2 x i8> %src16, <4 x i8> %src32, <8 x i8> %src64, <16 x i8> %src128, <32 x i8> %src256, <64 x i8> %src512) {
	; SSE2-LABEL: 'test_vXi8'			; SSE2-LABEL: 'test_vXi8'
	; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16 = shufflevector <2 x i8> %src16, <2 x i8> undef, <2 x i32> <i32 1, i32 1>			; SSE2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16 = shufflevector <2 x i8> %src16, <2 x i8> undef, <2 x i32> <i32 1, i32 1>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V32 = shufflevector <4 x i8> %src32, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 1>			; SSE2-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %V32 = shufflevector <4 x i8> %src32, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 1>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V64 = shufflevector <8 x i8> %src64, <8 x i8> undef, <8 x i32> <i32 7, i32 7, i32 5, i32 5, i32 3, i32 3, i32 1, i32 1>			; SSE2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V64 = shufflevector <8 x i8> %src64, <8 x i8> undef, <8 x i32> <i32 7, i32 7, i32 5, i32 5, i32 3, i32 3, i32 1, i32 1>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE2-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE2-NEXT: Cost Model: Found an estimated cost of 19 for instruction: %V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 156 for instruction: %V512 = shufflevector <64 x i8> %src512, <64 x i8> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE2-NEXT: Cost Model: Found an estimated cost of 37 for instruction: %V512 = shufflevector <64 x i8> %src512, <64 x i8> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; SSSE3-LABEL: 'test_vXi8'			; SSSE3-LABEL: 'test_vXi8'
	; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16 = shufflevector <2 x i8> %src16, <2 x i8> undef, <2 x i32> <i32 1, i32 1>			; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16 = shufflevector <2 x i8> %src16, <2 x i8> undef, <2 x i32> <i32 1, i32 1>
	; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <4 x i8> %src32, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 1>			; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <4 x i8> %src32, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 1>
	; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <8 x i8> %src64, <8 x i8> undef, <8 x i32> <i32 7, i32 7, i32 5, i32 5, i32 3, i32 3, i32 1, i32 1>			; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <8 x i8> %src64, <8 x i8> undef, <8 x i32> <i32 7, i32 7, i32 5, i32 5, i32 3, i32 3, i32 1, i32 1>
	; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSSE3-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSSE3-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSSE3-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSSE3-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V512 = shufflevector <64 x i8> %src512, <64 x i8> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSSE3-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512 = shufflevector <64 x i8> %src512, <64 x i8> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; SSSE3-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; SSE42-LABEL: 'test_vXi8'			; SSE42-LABEL: 'test_vXi8'
	; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16 = shufflevector <2 x i8> %src16, <2 x i8> undef, <2 x i32> <i32 1, i32 1>			; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16 = shufflevector <2 x i8> %src16, <2 x i8> undef, <2 x i32> <i32 1, i32 1>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <4 x i8> %src32, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 1>			; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <4 x i8> %src32, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 1>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <8 x i8> %src64, <8 x i8> undef, <8 x i32> <i32 7, i32 7, i32 5, i32 5, i32 3, i32 3, i32 1, i32 1>			; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <8 x i8> %src64, <8 x i8> undef, <8 x i32> <i32 7, i32 7, i32 5, i32 5, i32 3, i32 3, i32 1, i32 1>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE42-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 36 for instruction: %V512 = shufflevector <64 x i8> %src512, <64 x i8> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; SSE42-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V512 = shufflevector <64 x i8> %src512, <64 x i8> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; XOP-LABEL: 'test_vXi8'			; XOP-LABEL: 'test_vXi8'
	; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16 = shufflevector <2 x i8> %src16, <2 x i8> undef, <2 x i32> <i32 1, i32 1>			; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16 = shufflevector <2 x i8> %src16, <2 x i8> undef, <2 x i32> <i32 1, i32 1>
	; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <4 x i8> %src32, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 1>			; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <4 x i8> %src32, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 1>
	; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <8 x i8> %src64, <8 x i8> undef, <8 x i32> <i32 7, i32 7, i32 5, i32 5, i32 3, i32 3, i32 1, i32 1>			; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <8 x i8> %src64, <8 x i8> undef, <8 x i32> <i32 7, i32 7, i32 5, i32 5, i32 3, i32 3, i32 1, i32 1>
	; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V512 = shufflevector <64 x i8> %src512, <64 x i8> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; XOP-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V512 = shufflevector <64 x i8> %src512, <64 x i8> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; XOP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; XOP-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX1-LABEL: 'test_vXi8'			; AVX1-LABEL: 'test_vXi8'
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16 = shufflevector <2 x i8> %src16, <2 x i8> undef, <2 x i32> <i32 1, i32 1>			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16 = shufflevector <2 x i8> %src16, <2 x i8> undef, <2 x i32> <i32 1, i32 1>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <4 x i8> %src32, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 1>			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <4 x i8> %src32, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 1>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <8 x i8> %src64, <8 x i8> undef, <8 x i32> <i32 7, i32 7, i32 5, i32 5, i32 3, i32 3, i32 1, i32 1>			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <8 x i8> %src64, <8 x i8> undef, <8 x i32> <i32 7, i32 7, i32 5, i32 5, i32 3, i32 3, i32 1, i32 1>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V512 = shufflevector <64 x i8> %src512, <64 x i8> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX1-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V512 = shufflevector <64 x i8> %src512, <64 x i8> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX2-LABEL: 'test_vXi8'			; AVX2-LABEL: 'test_vXi8'
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16 = shufflevector <2 x i8> %src16, <2 x i8> undef, <2 x i32> <i32 1, i32 1>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16 = shufflevector <2 x i8> %src16, <2 x i8> undef, <2 x i32> <i32 1, i32 1>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <4 x i8> %src32, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 1>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <4 x i8> %src32, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 1>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <8 x i8> %src64, <8 x i8> undef, <8 x i32> <i32 7, i32 7, i32 5, i32 5, i32 3, i32 3, i32 1, i32 1>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <8 x i8> %src64, <8 x i8> undef, <8 x i32> <i32 7, i32 7, i32 5, i32 5, i32 3, i32 3, i32 1, i32 1>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V512 = shufflevector <64 x i8> %src512, <64 x i8> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V512 = shufflevector <64 x i8> %src512, <64 x i8> undef, <64 x i32> <i32 63, i32 62, i32 61, i32 60, i32 59, i32 58, i32 57, i32 56, i32 55, i32 54, i32 53, i32 52, i32 51, i32 50, i32 49, i32 48, i32 47, i32 46, i32 45, i32 44, i32 43, i32 42, i32 41, i32 40, i32 39, i32 38, i32 37, i32 36, i32 35, i32 34, i32 33, i32 32, i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 20, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; AVX2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; AVX512F-LABEL: 'test_vXi8'			; AVX512F-LABEL: 'test_vXi8'
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16 = shufflevector <2 x i8> %src16, <2 x i8> undef, <2 x i32> <i32 1, i32 1>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V16 = shufflevector <2 x i8> %src16, <2 x i8> undef, <2 x i32> <i32 1, i32 1>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <4 x i8> %src32, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 1>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V32 = shufflevector <4 x i8> %src32, <4 x i8> undef, <4 x i32> <i32 3, i32 3, i32 1, i32 1>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <8 x i8> %src64, <8 x i8> undef, <8 x i32> <i32 7, i32 7, i32 5, i32 5, i32 3, i32 3, i32 1, i32 1>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V64 = shufflevector <8 x i8> %src64, <8 x i8> undef, <8 x i32> <i32 7, i32 7, i32 5, i32 5, i32 3, i32 3, i32 1, i32 1>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 11, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; AVX512F-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>			; AVX512F-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> <i32 31, i32 30, i32 29, i32 28, i32 27, i32 26, i32 25, i32 24, i32 23, i32 22, i32 21, i32 20, i32 19, i32 18, i32 17, i32 16, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 8, i32 8, i32 7, i32 6, i32 8, i32 4, i32 3, i32 2, i32 1, i32 0>

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=aarch64-apple-ios -mcpu=cyclone -o - %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=aarch64-apple-ios -mcpu=cyclone -o - %s | FileCheck %s

	define void @f1(<2 x i16> %x, i16* %a) {			define void @f1(<2 x i16> %x, i16* %a) {
	; CHECK-LABEL: @f1(			; CHECK-LABEL: @f1(
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[X:%.*]], <2 x i16> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 0>			; CHECK-NEXT: [[T2:%.*]] = extractelement <2 x i16> [[X:%.*]], i32 0
				; CHECK-NEXT: [[T3:%.*]] = extractelement <2 x i16> [[X]], i32 1
	; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0			; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0
	; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1			; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1
	; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2			; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2
	; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3			; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i16> [[X]], i32 0			; CHECK-NEXT: store i16 [[T2]], i16* [[A:%.*]], align 2
	; CHECK-NEXT: store i16 [[TMP1]], i16* [[A:%.*]], align 2			; CHECK-NEXT: store i16 [[T2]], i16* [[PTR0]], align 2
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i16* [[PTR0]] to <4 x i16>*			; CHECK-NEXT: store i16 [[T3]], i16* [[PTR1]], align 2
	; CHECK-NEXT: store <4 x i16> [[SHUFFLE]], <4 x i16>* [[TMP2]], align 2			; CHECK-NEXT: store i16 [[T3]], i16* [[PTR2]], align 2
				; CHECK-NEXT: store i16 [[T2]], i16* [[PTR3]], align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
				ABataev (author): These regressions are caused by the incomplete AArch64 cost model, which has no cost for the PermuteSingleSrc shuffle kind for VFxi16.
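As a rough illustration of which masks land in the PermuteSingleSrc bucket, here is a minimal C++ sketch. `ShuffleKind` and `classifyMask` are hypothetical names, not LLVM's API; the sketch only assumes standard `shufflevector` mask semantics (indices past the first operand select from the second, -1 is an undef lane) and treats Broadcast as a splat of lane 0. Under it, the `<i32 0, i32 1, i32 1, i32 0>` mask from `@f1` reads only `%x` but is neither an identity, a reverse, nor a lane-0 broadcast, so it is classified as a single-source permute.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classifier for illustration only; the kind names mirror the
// cost-model categories discussed in this review, not LLVM's implementation.
enum class ShuffleKind {
  Identity,         // lane i reads source lane i
  Broadcast,        // every defined lane reads source lane 0
  Reverse,          // lane i reads source lane N-1-i
  PermuteSingleSrc, // any other pattern reading only one operand
  PermuteTwoSrc     // reads from both operands
};

// Mask indices in [0, NumSrcElts) select from the first operand, indices in
// [NumSrcElts, 2 * NumSrcElts) from the second, and -1 marks an undef lane.
ShuffleKind classifyMask(const std::vector<int> &Mask, int NumSrcElts) {
  bool UsesFirst = false, UsesSecond = false;
  for (int Idx : Mask) {
    if (Idx < 0)
      continue;
    if (Idx < NumSrcElts)
      UsesFirst = true;
    else
      UsesSecond = true;
  }
  if (UsesFirst && UsesSecond)
    return ShuffleKind::PermuteTwoSrc;

  // Only one operand is read; fold every index into [0, NumSrcElts).
  // Identity and reverse only make sense when the mask is not resizing.
  const bool SameSize = Mask.size() == static_cast<std::size_t>(NumSrcElts);
  bool IsIdentity = SameSize, IsReverse = SameSize, IsBroadcast = true;
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (Mask[I] < 0)
      continue;
    const int Lane = static_cast<int>(I);
    const int Src = Mask[I] % NumSrcElts;
    if (Src != Lane)
      IsIdentity = false;
    if (Src != NumSrcElts - 1 - Lane)
      IsReverse = false;
    if (Src != 0)
      IsBroadcast = false;
  }
  if (IsIdentity)
    return ShuffleKind::Identity;
  if (IsBroadcast)
    return ShuffleKind::Broadcast;
  if (IsReverse)
    return ShuffleKind::Reverse;
  return ShuffleKind::PermuteSingleSrc;
}
```

In this sketch, the near-reverse masks used by the X86 `shuffle-single-src.ll` checks (for example the 16 x i8 mask with `i32 11` in lane 6) also fall through to PermuteSingleSrc, because one out-of-place index is enough to break the reverse pattern.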
	;			;
	%t2 = extractelement <2 x i16> %x, i32 0			%t2 = extractelement <2 x i16> %x, i32 0
	%t3 = extractelement <2 x i16> %x, i32 1			%t3 = extractelement <2 x i16> %x, i32 1
	%ptr0 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0			%ptr0 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0
	%ptr1 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1			%ptr1 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1
	%ptr2 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2			%ptr2 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2
	%ptr3 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3			%ptr3 = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3
	store i16 %t2, i16* %a			store i16 %t2, i16* %a
	store i16 %t2, i16* %ptr0			store i16 %t2, i16* %ptr0
	store i16 %t3, i16* %ptr1			store i16 %t3, i16* %ptr1
	store i16 %t3, i16* %ptr2			store i16 %t3, i16* %ptr2
	store i16 %t2, i16* %ptr3			store i16 %t2, i16* %ptr3
	ret void			ret void
	}			}

	define void @f2(<2 x i16> %x, i16* %a) {			define void @f2(<2 x i16> %x, i16* %a) {
	; CHECK-LABEL: @f2(			; CHECK-LABEL: @f2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[CONT:%.*]]			; CHECK-NEXT: br label [[CONT:%.*]]
	; CHECK: cont:			; CHECK: cont:
	; CHECK-NEXT: [[XX:%.*]] = phi <2 x i16> [ [[X:%.*]], [[ENTRY:%.*]] ], [ undef, [[CONT]] ]			; CHECK-NEXT: [[XX:%.*]] = phi <2 x i16> [ [[X:%.*]], [[ENTRY:%.*]] ], [ undef, [[CONT]] ]
	; CHECK-NEXT: [[AA:%.*]] = phi i16* [ [[A:%.*]], [[ENTRY]] ], [ undef, [[CONT]] ]			; CHECK-NEXT: [[AA:%.*]] = phi i16* [ [[A:%.*]], [[ENTRY]] ], [ undef, [[CONT]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[XX]], <2 x i16> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 0>			; CHECK-NEXT: [[T2:%.*]] = extractelement <2 x i16> [[XX]], i32 0
				; CHECK-NEXT: [[T3:%.*]] = extractelement <2 x i16> [[XX]], i32 1
	; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0			; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0
	; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1			; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1
	; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2			; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2
	; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3			; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x i16> [[XX]], i32 0			; CHECK-NEXT: store i16 [[T2]], i16* [[A]], align 2
	; CHECK-NEXT: store i16 [[TMP0]], i16* [[A]], align 2			; CHECK-NEXT: store i16 [[T2]], i16* [[PTR0]], align 2
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[PTR0]] to <4 x i16>*			; CHECK-NEXT: store i16 [[T3]], i16* [[PTR1]], align 2
	; CHECK-NEXT: store <4 x i16> [[SHUFFLE]], <4 x i16>* [[TMP1]], align 2			; CHECK-NEXT: store i16 [[T3]], i16* [[PTR2]], align 2
				; CHECK-NEXT: store i16 [[T2]], i16* [[PTR3]], align 2
	; CHECK-NEXT: [[A_VAL:%.*]] = load i16, i16* [[A]], align 2			; CHECK-NEXT: [[A_VAL:%.*]] = load i16, i16* [[A]], align 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A_VAL]], 0			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A_VAL]], 0
	; CHECK-NEXT: br i1 [[CMP]], label [[CONT]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[CONT]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %cont			br label %cont
	…

	define void @f3(<2 x i16> %x, i16* %a) {			define void @f3(<2 x i16> %x, i16* %a) {
	; CHECK-LABEL: @f3(			; CHECK-LABEL: @f3(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[CONT:%.*]]			; CHECK-NEXT: br label [[CONT:%.*]]
	; CHECK: cont:			; CHECK: cont:
	; CHECK-NEXT: [[XX:%.*]] = phi <2 x i16> [ [[X:%.*]], [[ENTRY:%.*]] ], [ undef, [[CONT]] ]			; CHECK-NEXT: [[XX:%.*]] = phi <2 x i16> [ [[X:%.*]], [[ENTRY:%.*]] ], [ undef, [[CONT]] ]
	; CHECK-NEXT: [[AA:%.*]] = phi i16* [ [[A:%.*]], [[ENTRY]] ], [ undef, [[CONT]] ]			; CHECK-NEXT: [[AA:%.*]] = phi i16* [ [[A:%.*]], [[ENTRY]] ], [ undef, [[CONT]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i16> [[XX]], <2 x i16> poison, <4 x i32> <i32 1, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[T2:%.*]] = extractelement <2 x i16> [[XX]], i32 0
				; CHECK-NEXT: [[T3:%.*]] = extractelement <2 x i16> [[XX]], i32 1
	; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0			; CHECK-NEXT: [[PTR0:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 0
	; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1			; CHECK-NEXT: [[PTR1:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 1
	; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2			; CHECK-NEXT: [[PTR2:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 2
	; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3			; CHECK-NEXT: [[PTR3:%.*]] = getelementptr inbounds [4 x i16], [4 x i16]* undef, i16 0, i16 3
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <2 x i16> [[XX]], i32 1			; CHECK-NEXT: store i16 [[T3]], i16* [[A]], align 2
	; CHECK-NEXT: store i16 [[TMP0]], i16* [[A]], align 2			; CHECK-NEXT: store i16 [[T3]], i16* [[PTR0]], align 2
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[PTR0]] to <4 x i16>*			; CHECK-NEXT: store i16 [[T2]], i16* [[PTR1]], align 2
	; CHECK-NEXT: store <4 x i16> [[SHUFFLE]], <4 x i16>* [[TMP1]], align 2			; CHECK-NEXT: store i16 [[T2]], i16* [[PTR2]], align 2
				; CHECK-NEXT: store i16 [[T3]], i16* [[PTR3]], align 2
	; CHECK-NEXT: [[A_VAL:%.*]] = load i16, i16* [[A]], align 2			; CHECK-NEXT: [[A_VAL:%.*]] = load i16, i16* [[A]], align 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A_VAL]], 0			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A_VAL]], 0
	; CHECK-NEXT: br i1 [[CMP]], label [[CONT]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[CONT]], label [[EXIT:%.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %cont			br label %cont
	…

llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -slp-threshold=-6 -S -pass-remarks-output=%t < %s | FileCheck %s			; RUN: opt -slp-vectorizer -slp-threshold=-5 -S -pass-remarks-output=%t < %s | FileCheck %s
	; RUN: cat %t | FileCheck -check-prefix=YAML %s			; RUN: cat %t | FileCheck -check-prefix=YAML %s


	; FIXME: The threshold is changed to keep this test case a bit smaller.			; FIXME: The threshold is changed to keep this test case a bit smaller.
	; The AArch64 cost model should not give such high costs to select statements.			; The AArch64 cost model should not give such high costs to select statements.

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux"			target triple = "aarch64--linux"

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-7 | FileCheck %s --check-prefix=CHECK			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-4 | FileCheck %s --check-prefix=CHECK
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefix=FORCE_REDUCTION			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefix=FORCE_REDUCTION

	define void @Test(i32) {			define void @Test(i32) {
	; CHECK-LABEL: @Test(			; CHECK-LABEL: @Test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP10:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP10:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]

llvm/test/Transforms/SLPVectorizer/X86/bad_types.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mcpu=corei7-avx | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define void @test1(x86_mmx %a, x86_mmx %b, i64* %ptr) {			define void @test1(x86_mmx %a, x86_mmx %b, i64* %ptr) {
	; Ensure we can handle x86_mmx values which are primitive and can be bitcast			; Ensure we can handle x86_mmx values which are primitive and can be bitcast
	; with integer types but can't be put into a vector.			; with integer types but can't be put into a vector.
	;			;
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[A_CAST:%.*]] = bitcast x86_mmx [[A:%.*]] to i64			; CHECK-NEXT: [[A_CAST:%.*]] = bitcast x86_mmx [[A:%.*]] to i64
	; CHECK-NEXT: [[B_CAST:%.*]] = bitcast x86_mmx [[B:%.*]] to i64			; CHECK-NEXT: [[B_CAST:%.*]] = bitcast x86_mmx [[B:%.*]] to i64
	; CHECK-NEXT: [[A_AND:%.*]] = and i64 [[A_CAST]], 42			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> poison, i64 [[A_CAST]], i32 0
	; CHECK-NEXT: [[B_AND:%.*]] = and i64 [[B_CAST]], 42			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B_CAST]], i32 1
				; CHECK-NEXT: [[TMP2:%.*]] = and <2 x i64> [[TMP1]], <i64 42, i64 42>
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr i64, i64* [[PTR:%.*]], i32 1			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i64, i64* [[PTR:%.*]], i32 1
	; CHECK-NEXT: store i64 [[A_AND]], i64* [[PTR]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[PTR]] to <2 x i64>*
	; CHECK-NEXT: store i64 [[B_AND]], i64* [[GEP]], align 8			; CHECK-NEXT: store <2 x i64> [[TMP2]], <2 x i64>* [[TMP3]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%a.cast = bitcast x86_mmx %a to i64			%a.cast = bitcast x86_mmx %a to i64
	%b.cast = bitcast x86_mmx %b to i64			%b.cast = bitcast x86_mmx %b to i64
	%a.and = and i64 %a.cast, 42			%a.and = and i64 %a.cast, 42
	%b.and = and i64 %b.cast, 42			%b.and = and i64 %b.cast, 42
	%gep = getelementptr i64, i64* %ptr, i32 1			%gep = getelementptr i64, i64* %ptr, i32 1

llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll

	…

	define <4 x float> @PR16739_byval(<4 x float>* nocapture readonly dereferenceable(16) %x) {			define <4 x float> @PR16739_byval(<4 x float>* nocapture readonly dereferenceable(16) %x) {
	; CHECK-LABEL: @PR16739_byval(			; CHECK-LABEL: @PR16739_byval(
	; CHECK-NEXT: [[T0:%.*]] = bitcast <4 x float>* [[X:%.*]] to i64*			; CHECK-NEXT: [[T0:%.*]] = bitcast <4 x float>* [[X:%.*]] to i64*
	; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[T0]], align 16			; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[T0]], align 16
	; CHECK-NEXT: [[T2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2			; CHECK-NEXT: [[T2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2
	; CHECK-NEXT: [[T3:%.*]] = bitcast float* [[T2]] to i64*			; CHECK-NEXT: [[T3:%.*]] = bitcast float* [[T2]] to i64*
	; CHECK-NEXT: [[T4:%.*]] = load i64, i64* [[T3]], align 8			; CHECK-NEXT: [[T4:%.*]] = load i64, i64* [[T3]], align 8
	; CHECK-NEXT: [[T5:%.*]] = trunc i64 [[T1]] to i32
	; CHECK-NEXT: [[T6:%.*]] = bitcast i32 [[T5]] to float
	; CHECK-NEXT: [[T7:%.*]] = insertelement <4 x float> poison, float [[T6]], i32 0
	; CHECK-NEXT: [[T8:%.*]] = lshr i64 [[T1]], 32			; CHECK-NEXT: [[T8:%.*]] = lshr i64 [[T1]], 32
	; CHECK-NEXT: [[T9:%.*]] = trunc i64 [[T8]] to i32			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> poison, i64 [[T1]], i32 0
	; CHECK-NEXT: [[T10:%.*]] = bitcast i32 [[T9]] to float			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> [[TMP1]], i64 [[T8]], i32 1
	; CHECK-NEXT: [[T11:%.*]] = insertelement <4 x float> [[T7]], float [[T10]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = trunc <2 x i64> [[TMP2]] to <2 x i32>
				; CHECK-NEXT: [[TMP4:%.*]] = bitcast <2 x i32> [[TMP3]] to <2 x float>
				; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; CHECK-NEXT: [[T111:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP5]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
	; CHECK-NEXT: [[T12:%.*]] = trunc i64 [[T4]] to i32			; CHECK-NEXT: [[T12:%.*]] = trunc i64 [[T4]] to i32
	; CHECK-NEXT: [[T13:%.*]] = bitcast i32 [[T12]] to float			; CHECK-NEXT: [[T13:%.*]] = bitcast i32 [[T12]] to float
	; CHECK-NEXT: [[T14:%.*]] = insertelement <4 x float> [[T11]], float [[T13]], i32 2			; CHECK-NEXT: [[T14:%.*]] = insertelement <4 x float> [[T111]], float [[T13]], i32 2
	; CHECK-NEXT: [[T15:%.*]] = insertelement <4 x float> [[T14]], float [[T13]], i32 3			; CHECK-NEXT: [[T15:%.*]] = insertelement <4 x float> [[T14]], float [[T13]], i32 3
	; CHECK-NEXT: ret <4 x float> [[T15]]			; CHECK-NEXT: ret <4 x float> [[T15]]
	;			;
	%t0 = bitcast <4 x float>* %x to i64*			%t0 = bitcast <4 x float>* %x to i64*
	%t1 = load i64, i64* %t0, align 16			%t1 = load i64, i64* %t0, align 16
	%t2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2			%t2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2
	%t3 = bitcast float* %t2 to i64*			%t3 = bitcast float* %t2 to i64*
	%t4 = load i64, i64* %t3, align 8			%t4 = load i64, i64* %t3, align 8

llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll

	…

	define <4 x float> @PR16739_byval(<4 x float>* nocapture readonly dereferenceable(16) %x) {			define <4 x float> @PR16739_byval(<4 x float>* nocapture readonly dereferenceable(16) %x) {
	; CHECK-LABEL: @PR16739_byval(			; CHECK-LABEL: @PR16739_byval(
	; CHECK-NEXT: [[T0:%.*]] = bitcast <4 x float>* [[X:%.*]] to i64*			; CHECK-NEXT: [[T0:%.*]] = bitcast <4 x float>* [[X:%.*]] to i64*
	; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[T0]], align 16			; CHECK-NEXT: [[T1:%.*]] = load i64, i64* [[T0]], align 16
	; CHECK-NEXT: [[T2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2			; CHECK-NEXT: [[T2:%.*]] = getelementptr inbounds <4 x float>, <4 x float>* [[X]], i64 0, i64 2
	; CHECK-NEXT: [[T3:%.*]] = bitcast float* [[T2]] to i64*			; CHECK-NEXT: [[T3:%.*]] = bitcast float* [[T2]] to i64*
	; CHECK-NEXT: [[T4:%.*]] = load i64, i64* [[T3]], align 8			; CHECK-NEXT: [[T4:%.*]] = load i64, i64* [[T3]], align 8
	; CHECK-NEXT: [[T5:%.*]] = trunc i64 [[T1]] to i32
	; CHECK-NEXT: [[T6:%.*]] = bitcast i32 [[T5]] to float
	; CHECK-NEXT: [[T7:%.*]] = insertelement <4 x float> undef, float [[T6]], i32 0
	; CHECK-NEXT: [[T8:%.*]] = lshr i64 [[T1]], 32			; CHECK-NEXT: [[T8:%.*]] = lshr i64 [[T1]], 32
	; CHECK-NEXT: [[T9:%.*]] = trunc i64 [[T8]] to i32			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> poison, i64 [[T1]], i32 0
	; CHECK-NEXT: [[T10:%.*]] = bitcast i32 [[T9]] to float			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> [[TMP1]], i64 [[T8]], i32 1
	; CHECK-NEXT: [[T11:%.*]] = insertelement <4 x float> [[T7]], float [[T10]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = trunc <2 x i64> [[TMP2]] to <2 x i32>
				; CHECK-NEXT: [[TMP4:%.*]] = bitcast <2 x i32> [[TMP3]] to <2 x float>
				; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; CHECK-NEXT: [[T111:%.*]] = shufflevector <4 x float> undef, <4 x float> [[TMP5]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
	; CHECK-NEXT: [[T12:%.*]] = trunc i64 [[T4]] to i32			; CHECK-NEXT: [[T12:%.*]] = trunc i64 [[T4]] to i32
	; CHECK-NEXT: [[T13:%.*]] = bitcast i32 [[T12]] to float			; CHECK-NEXT: [[T13:%.*]] = bitcast i32 [[T12]] to float
	; CHECK-NEXT: [[T14:%.*]] = insertelement <4 x float> [[T11]], float [[T13]], i32 2			; CHECK-NEXT: [[T14:%.*]] = insertelement <4 x float> [[T111]], float [[T13]], i32 2
	; CHECK-NEXT: [[T15:%.*]] = insertelement <4 x float> [[T14]], float [[T13]], i32 3			; CHECK-NEXT: [[T15:%.*]] = insertelement <4 x float> [[T14]], float [[T13]], i32 3
	; CHECK-NEXT: ret <4 x float> [[T15]]			; CHECK-NEXT: ret <4 x float> [[T15]]
	;			;
	%t0 = bitcast <4 x float>* %x to i64*			%t0 = bitcast <4 x float>* %x to i64*
	%t1 = load i64, i64* %t0, align 16			%t1 = load i64, i64* %t0, align 16
	%t2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2			%t2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2
	%t3 = bitcast float* %t2 to i64*			%t3 = bitcast float* %t2 to i64*
	%t4 = load i64, i64* %t3, align 8			%t4 = load i64, i64* %t3, align 8

llvm/test/Transforms/SLPVectorizer/X86/minimum-sizes.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+sse2 -S | FileCheck %s --check-prefixes=CHECK,SSE			; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+sse2 -S | FileCheck %s
	; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+avx -S | FileCheck %s --check-prefixes=CHECK,AVX			; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+avx -S | FileCheck %s
	; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+avx2 -S | FileCheck %s --check-prefixes=CHECK,AVX			; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+avx2 -S | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; These tests ensure that we do not regress due to PR31243. Note that we set			; These tests ensure that we do not regress due to PR31243. Note that we set
	; the SLP threshold to force vectorization even when not profitable.			; the SLP threshold to force vectorization even when not profitable.

	; When computing minimum sizes, if we can prove the sign bit is zero, we can			; When computing minimum sizes, if we can prove the sign bit is zero, we can
	…
	; if we can't prove that the upper bit of the original type is equal to			; if we can't prove that the upper bit of the original type is equal to
	; the upper bit of the proposed smaller type. If these two bits are the			; the upper bit of the proposed smaller type. If these two bits are the
	; same (either zero or one) we know that sign-extending from the smaller			; same (either zero or one) we know that sign-extending from the smaller
	; type will result in the same value. Since we don't yet perform this			; type will result in the same value. Since we don't yet perform this
	; optimization, we make the proposed smaller type (i8) larger (i16) to			; optimization, we make the proposed smaller type (i8) larger (i16) to
	; ensure correctness.			; ensure correctness.
	;			;
	define i8 @PR31243_sext(i8 %v0, i8 %v1, i8 %v2, i8 %v3, i8* %ptr) {			define i8 @PR31243_sext(i8 %v0, i8 %v1, i8 %v2, i8 %v3, i8* %ptr) {
	; SSE-LABEL: @PR31243_sext(			; CHECK-LABEL: @PR31243_sext(
	; SSE-NEXT: entry:			; CHECK-NEXT: entry:
	; SSE-NEXT: [[TMP0:%.*]] = or i8 [[V0:%.*]], 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i8> poison, i8 [[V0:%.*]], i32 0
	; SSE-NEXT: [[TMP1:%.*]] = or i8 [[V1:%.*]], 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> [[TMP0]], i8 [[V1:%.*]], i32 1
	; SSE-NEXT: [[TMP2:%.*]] = sext i8 [[TMP0]] to i64			; CHECK-NEXT: [[TMP2:%.*]] = or <2 x i8> [[TMP1]], <i8 1, i8 1>
	; SSE-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP2]]			; CHECK-NEXT: [[TMP3:%.*]] = sext <2 x i8> [[TMP2]] to <2 x i16>
	; SSE-NEXT: [[TMP3:%.*]] = sext i8 [[TMP1]] to i64			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i16> [[TMP3]], i32 0
	; SSE-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP3]]			; CHECK-NEXT: [[TMP5:%.*]] = sext i16 [[TMP4]] to i64
	; SSE-NEXT: [[TMP6:%.*]] = load i8, i8* [[TMP4]], align 1			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP5]]
	; SSE-NEXT: [[TMP7:%.*]] = load i8, i8* [[TMP5]], align 1			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i16> [[TMP3]], i32 1
	; SSE-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]			; CHECK-NEXT: [[TMP7:%.*]] = sext i16 [[TMP6]] to i64
	; SSE-NEXT: ret i8 [[TMP8]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP7]]
	;			; CHECK-NEXT: [[TMP6:%.*]] = load i8, i8* [[TMP4]], align 1
	; AVX-LABEL: @PR31243_sext(			; CHECK-NEXT: [[TMP7:%.*]] = load i8, i8* [[TMP5]], align 1
	; AVX-NEXT: entry:			; CHECK-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
	; AVX-NEXT: [[TMP0:%.*]] = insertelement <2 x i8> poison, i8 [[V0:%.*]], i32 0			; CHECK-NEXT: ret i8 [[TMP8]]
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> [[TMP0]], i8 [[V1:%.*]], i32 1
	; AVX-NEXT: [[TMP2:%.*]] = or <2 x i8> [[TMP1]], <i8 1, i8 1>
	; AVX-NEXT: [[TMP3:%.*]] = sext <2 x i8> [[TMP2]] to <2 x i16>
	; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x i16> [[TMP3]], i32 0
	; AVX-NEXT: [[TMP5:%.*]] = sext i16 [[TMP4]] to i64
	; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP5]]
	; AVX-NEXT: [[TMP6:%.*]] = extractelement <2 x i16> [[TMP3]], i32 1
	; AVX-NEXT: [[TMP7:%.*]] = sext i16 [[TMP6]] to i64
	; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP7]]
	; AVX-NEXT: [[TMP6:%.*]] = load i8, i8* [[TMP4]], align 1
	; AVX-NEXT: [[TMP7:%.*]] = load i8, i8* [[TMP5]], align 1
	; AVX-NEXT: [[TMP8:%.*]] = add i8 [[TMP6]], [[TMP7]]
	; AVX-NEXT: ret i8 [[TMP8]]
	;			;
	entry:			entry:
	%tmp0 = sext i8 %v0 to i32			%tmp0 = sext i8 %v0 to i32
	%tmp1 = sext i8 %v1 to i32			%tmp1 = sext i8 %v1 to i32
	%tmp2 = or i32 %tmp0, 1			%tmp2 = or i32 %tmp0, 1
	%tmp3 = or i32 %tmp1, 1			%tmp3 = or i32 %tmp1, 1
	%tmp4 = getelementptr inbounds i8, i8* %ptr, i32 %tmp2			%tmp4 = getelementptr inbounds i8, i8* %ptr, i32 %tmp2
	%tmp5 = getelementptr inbounds i8, i8* %ptr, i32 %tmp3			%tmp5 = getelementptr inbounds i8, i8* %ptr, i32 %tmp3
	%tmp6 = load i8, i8* %tmp4			%tmp6 = load i8, i8* %tmp4
	%tmp7 = load i8, i8* %tmp5			%tmp7 = load i8, i8* %tmp5
	%tmp8 = add i8 %tmp6, %tmp7			%tmp8 = add i8 %tmp6, %tmp7
	ret i8 %tmp8			ret i8 %tmp8
	}			}
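The PR31243 comment above reasons about when narrowing a sign-extended value is lossless. The following is a hedged illustration in plain Python, not LLVM code. The helper names `sext_from`, `lossless_width` and `check_pr31243_sext` are hypothetical. The sketch exhaustively compares two computations for `@PR31243_sext`:

- the scalar index computation: `sext i8` to `i32`, then `or 1`;
- the narrowed form in the new CHECK lines: `or` in `i8`, then `sext` through `i16` to `i64`.

It also shows that an `i8` round trip would have been lossless for this particular computation, which is the narrowing the comment says the vectorizer does not yet prove. Finally, it gives a counterexample where the top bits of the two widths differ.

```python
def sext_from(x: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of x (two's complement) to a Python int."""
    x &= (1 << bits) - 1          # truncate to `bits` bits
    sign = 1 << (bits - 1)
    return (x ^ sign) - sign      # reinterpret the top kept bit as the sign


def lossless_width(x: int, bits: int) -> bool:
    """True if truncating x to `bits` bits and sign-extending back yields x again."""
    return sext_from(x, bits) == x


def check_pr31243_sext() -> list:
    """Return every i8 bit pattern for which the scalar and narrowed indices differ."""
    mismatches = []
    for byte in range(256):                  # every possible i8 input
        scalar = sext_from(byte, 8) | 1      # sext i8 -> i32, then `or i32 %tmp0, 1`
        narrowed = sext_from(byte | 1, 8)    # `or <2 x i8> ..., 1`, then sext to i16
        widened = sext_from(narrowed, 16)    # sext i16 -> i64 keeps the same value
        if widened != scalar:
            mismatches.append(byte)
    return mismatches


if __name__ == "__main__":
    print(check_pr31243_sext())                                              # []
    print(all(lossless_width(sext_from(b, 8) | 1, 16) for b in range(256)))  # True
    print(all(lossless_width(sext_from(b, 8) | 1, 8) for b in range(256)))   # True
    print(lossless_width(200, 8))  # False: bit 7 is set but the i32 value is positive
```

An exhaustive loop is practical here only because the input is a single `i8`. It demonstrates the property for this test and is not a general proof technique.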

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

	(11 lines not shown)

	; Function Attrs: uwtable			; Function Attrs: uwtable
	define void @_ZN1C10SwitchModeEv() local_unnamed_addr #0 comdat align 2 {			define void @_ZN1C10SwitchModeEv() local_unnamed_addr #0 comdat align 2 {
	; SSE-LABEL: @_ZN1C10SwitchModeEv(			; SSE-LABEL: @_ZN1C10SwitchModeEv(
	; SSE-NEXT: for.body.lr.ph.i:			; SSE-NEXT: for.body.lr.ph.i:
	; SSE-NEXT: [[OR_1:%.*]] = or i64 undef, 1			; SSE-NEXT: [[OR_1:%.*]] = or i64 undef, 1
	; SSE-NEXT: store i64 [[OR_1]], i64* undef, align 8			; SSE-NEXT: store i64 [[OR_1]], i64* undef, align 8
	; SSE-NEXT: [[FOO_1:%.*]] = getelementptr inbounds [[CLASS_1:%.*]], %class.1* undef, i64 0, i32 0, i32 0, i32 0, i32 0, i64 0			; SSE-NEXT: [[FOO_1:%.*]] = getelementptr inbounds [[CLASS_1:%.*]], %class.1* undef, i64 0, i32 0, i32 0, i32 0, i32 0, i64 0
	; SSE-NEXT: [[FOO_3:%.*]] = load i64, i64* [[FOO_1]], align 8
	; SSE-NEXT: [[FOO_2:%.*]] = getelementptr inbounds [[CLASS_1]], %class.1* undef, i64 0, i32 0, i32 0, i32 0, i32 0, i64 1			; SSE-NEXT: [[FOO_2:%.*]] = getelementptr inbounds [[CLASS_1]], %class.1* undef, i64 0, i32 0, i32 0, i32 0, i32 0, i64 1
	; SSE-NEXT: [[FOO_4:%.*]] = load i64, i64* [[FOO_2]], align 8			; SSE-NEXT: [[TMP0:%.*]] = bitcast i64* [[FOO_1]] to <2 x i64>*
				; SSE-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8
	; SSE-NEXT: [[BAR5:%.*]] = load i64, i64* undef, align 8			; SSE-NEXT: [[BAR5:%.*]] = load i64, i64* undef, align 8
	; SSE-NEXT: [[AND_2:%.*]] = and i64 [[OR_1]], [[FOO_3]]			; SSE-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> poison, i64 [[OR_1]], i32 0
	; SSE-NEXT: [[AND_1:%.*]] = and i64 [[BAR5]], [[FOO_4]]			; SSE-NEXT: [[TMP3:%.*]] = insertelement <2 x i64> [[TMP2]], i64 [[BAR5]], i32 1
				; SSE-NEXT: [[TMP4:%.*]] = and <2 x i64> [[TMP3]], [[TMP1]]
	; SSE-NEXT: [[BAR3:%.*]] = getelementptr inbounds [[CLASS_2:%.*]], %class.2* undef, i64 0, i32 0, i32 0, i32 0, i64 0			; SSE-NEXT: [[BAR3:%.*]] = getelementptr inbounds [[CLASS_2:%.*]], %class.2* undef, i64 0, i32 0, i32 0, i32 0, i64 0
	; SSE-NEXT: store i64 [[AND_2]], i64* [[BAR3]], align 8
	; SSE-NEXT: [[BAR4:%.*]] = getelementptr inbounds [[CLASS_2]], %class.2* undef, i64 0, i32 0, i32 0, i32 0, i64 1			; SSE-NEXT: [[BAR4:%.*]] = getelementptr inbounds [[CLASS_2]], %class.2* undef, i64 0, i32 0, i32 0, i32 0, i64 1
	; SSE-NEXT: store i64 [[AND_1]], i64* [[BAR4]], align 8			; SSE-NEXT: [[TMP5:%.*]] = bitcast i64* [[BAR3]] to <2 x i64>*
				; SSE-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP5]], align 8
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @_ZN1C10SwitchModeEv(			; AVX-LABEL: @_ZN1C10SwitchModeEv(
	; AVX-NEXT: for.body.lr.ph.i:			; AVX-NEXT: for.body.lr.ph.i:
	; AVX-NEXT: [[OR_1:%.*]] = or i64 undef, 1			; AVX-NEXT: [[OR_1:%.*]] = or i64 undef, 1
	; AVX-NEXT: store i64 [[OR_1]], i64* undef, align 8			; AVX-NEXT: store i64 [[OR_1]], i64* undef, align 8
	; AVX-NEXT: [[FOO_1:%.*]] = getelementptr inbounds [[CLASS_1:%.*]], %class.1* undef, i64 0, i32 0, i32 0, i32 0, i32 0, i64 0			; AVX-NEXT: [[FOO_1:%.*]] = getelementptr inbounds [[CLASS_1:%.*]], %class.1* undef, i64 0, i32 0, i32 0, i32 0, i32 0, i64 0
	; AVX-NEXT: [[FOO_2:%.*]] = getelementptr inbounds [[CLASS_1]], %class.1* undef, i64 0, i32 0, i32 0, i32 0, i32 0, i64 1			; AVX-NEXT: [[FOO_2:%.*]] = getelementptr inbounds [[CLASS_1]], %class.1* undef, i64 0, i32 0, i32 0, i32 0, i32 0, i64 1
	(118 lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/pr47629-inseltpoison.ll

	(238 lines not shown)
	; AVX2-NEXT: store i32 [[TMP19]], i32* [[TMP16]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store i32 [[TMP19]], i32* [[TMP16]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21			; AVX2-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
	; AVX2-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP21]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP21]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP23:%.*]] = add i32 [[TMP22]], 4			; AVX2-NEXT: [[TMP23:%.*]] = add i32 [[TMP22]], 4
	; AVX2-NEXT: store i32 [[TMP23]], i32* [[TMP20]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store i32 [[TMP23]], i32* [[TMP20]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @gather_load_3(			; AVX512-LABEL: @gather_load_3(
	; AVX512-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 11
	; AVX512-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1			; AVX512-NEXT: [[TMP4:%.*]] = insertelement <2 x i32*> poison, i32* [[TMP1]], i32 0
	; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1			; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32*> [[TMP4]], <2 x i32*> poison, <2 x i32> zeroinitializer
	; AVX512-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP6:%.*]] = getelementptr i32, <2 x i32*> [[TMP5]], <2 x i64> <i64 4, i64 15>
	; AVX512-NEXT: [[TMP6:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i32 0			; AVX512-NEXT: [[TMP7:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i32 0
	; AVX512-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> poison, <4 x i32> zeroinitializer			; AVX512-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32*> [[TMP7]], <4 x i32*> poison, <4 x i32> zeroinitializer
	; AVX512-NEXT: [[TMP8:%.*]] = getelementptr i32, <4 x i32*> [[TMP7]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>			; AVX512-NEXT: [[TMP9:%.*]] = getelementptr i32, <4 x i32*> [[TMP8]], <4 x i64> <i64 18, i64 9, i64 6, i64 21>
	; AVX512-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP8]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP10:%.*]] = insertelement <8 x i32*> poison, i32* [[TMP1]], i32 0
	; AVX512-NEXT: [[TMP10:%.*]] = add <4 x i32> [[TMP9]], <i32 2, i32 3, i32 4, i32 1>			; AVX512-NEXT: [[TMP11:%.*]] = insertelement <8 x i32*> [[TMP10]], i32* [[TMP3]], i32 1
	; AVX512-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5			; AVX512-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32*> [[TMP6]], <2 x i32*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*			; AVX512-NEXT: [[TMP13:%.*]] = shufflevector <8 x i32*> [[TMP11]], <8 x i32*> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP12]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP14:%.*]] = shufflevector <4 x i32*> [[TMP9]], <4 x i32*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9			; AVX512-NEXT: [[TMP15:%.*]] = shufflevector <8 x i32*> [[TMP13]], <8 x i32*> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
	; AVX512-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP13]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP16:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP15]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef), !tbaa [[TBAA0]]
	; AVX512-NEXT: [[TMP15:%.*]] = add i32 [[TMP14]], 2			; AVX512-NEXT: [[TMP17:%.*]] = add <8 x i32> [[TMP16]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
	; AVX512-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6			; AVX512-NEXT: [[TMP18:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*
	; AVX512-NEXT: store i32 [[TMP15]], i32* [[TMP11]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: store <8 x i32> [[TMP17]], <8 x i32>* [[TMP18]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
	; AVX512-NEXT: [[TMP18:%.*]] = load i32, i32* [[TMP17]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: [[TMP19:%.*]] = add i32 [[TMP18]], 3
	; AVX512-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 7
	; AVX512-NEXT: store i32 [[TMP19]], i32* [[TMP16]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
	; AVX512-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP21]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: [[TMP23:%.*]] = add i32 [[TMP22]], 4
	; AVX512-NEXT: store i32 [[TMP23]], i32* [[TMP20]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	%3 = load i32, i32* %1, align 4, !tbaa !2			%3 = load i32, i32* %1, align 4, !tbaa !2
	%4 = add i32 %3, 1			%4 = add i32 %3, 1
	%5 = getelementptr inbounds i32, i32* %0, i64 1			%5 = getelementptr inbounds i32, i32* %0, i64 1
	store i32 %4, i32* %0, align 4, !tbaa !2			store i32 %4, i32* %0, align 4, !tbaa !2
	%6 = getelementptr inbounds i32, i32* %1, i64 11			%6 = getelementptr inbounds i32, i32* %1, i64 11
	%7 = load i32, i32* %6, align 4, !tbaa !2			%7 = load i32, i32* %6, align 4, !tbaa !2
	(140 lines not shown)
	; AVX2-NEXT: [[TMP6:%.*]] = bitcast i32* [[T5]] to <4 x i32>*			; AVX2-NEXT: [[TMP6:%.*]] = bitcast i32* [[T5]] to <4 x i32>*
	; AVX2-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @gather_load_4(			; AVX512-LABEL: @gather_load_4(
	; AVX512-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1			; AVX512-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11
	; AVX512-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1:%.*]], i32 0			; AVX512-NEXT: [[TMP1:%.*]] = insertelement <2 x i32*> poison, i32* [[T1]], i32 0
	; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer			; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32*> [[TMP1]], <2 x i32*> poison, <2 x i32> zeroinitializer
	; AVX512-NEXT: [[TMP3:%.*]] = getelementptr i32, <4 x i32*> [[TMP2]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>			; AVX512-NEXT: [[TMP3:%.*]] = getelementptr i32, <2 x i32*> [[TMP2]], <2 x i64> <i64 4, i64 15>
	; AVX512-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5			; AVX512-NEXT: [[TMP4:%.*]] = insertelement <4 x i32*> poison, i32* [[T1]], i32 0
	; AVX512-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9			; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32*> [[TMP4]], <4 x i32*> poison, <4 x i32> zeroinitializer
	; AVX512-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6			; AVX512-NEXT: [[TMP6:%.*]] = getelementptr i32, <4 x i32*> [[TMP5]], <4 x i64> <i64 18, i64 9, i64 6, i64 21>
	; AVX512-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6			; AVX512-NEXT: [[TMP7:%.*]] = insertelement <8 x i32*> poison, i32* [[T1]], i32 0
	; AVX512-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7			; AVX512-NEXT: [[TMP8:%.*]] = insertelement <8 x i32*> [[TMP7]], i32* [[T6]], i32 1
	; AVX512-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21			; AVX512-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32*> [[TMP3]], <2 x i32*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP10:%.*]] = shufflevector <8 x i32*> [[TMP8]], <8 x i32*> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP3]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP12:%.*]] = shufflevector <8 x i32*> [[TMP10]], <8 x i32*> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
	; AVX512-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP13:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP12]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef), !tbaa [[TBAA0]]
	; AVX512-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP14:%.*]] = add <8 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
	; AVX512-NEXT: [[T4:%.*]] = add i32 [[T3]], 1			; AVX512-NEXT: [[TMP15:%.*]] = bitcast i32* [[T0:%.*]] to <8 x i32>*
	; AVX512-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], <i32 2, i32 3, i32 4, i32 1>			; AVX512-NEXT: store <8 x i32> [[TMP14]], <8 x i32>* [[TMP15]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: [[T24:%.*]] = add i32 [[T23]], 2
	; AVX512-NEXT: [[T28:%.*]] = add i32 [[T27]], 3
	; AVX512-NEXT: [[T32:%.*]] = add i32 [[T31]], 4
	; AVX512-NEXT: store i32 [[T4]], i32* [[T0]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: [[TMP6:%.*]] = bitcast i32* [[T5]] to <4 x i32>*
	; AVX512-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	%t5 = getelementptr inbounds i32, i32* %t0, i64 1			%t5 = getelementptr inbounds i32, i32* %t0, i64 1
	%t6 = getelementptr inbounds i32, i32* %t1, i64 11			%t6 = getelementptr inbounds i32, i32* %t1, i64 11
	%t9 = getelementptr inbounds i32, i32* %t0, i64 2			%t9 = getelementptr inbounds i32, i32* %t0, i64 2
	%t10 = getelementptr inbounds i32, i32* %t1, i64 4			%t10 = getelementptr inbounds i32, i32* %t1, i64 4
	%t13 = getelementptr inbounds i32, i32* %t0, i64 3			%t13 = getelementptr inbounds i32, i32* %t0, i64 3
	%t14 = getelementptr inbounds i32, i32* %t1, i64 15			%t14 = getelementptr inbounds i32, i32* %t1, i64 15
	(197 lines not shown)

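The new AVX512 checks for `@gather_load_3` and `@gather_load_4` replace several operations with a single 8-wide `@llvm.masked.gather`:

- a scalar load;
- a 4-wide gather;
- three more scalar loads.

The gather's pointer operand is assembled by a chain of two-source `shufflevector`s. The Python sketch below is a hedged model, not LLVM code. It implements `shufflevector` lane selection as described in the LangRef: a mask index `i < n` reads lane `i` of the first operand, `i >= n` reads lane `i - n` of the second, and an `undef` mask element yields an undefined lane. It represents each pointer lane by its element offset from `%1` and replays the `TMP12` through `TMP15` shuffles from `@gather_load_3`. It then checks the result against the sequence in the old AVX512 checks. The helper names and the offset-based modelling are assumptions made for illustration.

```python
from typing import List, Optional, Sequence

Lane = Optional[int]  # None models an undef/poison lane
U = None


def shufflevector(a: Sequence[Lane], b: Sequence[Lane], mask: Sequence[Lane]) -> List[Lane]:
    """Select lanes from the concatenation a ++ b; an undef mask entry gives an undef lane."""
    assert len(a) == len(b), "shufflevector operands must have the same type"
    both = list(a) + list(b)
    return [None if m is None else both[m] for m in mask]


def gather_offsets() -> List[Lane]:
    """Replay the pointer-vector shuffles of the new AVX512 checks (offsets are from %1)."""
    tmp11 = [0, 11] + [U] * 6        # <8 x i32*>: %1 in lane 0, %1+11 in lane 1
    tmp6 = [4, 15]                   # <2 x i32*>: GEP with <i64 4, i64 15>
    tmp9 = [18, 9, 6, 21]            # <4 x i32*>: GEP with <i64 18, i64 9, i64 6, i64 21>
    tmp12 = shufflevector(tmp6, [U] * 2, [0, 1, U, U, U, U, U, U])
    tmp13 = shufflevector(tmp11, tmp12, [0, 1, 8, 9, U, U, U, U])
    tmp14 = shufflevector(tmp9, [U] * 4, [0, 1, 2, 3, U, U, U, U])
    return shufflevector(tmp13, tmp14, [0, 1, 2, 3, 8, 9, 10, 11])


def vectorized(src: Sequence[int]) -> List[int]:
    """8-wide gather from %1, add <1,2,3,4,1,2,3,4>, store to %0[0..7]."""
    addend = [1, 2, 3, 4, 1, 2, 3, 4]
    return [src[off] + c for off, c in zip(gather_offsets(), addend)]


def old_avx512(src: Sequence[int]) -> List[int]:
    """The sequence in the old AVX512 checks: scalar, 4-wide gather, then three scalars."""
    dst = [0] * 8
    dst[0] = src[0] + 1
    dst[1:5] = [src[o] + c for o, c in zip([11, 4, 15, 18], [2, 3, 4, 1])]
    dst[5] = src[9] + 2
    dst[6] = src[6] + 3
    dst[7] = src[21] + 4
    return dst


if __name__ == "__main__":
    print(gather_offsets())  # [0, 11, 4, 15, 18, 9, 6, 21]
    memory = [100 + i for i in range(32)]
    print(vectorized(memory) == old_avx512(memory))  # True
```

Every lane of the final shuffle mask is defined and selects a defined lane. The gather's pointer vector therefore has no undef lanes, even though the intermediate vectors do.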
llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll

	(238 lines not shown)
	; AVX2-NEXT: store i32 [[TMP19]], i32* [[TMP16]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store i32 [[TMP19]], i32* [[TMP16]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21			; AVX2-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
	; AVX2-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP21]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP21]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP23:%.*]] = add i32 [[TMP22]], 4			; AVX2-NEXT: [[TMP23:%.*]] = add i32 [[TMP22]], 4
	; AVX2-NEXT: store i32 [[TMP23]], i32* [[TMP20]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store i32 [[TMP23]], i32* [[TMP20]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @gather_load_3(			; AVX512-LABEL: @gather_load_3(
	; AVX512-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 11
	; AVX512-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1			; AVX512-NEXT: [[TMP4:%.*]] = insertelement <2 x i32*> poison, i32* [[TMP1]], i32 0
	; AVX512-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1			; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32*> [[TMP4]], <2 x i32*> poison, <2 x i32> zeroinitializer
	; AVX512-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP6:%.*]] = getelementptr i32, <2 x i32*> [[TMP5]], <2 x i64> <i64 4, i64 15>
	; AVX512-NEXT: [[TMP6:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i32 0			; AVX512-NEXT: [[TMP7:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i32 0
	; AVX512-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> poison, <4 x i32> zeroinitializer			; AVX512-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32*> [[TMP7]], <4 x i32*> poison, <4 x i32> zeroinitializer
	; AVX512-NEXT: [[TMP8:%.*]] = getelementptr i32, <4 x i32*> [[TMP7]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>			; AVX512-NEXT: [[TMP9:%.*]] = getelementptr i32, <4 x i32*> [[TMP8]], <4 x i64> <i64 18, i64 9, i64 6, i64 21>
	; AVX512-NEXT: [[TMP9:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP8]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP10:%.*]] = insertelement <8 x i32*> poison, i32* [[TMP1]], i32 0
	; AVX512-NEXT: [[TMP10:%.*]] = add <4 x i32> [[TMP9]], <i32 2, i32 3, i32 4, i32 1>			; AVX512-NEXT: [[TMP11:%.*]] = insertelement <8 x i32*> [[TMP10]], i32* [[TMP3]], i32 1
	; AVX512-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5			; AVX512-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32*> [[TMP6]], <2 x i32*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*			; AVX512-NEXT: [[TMP13:%.*]] = shufflevector <8 x i32*> [[TMP11]], <8 x i32*> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512-NEXT: store <4 x i32> [[TMP10]], <4 x i32>* [[TMP12]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP14:%.*]] = shufflevector <4 x i32*> [[TMP9]], <4 x i32*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9			; AVX512-NEXT: [[TMP15:%.*]] = shufflevector <8 x i32*> [[TMP13]], <8 x i32*> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
	; AVX512-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP13]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP16:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP15]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef), !tbaa [[TBAA0]]
	; AVX512-NEXT: [[TMP15:%.*]] = add i32 [[TMP14]], 2			; AVX512-NEXT: [[TMP17:%.*]] = add <8 x i32> [[TMP16]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
	; AVX512-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6			; AVX512-NEXT: [[TMP18:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*
	; AVX512-NEXT: store i32 [[TMP15]], i32* [[TMP11]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: store <8 x i32> [[TMP17]], <8 x i32>* [[TMP18]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
	; AVX512-NEXT: [[TMP18:%.*]] = load i32, i32* [[TMP17]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: [[TMP19:%.*]] = add i32 [[TMP18]], 3
	; AVX512-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 7
	; AVX512-NEXT: store i32 [[TMP19]], i32* [[TMP16]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
	; AVX512-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP21]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: [[TMP23:%.*]] = add i32 [[TMP22]], 4
	; AVX512-NEXT: store i32 [[TMP23]], i32* [[TMP20]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	%3 = load i32, i32* %1, align 4, !tbaa !2			%3 = load i32, i32* %1, align 4, !tbaa !2
	%4 = add i32 %3, 1			%4 = add i32 %3, 1
	%5 = getelementptr inbounds i32, i32* %0, i64 1			%5 = getelementptr inbounds i32, i32* %0, i64 1
	store i32 %4, i32* %0, align 4, !tbaa !2			store i32 %4, i32* %0, align 4, !tbaa !2
	%6 = getelementptr inbounds i32, i32* %1, i64 11			%6 = getelementptr inbounds i32, i32* %1, i64 11
	%7 = load i32, i32* %6, align 4, !tbaa !2			%7 = load i32, i32* %6, align 4, !tbaa !2
	(140 lines not shown)
	; AVX2-NEXT: [[TMP6:%.*]] = bitcast i32* [[T5]] to <4 x i32>*			; AVX2-NEXT: [[TMP6:%.*]] = bitcast i32* [[T5]] to <4 x i32>*
	; AVX2-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512-LABEL: @gather_load_4(			; AVX512-LABEL: @gather_load_4(
	; AVX512-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1			; AVX512-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11
	; AVX512-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1:%.*]], i32 0			; AVX512-NEXT: [[TMP1:%.*]] = insertelement <2 x i32*> poison, i32* [[T1]], i32 0
	; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer			; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32*> [[TMP1]], <2 x i32*> poison, <2 x i32> zeroinitializer
	; AVX512-NEXT: [[TMP3:%.*]] = getelementptr i32, <4 x i32*> [[TMP2]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>			; AVX512-NEXT: [[TMP3:%.*]] = getelementptr i32, <2 x i32*> [[TMP2]], <2 x i64> <i64 4, i64 15>
	; AVX512-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5			; AVX512-NEXT: [[TMP4:%.*]] = insertelement <4 x i32*> poison, i32* [[T1]], i32 0
	; AVX512-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9			; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32*> [[TMP4]], <4 x i32*> poison, <4 x i32> zeroinitializer
	; AVX512-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6			; AVX512-NEXT: [[TMP6:%.*]] = getelementptr i32, <4 x i32*> [[TMP5]], <4 x i64> <i64 18, i64 9, i64 6, i64 21>
	; AVX512-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6			; AVX512-NEXT: [[TMP7:%.*]] = insertelement <8 x i32*> poison, i32* [[T1]], i32 0
	; AVX512-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7			; AVX512-NEXT: [[TMP8:%.*]] = insertelement <8 x i32*> [[TMP7]], i32* [[T6]], i32 1
	; AVX512-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21			; AVX512-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32*> [[TMP3]], <2 x i32*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP10:%.*]] = shufflevector <8 x i32*> [[TMP8]], <8 x i32*> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512-NEXT: [[TMP4:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP3]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP12:%.*]] = shufflevector <8 x i32*> [[TMP10]], <8 x i32*> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
	; AVX512-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP13:%.*]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32*> [[TMP12]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef), !tbaa [[TBAA0]]
	; AVX512-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]			; AVX512-NEXT: [[TMP14:%.*]] = add <8 x i32> [[TMP13]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
	; AVX512-NEXT: [[T4:%.*]] = add i32 [[T3]], 1			; AVX512-NEXT: [[TMP15:%.*]] = bitcast i32* [[T0:%.*]] to <8 x i32>*
	; AVX512-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], <i32 2, i32 3, i32 4, i32 1>			; AVX512-NEXT: store <8 x i32> [[TMP14]], <8 x i32>* [[TMP15]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: [[T24:%.*]] = add i32 [[T23]], 2
	; AVX512-NEXT: [[T28:%.*]] = add i32 [[T27]], 3
	; AVX512-NEXT: [[T32:%.*]] = add i32 [[T31]], 4
	; AVX512-NEXT: store i32 [[T4]], i32* [[T0]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: [[TMP6:%.*]] = bitcast i32* [[T5]] to <4 x i32>*
	; AVX512-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]
	; AVX512-NEXT: ret void			; AVX512-NEXT: ret void
	;			;
	%t5 = getelementptr inbounds i32, i32* %t0, i64 1			%t5 = getelementptr inbounds i32, i32* %t0, i64 1
	%t6 = getelementptr inbounds i32, i32* %t1, i64 11			%t6 = getelementptr inbounds i32, i32* %t1, i64 11
	%t9 = getelementptr inbounds i32, i32* %t0, i64 2			%t9 = getelementptr inbounds i32, i32* %t0, i64 2
	%t10 = getelementptr inbounds i32, i32* %t1, i64 4			%t10 = getelementptr inbounds i32, i32* %t1, i64 4
	%t13 = getelementptr inbounds i32, i32* %t0, i64 3			%t13 = getelementptr inbounds i32, i32* %t0, i64 3
	%t14 = getelementptr inbounds i32, i32* %t1, i64 15			%t14 = getelementptr inbounds i32, i32* %t1, i64 15
	(197 lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll

	(37 lines not shown)
	; CHECK-NEXT: store <8 x i16> [[TMP8]], <8 x i16>* [[TMP9]], align 2			; CHECK-NEXT: store <8 x i16> [[TMP8]], <8 x i16>* [[TMP9]], align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; YAML: Pass: slp-vectorizer			; YAML: Pass: slp-vectorizer
	; YAML-NEXT: Name: StoresVectorized			; YAML-NEXT: Name: StoresVectorized
	; YAML-NEXT: Function: fextr			; YAML-NEXT: Function: fextr
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: 'Stores SLP vectorized with cost '			; YAML-NEXT: - String: 'Stores SLP vectorized with cost '
	; YAML-NEXT: - Cost: '-4'			; YAML-NEXT: - Cost: '-5'
	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' and with tree size '
	; YAML-NEXT: - TreeSize: '4'			; YAML-NEXT: - TreeSize: '4'

	entry:			entry:
	%LD = load <8 x i16>, <8 x i16>* undef			%LD = load <8 x i16>, <8 x i16>* undef
	%V0 = extractelement <8 x i16> %LD, i32 0			%V0 = extractelement <8 x i16> %LD, i32 0
	br label %t			br label %t

	(34 lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll

	(21 lines not shown)
	; CHECK-NEXT: [[T17:%.*]] = load i32, i32* [[T16]], align 4			; CHECK-NEXT: [[T17:%.*]] = load i32, i32* [[T16]], align 4
	; CHECK-NEXT: [[T20:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 3			; CHECK-NEXT: [[T20:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 3
	; CHECK-NEXT: [[T21:%.*]] = load i32, i32* [[T20]], align 4			; CHECK-NEXT: [[T21:%.*]] = load i32, i32* [[T20]], align 4
	; CHECK-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 4			; CHECK-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 4
	; CHECK-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4			; CHECK-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4
	; CHECK-NEXT: [[T24:%.*]] = add nsw i32 [[T23]], [[T21]]			; CHECK-NEXT: [[T24:%.*]] = add nsw i32 [[T23]], [[T21]]
	; CHECK-NEXT: [[T25:%.*]] = sub nsw i32 [[T21]], [[T23]]			; CHECK-NEXT: [[T25:%.*]] = sub nsw i32 [[T21]], [[T23]]
	; CHECK-NEXT: [[T27:%.*]] = sub nsw i32 [[T3]], [[T24]]			; CHECK-NEXT: [[T27:%.*]] = sub nsw i32 [[T3]], [[T24]]
	; CHECK-NEXT: [[T28:%.*]] = add nsw i32 [[T15]], [[T9]]
	; CHECK-NEXT: [[T29:%.*]] = sub nsw i32 [[T9]], [[T15]]			; CHECK-NEXT: [[T29:%.*]] = sub nsw i32 [[T9]], [[T15]]
	; CHECK-NEXT: [[T30:%.*]] = add nsw i32 [[T27]], [[T29]]			; CHECK-NEXT: [[T30:%.*]] = add nsw i32 [[T27]], [[T29]]
	; CHECK-NEXT: [[T31:%.*]] = mul nsw i32 [[T30]], 4433			; CHECK-NEXT: [[T31:%.*]] = mul nsw i32 [[T30]], 4433
	; CHECK-NEXT: [[T32:%.*]] = mul nsw i32 [[T27]], 6270			; CHECK-NEXT: [[T32:%.*]] = mul nsw i32 [[T27]], 6270
	; CHECK-NEXT: [[T34:%.*]] = mul nsw i32 [[T29]], -15137			; CHECK-NEXT: [[T34:%.*]] = mul nsw i32 [[T29]], -15137
	; CHECK-NEXT: [[T37:%.*]] = add nsw i32 [[T25]], [[T11]]			; CHECK-NEXT: [[T37:%.*]] = add nsw i32 [[T25]], [[T11]]
	; CHECK-NEXT: [[T38:%.*]] = add nsw i32 [[T17]], [[T5]]			; CHECK-NEXT: [[T38:%.*]] = add nsw i32 [[T17]], [[T5]]
	; CHECK-NEXT: [[T39:%.*]] = add nsw i32 [[T37]], [[T38]]			; CHECK-NEXT: [[T39:%.*]] = add nsw i32 [[T37]], [[T38]]
	; CHECK-NEXT: [[T40:%.*]] = mul nsw i32 [[T39]], 9633			; CHECK-NEXT: [[T40:%.*]] = mul nsw i32 [[T39]], 9633
	; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446			; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446
	; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819			; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819
	; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069			; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069
	; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196			; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196
	; CHECK-NEXT: [[T49:%.*]] = add nsw i32 [[T40]], [[T47]]			; CHECK-NEXT: [[T49:%.*]] = add nsw i32 [[T40]], [[T47]]
	; CHECK-NEXT: [[T50:%.*]] = add nsw i32 [[T40]], [[T48]]			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[T15]], i32 0
	; CHECK-NEXT: [[T65:%.*]] = insertelement <8 x i32> poison, i32 [[T28]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[T40]], i32 1
	; CHECK-NEXT: [[T66:%.*]] = insertelement <8 x i32> [[T65]], i32 [[T50]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> poison, i32 [[T9]], i32 0
				; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[T48]], i32 1
				; CHECK-NEXT: [[TMP5:%.*]] = add nsw <2 x i32> [[TMP2]], [[TMP4]]
				; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP5]], i32 0
				; CHECK-NEXT: [[T65:%.*]] = insertelement <8 x i32> poison, i32 [[TMP6]], i32 0
				; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i32> [[TMP5]], i32 1
				; CHECK-NEXT: [[T66:%.*]] = insertelement <8 x i32> [[T65]], i32 [[TMP7]], i32 1
	; CHECK-NEXT: [[T67:%.*]] = insertelement <8 x i32> [[T66]], i32 [[T32]], i32 2			; CHECK-NEXT: [[T67:%.*]] = insertelement <8 x i32> [[T66]], i32 [[T32]], i32 2
	; CHECK-NEXT: [[T68:%.*]] = insertelement <8 x i32> [[T67]], i32 [[T49]], i32 3			; CHECK-NEXT: [[T68:%.*]] = insertelement <8 x i32> [[T67]], i32 [[T49]], i32 3
	; CHECK-NEXT: [[T69:%.*]] = insertelement <8 x i32> [[T68]], i32 [[T28]], i32 4			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[T70:%.*]] = insertelement <8 x i32> [[T69]], i32 [[T50]], i32 5			; CHECK-NEXT: [[T701:%.*]] = shufflevector <8 x i32> [[T68]], <8 x i32> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[T70]], i32 [[T34]], i32 6			; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[T701]], i32 [[T34]], i32 6
	; CHECK-NEXT: [[T72:%.*]] = insertelement <8 x i32> [[T71]], i32 [[T49]], i32 7			; CHECK-NEXT: [[T72:%.*]] = insertelement <8 x i32> [[T71]], i32 [[T49]], i32 7
	; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T72]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>			; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T72]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*			; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*
	; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4			; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%t3 = load i32, i32* %t2, align 4			%t3 = load i32, i32* %t2, align 4
	%t4 = getelementptr inbounds i32, i32* %t2, i64 7			%t4 = getelementptr inbounds i32, i32* %t2, i64 7
	(45 lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll

	(21 lines not shown)
	; CHECK-NEXT: [[T17:%.*]] = load i32, i32* [[T16]], align 4			; CHECK-NEXT: [[T17:%.*]] = load i32, i32* [[T16]], align 4
	; CHECK-NEXT: [[T20:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 3			; CHECK-NEXT: [[T20:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 3
	; CHECK-NEXT: [[T21:%.*]] = load i32, i32* [[T20]], align 4			; CHECK-NEXT: [[T21:%.*]] = load i32, i32* [[T20]], align 4
	; CHECK-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 4			; CHECK-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T2]], i64 4
	; CHECK-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4			; CHECK-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4
	; CHECK-NEXT: [[T24:%.*]] = add nsw i32 [[T23]], [[T21]]			; CHECK-NEXT: [[T24:%.*]] = add nsw i32 [[T23]], [[T21]]
	; CHECK-NEXT: [[T25:%.*]] = sub nsw i32 [[T21]], [[T23]]			; CHECK-NEXT: [[T25:%.*]] = sub nsw i32 [[T21]], [[T23]]
	; CHECK-NEXT: [[T27:%.*]] = sub nsw i32 [[T3]], [[T24]]			; CHECK-NEXT: [[T27:%.*]] = sub nsw i32 [[T3]], [[T24]]
	; CHECK-NEXT: [[T28:%.*]] = add nsw i32 [[T15]], [[T9]]
	; CHECK-NEXT: [[T29:%.*]] = sub nsw i32 [[T9]], [[T15]]			; CHECK-NEXT: [[T29:%.*]] = sub nsw i32 [[T9]], [[T15]]
	; CHECK-NEXT: [[T30:%.*]] = add nsw i32 [[T27]], [[T29]]			; CHECK-NEXT: [[T30:%.*]] = add nsw i32 [[T27]], [[T29]]
	; CHECK-NEXT: [[T31:%.*]] = mul nsw i32 [[T30]], 4433			; CHECK-NEXT: [[T31:%.*]] = mul nsw i32 [[T30]], 4433
	; CHECK-NEXT: [[T32:%.*]] = mul nsw i32 [[T27]], 6270			; CHECK-NEXT: [[T32:%.*]] = mul nsw i32 [[T27]], 6270
	; CHECK-NEXT: [[T34:%.*]] = mul nsw i32 [[T29]], -15137			; CHECK-NEXT: [[T34:%.*]] = mul nsw i32 [[T29]], -15137
	; CHECK-NEXT: [[T37:%.*]] = add nsw i32 [[T25]], [[T11]]			; CHECK-NEXT: [[T37:%.*]] = add nsw i32 [[T25]], [[T11]]
	; CHECK-NEXT: [[T38:%.*]] = add nsw i32 [[T17]], [[T5]]			; CHECK-NEXT: [[T38:%.*]] = add nsw i32 [[T17]], [[T5]]
	; CHECK-NEXT: [[T39:%.*]] = add nsw i32 [[T37]], [[T38]]			; CHECK-NEXT: [[T39:%.*]] = add nsw i32 [[T37]], [[T38]]
	; CHECK-NEXT: [[T40:%.*]] = mul nsw i32 [[T39]], 9633			; CHECK-NEXT: [[T40:%.*]] = mul nsw i32 [[T39]], 9633
	; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446			; CHECK-NEXT: [[T41:%.*]] = mul nsw i32 [[T25]], 2446
	; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819			; CHECK-NEXT: [[T42:%.*]] = mul nsw i32 [[T17]], 16819
	; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069			; CHECK-NEXT: [[T47:%.*]] = mul nsw i32 [[T37]], -16069
	; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196			; CHECK-NEXT: [[T48:%.*]] = mul nsw i32 [[T38]], -3196
	; CHECK-NEXT: [[T49:%.*]] = add nsw i32 [[T40]], [[T47]]			; CHECK-NEXT: [[T49:%.*]] = add nsw i32 [[T40]], [[T47]]
	; CHECK-NEXT: [[T50:%.*]] = add nsw i32 [[T40]], [[T48]]			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[T15]], i32 0
	; CHECK-NEXT: [[T65:%.*]] = insertelement <8 x i32> undef, i32 [[T28]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[T40]], i32 1
	; CHECK-NEXT: [[T66:%.*]] = insertelement <8 x i32> [[T65]], i32 [[T50]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> poison, i32 [[T9]], i32 0
				; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[T48]], i32 1
				; CHECK-NEXT: [[TMP5:%.*]] = add nsw <2 x i32> [[TMP2]], [[TMP4]]
				; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i32> [[TMP5]], i32 0
				; CHECK-NEXT: [[T65:%.*]] = insertelement <8 x i32> undef, i32 [[TMP6]], i32 0
				; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i32> [[TMP5]], i32 1
				; CHECK-NEXT: [[T66:%.*]] = insertelement <8 x i32> [[T65]], i32 [[TMP7]], i32 1
	; CHECK-NEXT: [[T67:%.*]] = insertelement <8 x i32> [[T66]], i32 [[T32]], i32 2			; CHECK-NEXT: [[T67:%.*]] = insertelement <8 x i32> [[T66]], i32 [[T32]], i32 2
	; CHECK-NEXT: [[T68:%.*]] = insertelement <8 x i32> [[T67]], i32 [[T49]], i32 3			; CHECK-NEXT: [[T68:%.*]] = insertelement <8 x i32> [[T67]], i32 [[T49]], i32 3
	; CHECK-NEXT: [[T69:%.*]] = insertelement <8 x i32> [[T68]], i32 [[T28]], i32 4			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[T70:%.*]] = insertelement <8 x i32> [[T69]], i32 [[T50]], i32 5			; CHECK-NEXT: [[T701:%.*]] = shufflevector <8 x i32> [[T68]], <8 x i32> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>
	; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[T70]], i32 [[T34]], i32 6			; CHECK-NEXT: [[T71:%.*]] = insertelement <8 x i32> [[T701]], i32 [[T34]], i32 6
	; CHECK-NEXT: [[T72:%.*]] = insertelement <8 x i32> [[T71]], i32 [[T49]], i32 7			; CHECK-NEXT: [[T72:%.*]] = insertelement <8 x i32> [[T71]], i32 [[T49]], i32 7
	; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T72]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>			; CHECK-NEXT: [[T76:%.*]] = shl <8 x i32> [[T72]], <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*			; CHECK-NEXT: [[T79:%.*]] = bitcast i32* [[T2]] to <8 x i32>*
	; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4			; CHECK-NEXT: store <8 x i32> [[T76]], <8 x i32>* [[T79]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%t3 = load i32, i32* %t2, align 4			%t3 = load i32, i32* %t2, align 4
	%t4 = getelementptr inbounds i32, i32* %t2, i64 7			%t4 = getelementptr inbounds i32, i32* %t2, i64 7
	(45 lines not shown)
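In both `vec_list_bias` tests the updated CHECK lines drop the `insertelement`s that filled lanes 4 and 5 of the `<8 x i32>` vector. Instead, `[[TMP8]]` widens the `<2 x i32>` result `[[TMP5]]` to eight lanes, and `[[T701]]` blends it into `[[T68]]` with the mask `<0, 1, 2, 3, 8, 9, 6, 7>`. The C++ below is a hypothetical helper (`shuffleVector` is not code from this patch and not an LLVM API). It is only a sketch of the documented `shufflevector` lane-selection rule: a mask index below the operand width selects from the first operand, a larger index selects from the second, and an `undef` mask element is modeled here as `-1` producing `std::nullopt`. It shows that lanes 4 and 5 receive lanes 0 and 1 of `[[TMP5]]`. Those lanes are `T15 + T9` and `T40 + T48`, the same values the old CHECK lines inserted as `[[T28]]` and `[[T50]]`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not an LLVM API): models the lane selection of an IR
// `shufflevector` on two same-width operands. Mask index i < N picks lane i
// of the first operand, i >= N picks lane (i - N) of the second, and a
// negative index stands in for an `undef` mask element, whose result lane is
// left unspecified (std::nullopt). Unknown source lanes (e.g. poison) are
// also represented as std::nullopt.
template <typename T>
std::vector<std::optional<T>> shuffleVector(
    const std::vector<std::optional<T>>& first,
    const std::vector<std::optional<T>>& second,
    const std::vector<int>& mask) {
  assert(first.size() == second.size() && "operands must have the same type");
  const int width = static_cast<int>(first.size());
  std::vector<std::optional<T>> result;
  result.reserve(mask.size());
  for (int idx : mask) {
    assert(idx < 2 * width && "mask index out of range");
    if (idx < 0) {
      result.push_back(std::nullopt);  // undef mask element
    } else if (idx < width) {
      // Lane taken from the first operand.
      result.push_back(first[static_cast<std::size_t>(idx)]);
    } else {
      // Lane taken from the second operand.
      result.push_back(second[static_cast<std::size_t>(idx - width)]);
    }
  }
  return result;
}
```

For example, modeling `[[TMP5]]` as `{100, 200}` and the first four lanes of `[[T68]]` as `{10, 11, 12, 13}`, widening and then blending with the mask above yields `{10, 11, 12, 13, 100, 200, ?, ?}`. The unknown lanes 6 and 7 are the ones the subsequent `insertelement`s (`[[T71]]`, `[[T72]]`) overwrite.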


Revision Contents

Diff 356710

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Analysis/CostModel/X86/reduction.ll

llvm/test/Analysis/CostModel/X86/shuffle-single-src.ll

llvm/test/Transforms/SLPVectorizer/AArch64/PR38339.ll

llvm/test/Transforms/SLPVectorizer/AArch64/horizontal.ll

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

llvm/test/Transforms/SLPVectorizer/X86/bad_types.ll

llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll

llvm/test/Transforms/SLPVectorizer/X86/minimum-sizes.ll

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

llvm/test/Transforms/SLPVectorizer/X86/pr47629-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias.ll
