This is an archive of the discontinued LLVM Phabricator instance.

Differential D158449

[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
ClosedPublic

Authored by ABataev on Aug 21 2023, 1:05 PM.

Details

Reviewers

RKSimon

Commits

rGe22818d5c98a: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
rGb186f1f68be1: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
rG6f43d28f3452: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
rG9f5960e004ff: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
rGc88c281cf1ac: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.

Summary

Add a NumSrcElts parameter to the static is..Mask functions in the
ShuffleVectorInst class for better mask analysis. Mask.size() does not
always match the size of the permuted vector(s). This allows better cost
estimation in SLP and fixes uses of these functions in other cases.
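
To illustrate the point about Mask.size(), here is a minimal standalone sketch, not LLVM's implementation. It assumes mask elements index into the concatenation of two source vectors of NumSrcElts elements each, with -1 marking an undefined lane. The name isSingleSourceMaskSketch and the use of std::vector<int> in place of ArrayRef<int> are illustrative only.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for a single-source mask check (hypothetical name,
// not the code from Instructions.cpp). The source size is passed in
// explicitly instead of being inferred from Mask.size().
bool isSingleSourceMaskSketch(const std::vector<int> &Mask, int NumSrcElts) {
  assert(NumSrcElts > 0 && "source vectors must be non-empty");
  bool UsesLHS = false;
  bool UsesRHS = false;
  for (int Elt : Mask) {
    if (Elt == -1)
      continue; // an undefined lane constrains nothing
    assert(Elt >= 0 && Elt < NumSrcElts * 2 && "out-of-bounds mask element");
    if (Elt < NumSrcElts)
      UsesLHS = true; // lane comes from the first operand
    else
      UsesRHS = true; // lane comes from the second operand
    if (UsesLHS && UsesRHS)
      return false; // reads from both operands
  }
  // An all-undef mask uses neither operand and is reported as false here.
  return UsesLHS || UsesRHS;
}
```

For the 4-element mask <0,1,4,5>, this sketch reports a single source when NumSrcElts is 8, but not when the source size is guessed from the mask length (4), because lanes 4 and 5 then appear to come from the second operand.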

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. · Aug 21 2023, 1:05 PM

Herald added a project: Restricted Project. · Aug 21 2023, 1:05 PM

Herald added subscribers: luke, sunshaoce, frasercrmck and 25 others.

ABataev requested review of this revision. · Aug 21 2023, 1:05 PM

Herald added a project: Restricted Project. · Aug 21 2023, 1:05 PM

Herald added subscribers: wangpc, MaskRay.

craig.topper added a subscriber: craig.topper. · Aug 21 2023, 1:46 PM

craig.topper added inline comments.

llvm/lib/IR/Instructions.cpp
2368–2369	If Mask.size() == 4 and NumSrcElts == 6. Then this considers <5, 4, 3, 2> as a reverse?

ABataev added inline comments. · Aug 21 2023, 1:52 PM

llvm/lib/IR/Instructions.cpp
2368–2369	Yes. Need to add a check that the operation does not change the size, just like non-static isReverse() does.
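
The case discussed in this exchange can be sketched in standalone C++. This is a simplified illustration under the assumption that a reverse must keep the vector length; isReverseMaskSketch is a hypothetical name and the code is not copied from Instructions.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reverse-mask check with an explicit source element count.
// -1 marks an undefined lane.
bool isReverseMaskSketch(const std::vector<int> &Mask, int NumSrcElts) {
  assert(NumSrcElts >= 0 && "element count cannot be negative");
  // A reverse needs at least two lanes and must not change the length.
  if (NumSrcElts < 2 || Mask.size() != static_cast<std::size_t>(NumSrcElts))
    return false;
  bool UsesLHS = false;
  bool UsesRHS = false;
  for (int I = 0; I < NumSrcElts; ++I) {
    const int Elt = Mask[static_cast<std::size_t>(I)];
    if (Elt == -1)
      continue;
    if (Elt == NumSrcElts - 1 - I)
      UsesLHS = true; // mirrored lane of the first operand
    else if (Elt == 2 * NumSrcElts - 1 - I)
      UsesRHS = true; // mirrored lane of the second operand
    else
      return false;
  }
  // Exactly one operand may be used.
  return UsesLHS != UsesRHS;
}
```

Without the length check, <5, 4, 3, 2> with NumSrcElts == 6 would pass the per-lane comparison; with it, the mask is rejected.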

Harbormaster completed remote builds in B253914: Diff 552115. · Aug 21 2023, 2:32 PM

Address comments

Harbormaster completed remote builds in B253940: Diff 552154. · Aug 21 2023, 4:45 PM

RKSimon added inline comments. · Aug 23 2023, 1:46 AM

llvm/include/llvm/IR/Instructions.h
2107	Update this comment
2201	Update comment
2221	Update comment
2292	Update comment
llvm/unittests/IR/ShuffleVectorInstTest.cpp
14	All of these tests need support for length changing shuffles

Address comments

Harbormaster completed remote builds in B254337: Diff 552703. · Aug 23 2023, 8:51 AM

Ping!

craig.topper added inline comments. · Sep 6 2023, 8:58 AM

llvm/lib/IR/Instructions.cpp
2350	Why is the argument an int if we're going to cast it to unsigned?

ABataev added inline comments. · Sep 6 2023, 9:48 AM

llvm/lib/IR/Instructions.cpp
2350	For conformance. All other functions have corresponding int parameter, though treat it as unsigned.

Rebase, ping!

Harbormaster completed remote builds in B257416: Diff 557058. · Sep 19 2023, 11:46 AM

No objections to this, but I do think it'd make sense to split into 2 patches - (1) adding the NumSrcElts argument to the shuffle matching helpers + additional unit test coverage and (2) using it in SLP

Rebase, address comments

Harbormaster completed remote builds in B257505: Diff 557193. · Sep 21 2023, 2:13 PM

RKSimon added inline comments. · Sep 26 2023, 12:15 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6521	Maybe split this into a series of separate if () { return true/false; } to make it easier to grok?

Address comment

ABataev marked an inline comment as done. · Sep 26 2023, 12:58 PM

ABataev added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6521	Done

ABataev marked an inline comment as done. · Sep 26 2023, 1:00 PM

Harbormaster completed remote builds in B257612: Diff 557371. · Sep 26 2023, 2:41 PM

LGTM - cheers

This revision is now accepted and ready to land. · Sep 27 2023, 3:01 AM

ABataev mentioned this in rG59a67ea35d60: [SLP]Improve costs in computeExtractCost() to avoid crash after D158449. · Sep 28 2023, 9:43 AM

This revision was landed with ongoing or failed builds. · Sep 28 2023, 11:12 AM

Closed by commit rGc88c281cf1ac: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. (authored by ABataev).

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rGc88c281cf1ac: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.

ABataev added a reverting change: rG3204f88a8bdb: Revert "[IR]Add NumSrcElts param to is..Mask static function in…. · Sep 28 2023, 11:57 AM

hans mentioned this in rG06f3b0ed436b: Revert "[SLP]Improve costs in computeExtractCost() to avoid crash after D158449. · Sep 29 2023, 1:42 AM

ABataev mentioned this in rG019aee832768: [SLP]Improve costs in computeExtractCost() to avoid crash after D158449. · Sep 29 2023, 7:51 AM

ABataev added a commit: rG9f5960e004ff: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. · Sep 29 2023, 1:21 PM

ABataev added a reverting change: rGebcb5d59fc7d: Revert "[IR]Add NumSrcElts param to is..Mask static function in…. · Sep 29 2023, 3:04 PM

ABataev added a commit: rG6f43d28f3452: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. · Oct 3 2023, 10:26 AM

hitting an opt -p slp-vectorizer assert on the following IR:

target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.34.0"

define ptr @f() {
entry:
  %cmp.i.i = fcmp olt float 0.000000e+00, 0.000000e+00
  %0 = zext i1 %cmp.i.i to i64
  %pgocount88 = load i64, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 9), align 8
  %1 = or i64 %pgocount88, %0
  store i64 %1, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 9), align 8
  %cond.i.i = select i1 %cmp.i.i, float 0.000000e+00, float 0.000000e+00
  %cmp1.i.i = fcmp ogt float %cond.i.i, 0.000000e+00
  %2 = zext i1 %cmp1.i.i to i64
  %pgocount89 = load i64, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 10), align 8
  %3 = or i64 %pgocount89, %2
  store i64 %3, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 10), align 8
  %cmp.i9.i = fcmp olt float 0.000000e+00, 0.000000e+00
  %cond.i10.i = select i1 %cmp.i9.i, float 0.000000e+00, float 0.000000e+00
  %cmp1.i11.i = fcmp ogt float %cond.i10.i, 0.000000e+00
  %cmp.i14.i = fcmp olt float 0.000000e+00, 0.000000e+00
  %cond.i15.i = select i1 %cmp.i14.i, float 0.000000e+00, float 0.000000e+00
  %cmp1.i16.i = fcmp ogt float %cond.i15.i, 0.000000e+00
  %cmp.i19.i = fcmp olt float 0.000000e+00, 0.000000e+00
  %cond.i20.i = select i1 %cmp.i19.i, float 0.000000e+00, float 0.000000e+00
  %cmp1.i21.i = fcmp ogt float %cond.i20.i, 0.000000e+00
  ret ptr null
}

ABataev added a reverting change: rG1129dec778ae: Revert "[IR]Add NumSrcElts param to is..Mask static function in…. · Oct 3 2023, 1:02 PM

Thanks, reverted the patch, will investigate it and fix.

ABataev added a commit: rGb186f1f68be1: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. · Oct 4 2023, 7:59 AM

Coming into this quite late, but I'm confused by the need for this API change.

Doesn't the existing ArrayRef::take_front API fulfill the same need? If you need to consider a prefix of the mask, why not just take a reference to that prefix?

In D158449#4653004, @reames wrote:

Coming into this quite late, but I'm confused by the need for this API change.

Doesn't the existing ArrayRef::take_front API fulfill the same need? If you need to consider a prefix of the mask, why not just take a reference to that prefix?

Sure, it may work this way, but better to have this in the interface functions. It helped to find so many bugs in the cost model. Plus, this extract thing must be copied over almost all target-specific TTI implementations.

another slp-vectorizer crash on

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv7-unknown-linux-android24"

define void @_ZNK8SkStroke10strokeRectERK6SkRectP6SkPath15SkPathDirection() {
entry:
  %0 = load float, ptr null, align 4
  %1 = load float, ptr null, align 4
  %2 = load float, ptr null, align 4
  %cmp.i = fcmp ogt float %1, %0
  %rect.sroa.14.0 = select i1 %cmp.i, float %1, float 0.000000e+00
  %rect.sroa.0.0 = select i1 %cmp.i, float %0, float 0.000000e+00
  %cmp4.i = fcmp ogt float 0.000000e+00, %2
  %rect.sroa.19.0 = select i1 %cmp4.i, float 0.000000e+00, float 0.000000e+00
  %rect.sroa.9.0 = select i1 %cmp4.i, float %2, float 0.000000e+00
  store float %rect.sroa.0.0, ptr null, align 4
  %rect.sroa.9.0.r.sroa_idx = getelementptr i8, ptr null, i32 4
  store float %rect.sroa.9.0, ptr %rect.sroa.9.0.r.sroa_idx, align 4
  %rect.sroa.14.0.r.sroa_idx = getelementptr i8, ptr null, i32 8
  store float %rect.sroa.14.0, ptr %rect.sroa.14.0.r.sroa_idx, align 4
  %rect.sroa.19.0.r.sroa_idx = getelementptr i8, ptr null, i32 12
  store float %rect.sroa.19.0, ptr %rect.sroa.19.0.r.sroa_idx, align 4
  ret void
}

aeubanks added a reverting change: rG07389535a702: Revert "[IR]Add NumSrcElts param to is..Mask static function in…. · Oct 4 2023, 2:41 PM

ABataev added a commit: rGe22818d5c98a: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. · Oct 5 2023, 6:25 AM

In D158449#4653006, @ABataev wrote:

In D158449#4653004, @reames wrote:

Coming into this quite late, but I'm confused by the need for this API change.

Doesn't the existing ArrayRef::take_front API fulfill the same need? If you need to consider a prefix of the mask, why not just take a reference to that prefix?

Sure, it may work this way, but better to have this in the interface functions. It helped to find so many bugs in the cost model. Plus, this extract thing must be copied over almost all target-specific TTI implementations.

On the first point, I disagree. Having a simpler API and shifting the mask manipulation to the client (SLP) seems strictly better. On the second, I don't see (in your patch) need for truncation *within* backend modeling, I only see it from the client (SLP). Am I missing something?

It is meant to be used by other clients too; currently it mostly affects SLP.
The backend relies on different functions, which already include a check for a size change. This change makes the middle-end API follow the backend requirements more strictly and allows better cost estimation.

Hi @ABataev

The following fails with an assertion with this patch:

opt bbi-87376.ll -passes=vector-combine -o /dev/null

bbi-87376.ll (328 B)

It crashes with

opt: ../lib/IR/Instructions.cpp:2129: bool isSingleSourceMaskImpl(ArrayRef<int>, int): Assertion `I >= 0 && I < (NumOpElts * 2) && "Out-of-bounds shuffle mask element"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: ../../main-github/llvm/build-all/bin/opt bbi-87376.ll -passes=vector-combine -o /dev/null
 #0 0x0000556ead38c447 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (../../main-github/llvm/build-all/bin/opt+0x2f2a447)
 #1 0x0000556ead389f9e llvm::sys::RunSignalHandlers() (../../main-github/llvm/build-all/bin/opt+0x2f27f9e)
 #2 0x0000556ead38cb0f SignalHandler(int) (../../main-github/llvm/build-all/bin/opt+0x2f2ab0f)
 #3 0x00007f28ab3f0630 __restore_rt (/lib64/libpthread.so.0+0xf630)
 #4 0x00007f28a8b37387 raise (/lib64/libc.so.6+0x36387)
 #5 0x00007f28a8b38a78 abort (/lib64/libc.so.6+0x37a78)
 #6 0x00007f28a8b301a6 __assert_fail_base (/lib64/libc.so.6+0x2f1a6)
 #7 0x00007f28a8b30252 (/lib64/libc.so.6+0x2f252)
 #8 0x0000556eacd49919 llvm::ShuffleVectorInst::isReverseMask(llvm::ArrayRef<int>, int) (../../main-github/llvm/build-all/bin/opt+0x28e7919)
 #9 0x0000556eabe3ea36 llvm::BasicTTIImplBase<llvm::X86TTIImpl>::improveShuffleKindFromMask(llvm::TargetTransformInfo::ShuffleKind, llvm::ArrayRef<int>, llvm::VectorType*, int&, llvm::VectorType*&) const (../../main-github/llvm/build-all/bin/opt+0x19dca36)
#10 0x0000556eabe3cb32 llvm::X86TTIImpl::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm::ArrayRef<int>, llvm::TargetTransformInfo::TargetCostKind, int, llvm::VectorType*, llvm::ArrayRef<llvm::Value const*>) (../../main-github/llvm/build-all/bin/opt+0x19dab32)
#11 0x0000556eac5a944d llvm::TargetTransformInfo::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm::ArrayRef<int>, llvm::TargetTransformInfo::TargetCostKind, int, llvm::VectorType*, llvm::ArrayRef<llvm::Value const*>) const (../../main-github/llvm/build-all/bin/opt+0x214744d)
#12 0x0000556ead852a58 (anonymous namespace)::VectorCombine::run()::$_12::operator()(llvm::Instruction&) const (../../main-github/llvm/build-all/bin/opt+0x33f0a58)
#13 0x0000556ead84d7e5 llvm::VectorCombinePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x33eb7e5)
#14 0x0000556ead5a9a7d llvm::detail::PassModel<llvm::Function, llvm::VectorCombinePass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x3147a7d)
#15 0x0000556eacdac8c4 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x294a8c4)
#16 0x0000556eab18572d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0xd2372d)
#17 0x0000556eacdb0cae llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x294ecae)
#18 0x0000556eab1854cd llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0xd234cd)
#19 0x0000556eacdaba54 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x2949a54)
#20 0x0000556eaad8a0d3 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (../../main-github/llvm/build-all/bin/opt+0x9280d3)
#21 0x0000556eaad9771c main (../../main-github/llvm/build-all/bin/opt+0x93571c)
#22 0x00007f28a8b23555 __libc_start_main (/lib64/libc.so.6+0x22555)
#23 0x0000556eaad84270 _start (../../main-github/llvm/build-all/bin/opt+0x922270)
Abort (core dumped)
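
The assertion message states the precondition: every defined mask element must lie in [0, NumOpElts * 2). A caller that cannot guarantee this could test it first. The following is a hedged sketch of such a guard in standalone C++; maskIsInBoundsSketch is a hypothetical helper, std::vector<int> stands in for ArrayRef<int>, and whether the eventual fix takes this form is not stated in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical caller-side guard: returns true when every mask element is
// either undefined (-1) or addresses one of the 2 * NumSrcElts input lanes.
bool maskIsInBoundsSketch(const std::vector<int> &Mask, int NumSrcElts) {
  assert(NumSrcElts >= 0 && "element count cannot be negative");
  const int Limit = NumSrcElts * 2;
  return std::all_of(Mask.begin(), Mask.end(),
                     [Limit](int Elt) { return Elt >= -1 && Elt < Limit; });
}
```

A caller would skip the mask classification when this returns false, rather than letting the assertion fire.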

Fixed in c2ae16f6a72a9e48d7c6df00ff34d12360eec190

In D158449#4653522, @ABataev wrote:

Fixed in c2ae16f6a72a9e48d7c6df00ff34d12360eec190

Thanks!

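Revision Contents

Path                                                                         Size
llvm/include/llvm/CodeGen/BasicTTIImpl.h                                     19 lines
llvm/include/llvm/IR/Instructions.h                                          59 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp                                2 lines
llvm/lib/IR/Instructions.cpp                                                 71 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp                              5 lines
llvm/lib/Target/ARM/ARMISelLowering.cpp                                      2 lines
llvm/lib/Target/RISCV/RISCVISelLowering.cpp                                  3 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp                           7 lines
llvm/lib/Target/X86/X86TargetTransformInfo.cpp                               2 lines
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp                     4 lines
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp                              61 lines
llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll             32 lines
llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat-inseltpoison.ll        19 lines
llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat.ll                     19 lines
llvm/test/Transforms/SLPVectorizer/AMDGPU/crash_extract_subvector_cost.ll    13 lines
llvm/test/Transforms/SLPVectorizer/AMDGPU/phi-result-use-order.ll            46 lines
llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll                    144 lines
llvm/test/Transforms/SLPVectorizer/X86/alternate-calls-inseltpoison.ll       55 lines
llvm/test/Transforms/SLPVectorizer/X86/alternate-calls.ll                    55 lines
llvm/test/Transforms/SLPVectorizer/X86/hadd-inseltpoison.ll                  157 lines
llvm/test/Transforms/SLPVectorizer/X86/hadd.ll                               157 lines
llvm/test/Transforms/SLPVectorizer/X86/hsub-inseltpoison.ll                  124 lines
llvm/test/Transforms/SLPVectorizer/X86/hsub.ll                               124 lines
llvm/test/Transforms/SLPVectorizer/X86/reduction-transpose.ll                32 lines
llvm/unittests/IR/InstructionsTest.cpp                                       178 lines
llvm/unittests/IR/ShuffleVectorInstTest.cpp                                  38 lines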

Diff 552154

llvm/include/llvm/CodeGen/BasicTTIImpl.h

@@ InstructionCost getArithmeticInstrCost(
     // We don't know anything about this scalar instruction.
     return OpCost;
   }
 
   TTI::ShuffleKind improveShuffleKindFromMask(TTI::ShuffleKind Kind,
                                               ArrayRef<int> Mask,
                                               VectorType *Ty, int &Index,
                                               VectorType *&SubTy) const {
-    int Limit = Mask.size() * 2;
-    if (Mask.empty() ||
-        // Extra check required by isSingleSourceMaskImpl function (called by
-        // ShuffleVectorInst::isSingleSourceMask).
-        any_of(Mask, [Limit](int I) { return I >= Limit; }))
+    if (Mask.empty())
       return Kind;
+    int NumSrcElts = Ty->getElementCount().getKnownMinValue();
     switch (Kind) {
     case TTI::SK_PermuteSingleSrc:
-      if (ShuffleVectorInst::isReverseMask(Mask))
+      if (ShuffleVectorInst::isReverseMask(Mask, NumSrcElts))
         return TTI::SK_Reverse;
-      if (ShuffleVectorInst::isZeroEltSplatMask(Mask))
+      if (ShuffleVectorInst::isZeroEltSplatMask(Mask, NumSrcElts))
         return TTI::SK_Broadcast;
       break;
     case TTI::SK_PermuteTwoSrc: {
       int NumSubElts;
       if (Mask.size() > 2 && ShuffleVectorInst::isInsertSubvectorMask(
-                                 Mask, Mask.size(), NumSubElts, Index)) {
+                                 Mask, NumSrcElts, NumSubElts, Index)) {
         SubTy = FixedVectorType::get(Ty->getElementType(), NumSubElts);
         return TTI::SK_InsertSubvector;
       }
-      if (ShuffleVectorInst::isSelectMask(Mask))
+      if (ShuffleVectorInst::isSelectMask(Mask, NumSrcElts))
         return TTI::SK_Select;
-      if (ShuffleVectorInst::isTransposeMask(Mask))
+      if (ShuffleVectorInst::isTransposeMask(Mask, NumSrcElts))
         return TTI::SK_Transpose;
-      if (ShuffleVectorInst::isSpliceMask(Mask, Index))
+      if (ShuffleVectorInst::isSpliceMask(Mask, NumSrcElts, Index))
         return TTI::SK_Splice;
       break;
     }
     case TTI::SK_Select:
     case TTI::SK_Reverse:
     case TTI::SK_Broadcast:
     case TTI::SK_Transpose:
     case TTI::SK_InsertSubvector:

llvm/include/llvm/IR/Instructions.h

@@ unsigned NumSourceElts = cast<VectorType>(Op<0>()->getType())
                                  .getKnownMinValue();
     unsigned NumMaskElts = ShuffleMask.size();
     return NumSourceElts < NumMaskElts;
   }
 
   /// Return true if this shuffle mask chooses elements from exactly one source
   /// vector.
   /// Example: <7,5,undef,7>
   /// This assumes that vector operands are the same length as the mask.
   [RKSimon, inline comment: Update this comment]
-  static bool isSingleSourceMask(ArrayRef<int> Mask);
-  static bool isSingleSourceMask(const Constant *Mask) {
+  static bool isSingleSourceMask(ArrayRef<int> Mask, int NumSrcElts);
+  static bool isSingleSourceMask(const Constant *Mask, int NumSrcElts) {
     assert(Mask->getType()->isVectorTy() && "Shuffle needs vector constant.");
     SmallVector<int, 16> MaskAsInts;
     getShuffleMask(Mask, MaskAsInts);
-    return isSingleSourceMask(MaskAsInts);
+    return isSingleSourceMask(MaskAsInts, NumSrcElts);
   }
 
   /// Return true if this shuffle chooses elements from exactly one source
   /// vector without changing the length of that vector.
   /// Example: shufflevector <4 x n> A, <4 x n> B, <3,0,undef,3>
   /// TODO: Optionally allow length-changing shuffles.
   bool isSingleSource() const {
-    return !changesLength() && isSingleSourceMask(ShuffleMask);
+    return !changesLength() &&
+           isSingleSourceMask(ShuffleMask, ShuffleMask.size());
   }
 
   /// Return true if this shuffle mask chooses elements from exactly one source
   /// vector without lane crossings. A shuffle using this mask is not
   /// necessarily a no-op because it may change the number of elements from its
   /// input vectors or it may provide demanded bits knowledge via undef lanes.
   /// Example: <undef,undef,2,3>
-  static bool isIdentityMask(ArrayRef<int> Mask);
-  static bool isIdentityMask(const Constant *Mask) {
+  static bool isIdentityMask(ArrayRef<int> Mask, int NumSrcElts);
+  static bool isIdentityMask(const Constant *Mask, int NumSrcElts) {
     assert(Mask->getType()->isVectorTy() && "Shuffle needs vector constant.");
 
     // Not possible to express a shuffle mask for a scalable vector for this
     // case.
     if (isa<ScalableVectorType>(Mask->getType()))
       return false;
 
     SmallVector<int, 16> MaskAsInts;
     getShuffleMask(Mask, MaskAsInts);
-    return isIdentityMask(MaskAsInts);
+    return isIdentityMask(MaskAsInts, NumSrcElts);
   }
 
   /// Return true if this shuffle chooses elements from exactly one source
   /// vector without lane crossings and does not change the number of elements
   /// from its input vectors.
   /// Example: shufflevector <4 x n> A, <4 x n> B, <4,undef,6,undef>
   bool isIdentity() const {
     // Not possible to express a shuffle mask for a scalable vector for this
     // case.
     if (isa<ScalableVectorType>(getType()))
       return false;
 
-    return !changesLength() && isIdentityMask(ShuffleMask);
+    return !changesLength() && isIdentityMask(ShuffleMask, ShuffleMask.size());
   }
 
   /// Return true if this shuffle lengthens exactly one source vector with
   /// undefs in the high elements.
   bool isIdentityWithPadding() const;
 
   /// Return true if this shuffle extracts the first N elements of exactly one
   /// source vector.
   bool isIdentityWithExtract() const;
 
   /// Return true if this shuffle concatenates its 2 source vectors. This
   /// returns false if either input is undefined. In that case, the shuffle is
   /// is better classified as an identity with padding operation.
   bool isConcat() const;
 
   /// Return true if this shuffle mask chooses elements from its source vectors
   /// without lane crossings. A shuffle using this mask would be
   /// equivalent to a vector select with a constant condition operand.
   /// Example: <4,1,6,undef>
   /// This returns false if the mask does not choose from both input vectors.
   /// In that case, the shuffle is better classified as an identity shuffle.
   /// This assumes that vector operands are the same length as the mask
   /// (a length-changing shuffle can never be equivalent to a vector select).
-  static bool isSelectMask(ArrayRef<int> Mask);
-  static bool isSelectMask(const Constant *Mask) {
+  static bool isSelectMask(ArrayRef<int> Mask, int NumSrcElts);
+  static bool isSelectMask(const Constant *Mask, int NumSrcElts) {
     assert(Mask->getType()->isVectorTy() && "Shuffle needs vector constant.");
SmallVector<int, 16> MaskAsInts;		SmallVector<int, 16> MaskAsInts;
getShuffleMask(Mask, MaskAsInts);		getShuffleMask(Mask, MaskAsInts);
return isSelectMask(MaskAsInts);		return isSelectMask(MaskAsInts, NumSrcElts);
}		}

/// Return true if this shuffle chooses elements from its source vectors		/// Return true if this shuffle chooses elements from its source vectors
/// without lane crossings and all operands have the same number of elements.		/// without lane crossings and all operands have the same number of elements.
/// In other words, this shuffle is equivalent to a vector select with a		/// In other words, this shuffle is equivalent to a vector select with a
/// constant condition operand.		/// constant condition operand.
/// Example: shufflevector <4 x n> A, <4 x n> B, <undef,1,6,3>		/// Example: shufflevector <4 x n> A, <4 x n> B, <undef,1,6,3>
/// This returns false if the mask does not choose from both input vectors.		/// This returns false if the mask does not choose from both input vectors.
/// In that case, the shuffle is better classified as an identity shuffle.		/// In that case, the shuffle is better classified as an identity shuffle.
/// TODO: Optionally allow length-changing shuffles.		/// TODO: Optionally allow length-changing shuffles.
bool isSelect() const {		bool isSelect() const {
return !changesLength() && isSelectMask(ShuffleMask);		return !changesLength() && isSelectMask(ShuffleMask, ShuffleMask.size());
}		}

/// Return true if this shuffle mask swaps the order of elements from exactly		/// Return true if this shuffle mask swaps the order of elements from exactly
/// one source vector.		/// one source vector.
/// Example: <7,6,undef,4>		/// Example: <7,6,undef,4>
/// This assumes that vector operands are the same length as the mask.		/// This assumes that vector operands are the same length as the mask.
		RKSimon: Update comment
static bool isReverseMask(ArrayRef<int> Mask);		static bool isReverseMask(ArrayRef<int> Mask, int NumSrcElts);
static bool isReverseMask(const Constant *Mask) {		static bool isReverseMask(const Constant *Mask, int NumSrcElts) {
assert(Mask->getType()->isVectorTy() && "Shuffle needs vector constant.");		assert(Mask->getType()->isVectorTy() && "Shuffle needs vector constant.");
SmallVector<int, 16> MaskAsInts;		SmallVector<int, 16> MaskAsInts;
getShuffleMask(Mask, MaskAsInts);		getShuffleMask(Mask, MaskAsInts);
return isReverseMask(MaskAsInts);		return isReverseMask(MaskAsInts, NumSrcElts);
}		}

/// Return true if this shuffle swaps the order of elements from exactly		/// Return true if this shuffle swaps the order of elements from exactly
/// one source vector.		/// one source vector.
/// Example: shufflevector <4 x n> A, <4 x n> B, <3,undef,1,undef>		/// Example: shufflevector <4 x n> A, <4 x n> B, <3,undef,1,undef>
/// TODO: Optionally allow length-changing shuffles.		/// TODO: Optionally allow length-changing shuffles.
bool isReverse() const {		bool isReverse() const {
return !changesLength() && isReverseMask(ShuffleMask);		return !changesLength() && isReverseMask(ShuffleMask, ShuffleMask.size());
}		}

/// Return true if this shuffle mask chooses all elements with the same value		/// Return true if this shuffle mask chooses all elements with the same value
/// as the first element of exactly one source vector.		/// as the first element of exactly one source vector.
/// Example: <4,undef,undef,4>		/// Example: <4,undef,undef,4>
/// This assumes that vector operands are the same length as the mask.		/// This assumes that vector operands are the same length as the mask.
		RKSimon: Update comment
static bool isZeroEltSplatMask(ArrayRef<int> Mask);		static bool isZeroEltSplatMask(ArrayRef<int> Mask, int NumSrcElts);
static bool isZeroEltSplatMask(const Constant *Mask) {		static bool isZeroEltSplatMask(const Constant *Mask, int NumSrcElts) {
assert(Mask->getType()->isVectorTy() && "Shuffle needs vector constant.");		assert(Mask->getType()->isVectorTy() && "Shuffle needs vector constant.");
SmallVector<int, 16> MaskAsInts;		SmallVector<int, 16> MaskAsInts;
getShuffleMask(Mask, MaskAsInts);		getShuffleMask(Mask, MaskAsInts);
return isZeroEltSplatMask(MaskAsInts);		return isZeroEltSplatMask(MaskAsInts, NumSrcElts);
}		}

/// Return true if all elements of this shuffle are the same value as the		/// Return true if all elements of this shuffle are the same value as the
/// first element of exactly one source vector without changing the length		/// first element of exactly one source vector without changing the length
/// of that vector.		/// of that vector.
/// Example: shufflevector <4 x n> A, <4 x n> B, <undef,0,undef,0>		/// Example: shufflevector <4 x n> A, <4 x n> B, <undef,0,undef,0>
/// TODO: Optionally allow length-changing shuffles.		/// TODO: Optionally allow length-changing shuffles.
/// TODO: Optionally allow splats from other elements.		/// TODO: Optionally allow splats from other elements.
bool isZeroEltSplat() const {		bool isZeroEltSplat() const {
return !changesLength() && isZeroEltSplatMask(ShuffleMask);		return !changesLength() &&
		isZeroEltSplatMask(ShuffleMask, ShuffleMask.size());
}		}

/// Return true if this shuffle mask is a transpose mask.		/// Return true if this shuffle mask is a transpose mask.
/// Transpose vector masks transpose a 2xn matrix. They read corresponding		/// Transpose vector masks transpose a 2xn matrix. They read corresponding
/// even- or odd-numbered vector elements from two n-dimensional source		/// even- or odd-numbered vector elements from two n-dimensional source
/// vectors and write each result into consecutive elements of an		/// vectors and write each result into consecutive elements of an
/// n-dimensional destination vector. Two shuffles are necessary to complete		/// n-dimensional destination vector. Two shuffles are necessary to complete
/// the transpose, one for the even elements and another for the odd elements.		/// the transpose, one for the even elements and another for the odd elements.
[...]
///		///
/// ; Original matrix		/// ; Original matrix
/// m0 = < a, b, c, d >		/// m0 = < a, b, c, d >
/// m1 = < e, f, g, h >		/// m1 = < e, f, g, h >
///		///
/// ; Transposed matrix		/// ; Transposed matrix
/// t0 = < a, e, c, g > = shufflevector m0, m1 < 0, 4, 2, 6 >		/// t0 = < a, e, c, g > = shufflevector m0, m1 < 0, 4, 2, 6 >
/// t1 = < b, f, d, h > = shufflevector m0, m1 < 1, 5, 3, 7 >		/// t1 = < b, f, d, h > = shufflevector m0, m1 < 1, 5, 3, 7 >
static bool isTransposeMask(ArrayRef<int> Mask);		static bool isTransposeMask(ArrayRef<int> Mask, int NumSrcElts);
static bool isTransposeMask(const Constant *Mask) {		static bool isTransposeMask(const Constant *Mask, int NumSrcElts) {
assert(Mask->getType()->isVectorTy() && "Shuffle needs vector constant.");		assert(Mask->getType()->isVectorTy() && "Shuffle needs vector constant.");
SmallVector<int, 16> MaskAsInts;		SmallVector<int, 16> MaskAsInts;
getShuffleMask(Mask, MaskAsInts);		getShuffleMask(Mask, MaskAsInts);
return isTransposeMask(MaskAsInts);		return isTransposeMask(MaskAsInts, NumSrcElts);
}		}

/// Return true if this shuffle transposes the elements of its inputs without		/// Return true if this shuffle transposes the elements of its inputs without
/// changing the length of the vectors. This operation may also be known as a		/// changing the length of the vectors. This operation may also be known as a
/// merge or interleave. See the description for isTransposeMask() for the		/// merge or interleave. See the description for isTransposeMask() for the
/// exact specification.		/// exact specification.
/// Example: shufflevector <4 x n> A, <4 x n> B, <0,4,2,6>		/// Example: shufflevector <4 x n> A, <4 x n> B, <0,4,2,6>
bool isTranspose() const {		bool isTranspose() const {
return !changesLength() && isTransposeMask(ShuffleMask);		return !changesLength() && isTransposeMask(ShuffleMask, ShuffleMask.size());
}		}

/// Return true if this shuffle mask is a splice mask, concatenating the two		/// Return true if this shuffle mask is a splice mask, concatenating the two
/// inputs together and then extracts an original width vector starting from		/// inputs together and then extracts an original width vector starting from
/// the splice index.		/// the splice index.
		RKSimon: Update comment
/// Example: shufflevector <4 x n> A, <4 x n> B, <1,2,3,4>		/// Example: shufflevector <4 x n> A, <4 x n> B, <1,2,3,4>
static bool isSpliceMask(ArrayRef<int> Mask, int &Index);		static bool isSpliceMask(ArrayRef<int> Mask, int NumSrcElts, int &Index);
static bool isSpliceMask(const Constant *Mask, int &Index) {		static bool isSpliceMask(const Constant *Mask, int NumSrcElts, int &Index) {
assert(Mask->getType()->isVectorTy() && "Shuffle needs vector constant.");		assert(Mask->getType()->isVectorTy() && "Shuffle needs vector constant.");
SmallVector<int, 16> MaskAsInts;		SmallVector<int, 16> MaskAsInts;
getShuffleMask(Mask, MaskAsInts);		getShuffleMask(Mask, MaskAsInts);
return isSpliceMask(MaskAsInts, Index);		return isSpliceMask(MaskAsInts, NumSrcElts, Index);
}		}

/// Return true if this shuffle splices two inputs without changing the length		/// Return true if this shuffle splices two inputs without changing the length
/// of the vectors. This operation concatenates the two inputs together and		/// of the vectors. This operation concatenates the two inputs together and
/// then extracts an original width vector starting from the splice index.		/// then extracts an original width vector starting from the splice index.
/// Example: shufflevector <4 x n> A, <4 x n> B, <1,2,3,4>		/// Example: shufflevector <4 x n> A, <4 x n> B, <1,2,3,4>
bool isSplice(int &Index) const {		bool isSplice(int &Index) const {
return !changesLength() && isSpliceMask(ShuffleMask, Index);		return !changesLength() &&
		isSpliceMask(ShuffleMask, ShuffleMask.size(), Index);
}		}

/// Return true if this shuffle mask is an extract subvector mask.		/// Return true if this shuffle mask is an extract subvector mask.
/// A valid extract subvector mask returns a smaller vector from a single		/// A valid extract subvector mask returns a smaller vector from a single
/// source operand. The base extraction index is returned as well.		/// source operand. The base extraction index is returned as well.
static bool isExtractSubvectorMask(ArrayRef<int> Mask, int NumSrcElts,		static bool isExtractSubvectorMask(ArrayRef<int> Mask, int NumSrcElts,
int &Index);		int &Index);
static bool isExtractSubvectorMask(const Constant *Mask, int NumSrcElts,		static bool isExtractSubvectorMask(const Constant *Mask, int NumSrcElts,
[...]
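
The select-mask contract documented in this header can be illustrated with a small standalone sketch. It mirrors the logic shown in this revision but is not the LLVM code itself: `usesSingleSource` and `isSelectMaskSketch` are hypothetical names, masks are plain `std::vector<int>`, and -1 stands for an undef lane.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standalone sketch, not the LLVM implementation. -1 denotes an undef lane.
// Returns true if every defined lane reads from the same source vector.
static bool usesSingleSource(const std::vector<int> &Mask, int NumSrcElts) {
  bool UsesLHS = false, UsesRHS = false;
  for (int M : Mask) {
    if (M == -1)
      continue;
    UsesLHS |= (M < NumSrcElts);
    UsesRHS |= (M >= NumSrcElts);
    if (UsesLHS && UsesRHS)
      return false;
  }
  return UsesLHS || UsesRHS;
}

// A select mask keeps every lane in place (lane I reads element I of either
// source) and must read from both sources without changing the length.
static bool isSelectMaskSketch(const std::vector<int> &Mask, int NumSrcElts) {
  if (usesSingleSource(Mask, NumSrcElts))
    return false;
  if (Mask.size() != static_cast<std::size_t>(NumSrcElts))
    return false;
  for (int I = 0, E = static_cast<int>(Mask.size()); I < E; ++I) {
    if (Mask[I] == -1)
      continue;
    if (Mask[I] != I && Mask[I] != NumSrcElts + I)
      return false;
  }
  return true;
}
```

With four-element sources, `<4,1,6,undef>` is accepted. The same mask over eight-element sources reads only from the first operand, so it is rejected.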

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[...]		assert(DemandedSubvectors.size() <= 2 &&
"Should have ended up demanding at most two subvectors.");		"Should have ended up demanding at most two subvectors.");

// Did we discover that the shuffle does not actually depend on operands?		// Did we discover that the shuffle does not actually depend on operands?
if (DemandedSubvectors.empty())		if (DemandedSubvectors.empty())
return DAG.getUNDEF(NarrowVT);		return DAG.getUNDEF(NarrowVT);

// Profitability check: only deal with extractions from the first subvector		// Profitability check: only deal with extractions from the first subvector
// unless the mask becomes an identity mask.		// unless the mask becomes an identity mask.
if (!ShuffleVectorInst::isIdentityMask(NewMask) ||		if (!ShuffleVectorInst::isIdentityMask(NewMask, NewMask.size()) ||
any_of(NewMask, [](int M) { return M < 0; }))		any_of(NewMask, [](int M) { return M < 0; }))
for (auto &DemandedSubvector : DemandedSubvectors)		for (auto &DemandedSubvector : DemandedSubvectors)
if (DemandedSubvector.second != 0)		if (DemandedSubvector.second != 0)
return SDValue();		return SDValue();

// We still perform the exact same EXTRACT_SUBVECTOR, just on different		// We still perform the exact same EXTRACT_SUBVECTOR, just on different
// operand[s]/index[es], so there is no point in checking for it's legality.		// operand[s]/index[es], so there is no point in checking for it's legality.

[...]
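
To see why callers such as the one above now pass an explicit source length, here is a standalone sketch of the identity check as written in this revision. It is an illustration only: `usesSingleSource` and `isIdentityMaskSketch` are hypothetical names, and -1 stands for an undef lane.

```cpp
#include <cassert>
#include <vector>

// Standalone sketch, not the LLVM implementation. -1 denotes an undef lane.
static bool usesSingleSource(const std::vector<int> &Mask, int NumSrcElts) {
  bool UsesLHS = false, UsesRHS = false;
  for (int M : Mask) {
    if (M == -1)
      continue;
    UsesLHS |= (M < NumSrcElts);
    UsesRHS |= (M >= NumSrcElts);
    if (UsesLHS && UsesRHS)
      return false;
  }
  return UsesLHS || UsesRHS;
}

// Lane I must read element I of the first source (index I) or element I of
// the second source (index NumSrcElts + I).
static bool isIdentityMaskSketch(const std::vector<int> &Mask, int NumSrcElts) {
  if (!usesSingleSource(Mask, NumSrcElts))
    return false;
  for (int I = 0, E = static_cast<int>(Mask.size()); I < E; ++I) {
    if (Mask[I] == -1)
      continue;
    if (Mask[I] != I && Mask[I] != NumSrcElts + I)
      return false;
  }
  return true;
}
```

The mask `<4,5,6,7>` is an identity of the second operand when the sources have four elements. With eight-element sources it picks the upper half of the first operand, so it is not an identity. Passing `Mask.size()` as `NumSrcElts`, as the call sites in this diff do, reproduces the previous behavior.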

llvm/lib/IR/Instructions.cpp

[...]		for (int I : Mask) {
UsesRHS |= (I >= NumOpElts);		UsesRHS |= (I >= NumOpElts);
if (UsesLHS && UsesRHS)		if (UsesLHS && UsesRHS)
return false;		return false;
}		}
// Allow for degenerate case: completely undef mask means neither source is used.		// Allow for degenerate case: completely undef mask means neither source is used.
return UsesLHS || UsesRHS;		return UsesLHS || UsesRHS;
}		}

bool ShuffleVectorInst::isSingleSourceMask(ArrayRef<int> Mask) {		bool ShuffleVectorInst::isSingleSourceMask(ArrayRef<int> Mask, int NumSrcElts) {
// We don't have vector operand size information, so assume operands are the		// We don't have vector operand size information, so assume operands are the
// same size as the mask.		// same size as the mask.
return isSingleSourceMaskImpl(Mask, Mask.size());		return isSingleSourceMaskImpl(Mask, NumSrcElts);
}		}

static bool isIdentityMaskImpl(ArrayRef<int> Mask, int NumOpElts) {		static bool isIdentityMaskImpl(ArrayRef<int> Mask, int NumOpElts) {
if (!isSingleSourceMaskImpl(Mask, NumOpElts))		if (!isSingleSourceMaskImpl(Mask, NumOpElts))
return false;		return false;
for (int i = 0, NumMaskElts = Mask.size(); i < NumMaskElts; ++i) {		for (int i = 0, NumMaskElts = Mask.size(); i < NumMaskElts; ++i) {
if (Mask[i] == -1)		if (Mask[i] == -1)
continue;		continue;
if (Mask[i] != i && Mask[i] != (NumOpElts + i))		if (Mask[i] != i && Mask[i] != (NumOpElts + i))
return false;		return false;
}		}
return true;		return true;
}		}

bool ShuffleVectorInst::isIdentityMask(ArrayRef<int> Mask) {		bool ShuffleVectorInst::isIdentityMask(ArrayRef<int> Mask, int NumSrcElts) {
// We don't have vector operand size information, so assume operands are the		// We don't have vector operand size information, so assume operands are the
		craig.topper: Why is the argument an int if we're going to cast it to unsigned?
		ABataev (author): For conformance. All other functions have corresponding int parameter, though treat it as unsigned.
// same size as the mask.		// same size as the mask.
return isIdentityMaskImpl(Mask, Mask.size());		return isIdentityMaskImpl(Mask, NumSrcElts);
}		}

bool ShuffleVectorInst::isReverseMask(ArrayRef<int> Mask) {		bool ShuffleVectorInst::isReverseMask(ArrayRef<int> Mask, int NumSrcElts) {
if (!isSingleSourceMask(Mask))		if (!isSingleSourceMask(Mask, NumSrcElts))
return false;		return false;

		if (Mask.size() != static_cast<unsigned>(NumSrcElts))
		return false;
// The number of elements in the mask must be at least 2.		// The number of elements in the mask must be at least 2.
int NumElts = Mask.size();		if (NumSrcElts < 2)
if (NumElts < 2)
return false;		return false;

for (int i = 0; i < NumElts; ++i) {		for (int I = 0, E = Mask.size(); I < E; ++I) {
if (Mask[i] == -1)		if (Mask[I] == -1)
continue;		continue;
if (Mask[i] != (NumElts - 1 - i) && Mask[i] != (NumElts + NumElts - 1 - i))		if (Mask[I] != (NumSrcElts - 1 - I) &&
		Mask[I] != (NumSrcElts + NumSrcElts - 1 - I))
		craig.topper: If Mask.size() == 4 and NumSrcElts == 6. Then this considers <5, 4, 3, 2> as a reverse?
		ABataev (author): Yes. Need to add a check that the operation does not change the size, just like non-static isReverse() does.
return false;		return false;
}		}
return true;		return true;
}		}

bool ShuffleVectorInst::isZeroEltSplatMask(ArrayRef<int> Mask) {		bool ShuffleVectorInst::isZeroEltSplatMask(ArrayRef<int> Mask, int NumSrcElts) {
if (!isSingleSourceMask(Mask))		if (!isSingleSourceMask(Mask, NumSrcElts))
return false;		return false;
for (int i = 0, NumElts = Mask.size(); i < NumElts; ++i) {		for (int I = 0, E = Mask.size(); I < E; ++I) {
if (Mask[i] == -1)		if (Mask[I] == -1)
continue;		continue;
if (Mask[i] != 0 && Mask[i] != NumElts)		if (Mask[I] != 0 && Mask[I] != NumSrcElts)
return false;		return false;
}		}
return true;		return true;
}		}

bool ShuffleVectorInst::isSelectMask(ArrayRef<int> Mask) {		bool ShuffleVectorInst::isSelectMask(ArrayRef<int> Mask, int NumSrcElts) {
// Select is differentiated from identity. It requires using both sources.		// Select is differentiated from identity. It requires using both sources.
if (isSingleSourceMask(Mask))		if (isSingleSourceMask(Mask, NumSrcElts))
return false;		return false;
for (int i = 0, NumElts = Mask.size(); i < NumElts; ++i) {		if (Mask.size() != static_cast<unsigned>(NumSrcElts))
if (Mask[i] == -1)		return false;
		for (int I = 0, E = Mask.size(); I < E; ++I) {
		if (Mask[I] == -1)
continue;		continue;
if (Mask[i] != i && Mask[i] != (NumElts + i))		if (Mask[I] != I && Mask[I] != (NumSrcElts + I))
return false;		return false;
}		}
return true;		return true;
}		}

bool ShuffleVectorInst::isTransposeMask(ArrayRef<int> Mask) {		bool ShuffleVectorInst::isTransposeMask(ArrayRef<int> Mask, int NumSrcElts) {
// Example masks that will return true:		// Example masks that will return true:
// v1 = <a, b, c, d>		// v1 = <a, b, c, d>
// v2 = <e, f, g, h>		// v2 = <e, f, g, h>
// trn1 = shufflevector v1, v2 <0, 4, 2, 6> = <a, e, c, g>		// trn1 = shufflevector v1, v2 <0, 4, 2, 6> = <a, e, c, g>
// trn2 = shufflevector v1, v2 <1, 5, 3, 7> = <b, f, d, h>		// trn2 = shufflevector v1, v2 <1, 5, 3, 7> = <b, f, d, h>

		if (Mask.size() != static_cast<unsigned>(NumSrcElts))
		return false;
// 1. The number of elements in the mask must be a power-of-2 and at least 2.		// 1. The number of elements in the mask must be a power-of-2 and at least 2.
int NumElts = Mask.size();		int Sz = Mask.size();
if (NumElts < 2 || !isPowerOf2_32(NumElts))		if (Sz < 2 || !isPowerOf2_32(Sz))
return false;		return false;

// 2. The first element of the mask must be either a 0 or a 1.		// 2. The first element of the mask must be either a 0 or a 1.
if (Mask[0] != 0 && Mask[0] != 1)		if (Mask[0] != 0 && Mask[0] != 1)
return false;		return false;

// 3. The difference between the first 2 elements must be equal to the		// 3. The difference between the first 2 elements must be equal to the
// number of elements in the mask.		// number of elements in the mask.
if ((Mask[1] - Mask[0]) != NumElts)		if ((Mask[1] - Mask[0]) != NumSrcElts)
return false;		return false;

// 4. The difference between consecutive even-numbered and odd-numbered		// 4. The difference between consecutive even-numbered and odd-numbered
// elements must be equal to 2.		// elements must be equal to 2.
for (int i = 2; i < NumElts; ++i) {		for (int I = 2; I < Sz; ++I) {
int MaskEltVal = Mask[i];		int MaskEltVal = Mask[I];
if (MaskEltVal == -1)		if (MaskEltVal == -1)
return false;		return false;
int MaskEltPrevVal = Mask[i - 2];		int MaskEltPrevVal = Mask[I - 2];
if (MaskEltVal - MaskEltPrevVal != 2)		if (MaskEltVal - MaskEltPrevVal != 2)
return false;		return false;
}		}
return true;		return true;
}		}

bool ShuffleVectorInst::isSpliceMask(ArrayRef<int> Mask, int &Index) {		bool ShuffleVectorInst::isSpliceMask(ArrayRef<int> Mask, int NumSrcElts,
		int &Index) {
		if (Mask.size() != static_cast<unsigned>(NumSrcElts))
		return false;
// Example: shufflevector <4 x n> A, <4 x n> B, <1,2,3,4>		// Example: shufflevector <4 x n> A, <4 x n> B, <1,2,3,4>
int StartIndex = -1;		int StartIndex = -1;
for (int I = 0, E = Mask.size(); I != E; ++I) {		for (int I = 0, E = Mask.size(); I != E; ++I) {
int MaskEltVal = Mask[I];		int MaskEltVal = Mask[I];
if (MaskEltVal == -1)		if (MaskEltVal == -1)
continue;		continue;

if (StartIndex == -1) {		if (StartIndex == -1) {
// Don't support a StartIndex that begins in the second input, or if the		// Don't support a StartIndex that begins in the second input, or if the
// first non-undef index would access below the StartIndex.		// first non-undef index would access below the StartIndex.
if (MaskEltVal < I || E <= (MaskEltVal - I))		if (MaskEltVal < I || NumSrcElts <= (MaskEltVal - I))
return false;		return false;

StartIndex = MaskEltVal - I;		StartIndex = MaskEltVal - I;
continue;		continue;
}		}

// Splice is sequential starting from StartIndex.		// Splice is sequential starting from StartIndex.
if (MaskEltVal != (StartIndex + I))		if (MaskEltVal != (StartIndex + I))
[...]
}		}

/// Return true if this shuffle mask is a replication mask.		/// Return true if this shuffle mask is a replication mask.
bool ShuffleVectorInst::isOneUseSingleSourceMask(int VF) const {		bool ShuffleVectorInst::isOneUseSingleSourceMask(int VF) const {
// Not possible to express a shuffle mask for a scalable vector for this		// Not possible to express a shuffle mask for a scalable vector for this
// case.		// case.
if (isa<ScalableVectorType>(getType()))		if (isa<ScalableVectorType>(getType()))
return false;		return false;
if (!isSingleSourceMask(ShuffleMask))		if (!isSingleSourceMask(ShuffleMask, VF))
return false;		return false;

return isOneUseSingleSourceMask(ShuffleMask, VF);		return isOneUseSingleSourceMask(ShuffleMask, VF);
}		}

bool ShuffleVectorInst::isInterleave(unsigned Factor) {		bool ShuffleVectorInst::isInterleave(unsigned Factor) {
FixedVectorType *OpTy = dyn_cast<FixedVectorType>(getOperand(0)->getType());		FixedVectorType *OpTy = dyn_cast<FixedVectorType>(getOperand(0)->getType());
// shuffle_vector can only interleave fixed length vectors - for scalable		// shuffle_vector can only interleave fixed length vectors - for scalable
[...]
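
The inline discussion on isReverseMask (a four-element mask over six-element sources) can be reproduced with a standalone sketch. This is an illustration under stated assumptions, not the LLVM code: `usesSingleSource` and `isReverseMaskSketch` are hypothetical names, and the `RequireSameLength` flag exists only here, to toggle the length check that the revision adds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standalone sketch, not the LLVM implementation. -1 denotes an undef lane.
static bool usesSingleSource(const std::vector<int> &Mask, int NumSrcElts) {
  bool UsesLHS = false, UsesRHS = false;
  for (int M : Mask) {
    if (M == -1)
      continue;
    UsesLHS |= (M < NumSrcElts);
    UsesRHS |= (M >= NumSrcElts);
    if (UsesLHS && UsesRHS)
      return false;
  }
  return UsesLHS || UsesRHS;
}

// RequireSameLength is a hypothetical switch used only in this sketch to
// show what the size check guards against.
static bool isReverseMaskSketch(const std::vector<int> &Mask, int NumSrcElts,
                                bool RequireSameLength = true) {
  if (!usesSingleSource(Mask, NumSrcElts))
    return false;
  if (RequireSameLength &&
      Mask.size() != static_cast<std::size_t>(NumSrcElts))
    return false;
  if (NumSrcElts < 2)
    return false;
  for (int I = 0, E = static_cast<int>(Mask.size()); I < E; ++I) {
    if (Mask[I] == -1)
      continue;
    // Lane I must read the mirrored element of one of the two sources.
    if (Mask[I] != NumSrcElts - 1 - I && Mask[I] != 2 * NumSrcElts - 1 - I)
      return false;
  }
  return true;
}
```

Without the length check, `<5,4,3,2>` over six-element sources passes the per-lane test even though it only reverses a slice of the input. With the check enabled it is rejected, matching the non-static isReverse().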

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[...]		if (isREVMask(ShuffleMask, VT, 64))
return DAG.getNode(AArch64ISD::REV64, dl, V1.getValueType(), V1, V2);		return DAG.getNode(AArch64ISD::REV64, dl, V1.getValueType(), V1, V2);
if (isREVMask(ShuffleMask, VT, 32))		if (isREVMask(ShuffleMask, VT, 32))
return DAG.getNode(AArch64ISD::REV32, dl, V1.getValueType(), V1, V2);		return DAG.getNode(AArch64ISD::REV32, dl, V1.getValueType(), V1, V2);
if (isREVMask(ShuffleMask, VT, 16))		if (isREVMask(ShuffleMask, VT, 16))
return DAG.getNode(AArch64ISD::REV16, dl, V1.getValueType(), V1, V2);		return DAG.getNode(AArch64ISD::REV16, dl, V1.getValueType(), V1, V2);

if (((VT.getVectorNumElements() == 8 && VT.getScalarSizeInBits() == 16) ||		if (((VT.getVectorNumElements() == 8 && VT.getScalarSizeInBits() == 16) ||
(VT.getVectorNumElements() == 16 && VT.getScalarSizeInBits() == 8)) &&		(VT.getVectorNumElements() == 16 && VT.getScalarSizeInBits() == 8)) &&
ShuffleVectorInst::isReverseMask(ShuffleMask)) {		ShuffleVectorInst::isReverseMask(ShuffleMask, ShuffleMask.size())) {
SDValue Rev = DAG.getNode(AArch64ISD::REV64, dl, VT, V1);		SDValue Rev = DAG.getNode(AArch64ISD::REV64, dl, VT, V1);
return DAG.getNode(AArch64ISD::EXT, dl, VT, Rev, Rev,		return DAG.getNode(AArch64ISD::EXT, dl, VT, Rev, Rev,
DAG.getConstant(8, dl, MVT::i32));		DAG.getConstant(8, dl, MVT::i32));
}		}

bool ReverseEXT = false;		bool ReverseEXT = false;
unsigned Imm;		unsigned Imm;
if (isEXTMask(ShuffleMask, VT, ReverseEXT, Imm)) {		if (isEXTMask(ShuffleMask, VT, ReverseEXT, Imm)) {
[...]		SDValue AArch64TargetLowering::LowerFixedLengthVECTOR_SHUFFLEToSVE(
// the exact SVE register size is know. The main exception to this is when		// the exact SVE register size is know. The main exception to this is when
// indices are logically relative to the first element of either		// indices are logically relative to the first element of either
// ISD::VECTOR_SHUFFLE operand because these relative indices don't change		// ISD::VECTOR_SHUFFLE operand because these relative indices don't change
// when converting from fixed-length to scalable vector types (i.e. the start		// when converting from fixed-length to scalable vector types (i.e. the start
// of a fixed length vector is always the start of a scalable vector).		// of a fixed length vector is always the start of a scalable vector).
unsigned MinSVESize = Subtarget->getMinSVEVectorSizeInBits();		unsigned MinSVESize = Subtarget->getMinSVEVectorSizeInBits();
unsigned MaxSVESize = Subtarget->getMaxSVEVectorSizeInBits();		unsigned MaxSVESize = Subtarget->getMaxSVEVectorSizeInBits();
if (MinSVESize == MaxSVESize && MaxSVESize == VT.getSizeInBits()) {		if (MinSVESize == MaxSVESize && MaxSVESize == VT.getSizeInBits()) {
if (ShuffleVectorInst::isReverseMask(ShuffleMask) && Op2.isUndef()) {		if (ShuffleVectorInst::isReverseMask(ShuffleMask, ShuffleMask.size()) &&
		Op2.isUndef()) {
Op = DAG.getNode(ISD::VECTOR_REVERSE, DL, ContainerVT, Op1);		Op = DAG.getNode(ISD::VECTOR_REVERSE, DL, ContainerVT, Op1);
return convertFromScalableVector(DAG, VT, Op);		return convertFromScalableVector(DAG, VT, Op);
}		}

if (isZIPMask(ShuffleMask, VT, WhichResult) && WhichResult != 0)		if (isZIPMask(ShuffleMask, VT, WhichResult) && WhichResult != 0)
return convertFromScalableVector(		return convertFromScalableVector(
DAG, VT, DAG.getNode(AArch64ISD::ZIP2, DL, ContainerVT, Op1, Op2));		DAG, VT, DAG.getNode(AArch64ISD::ZIP2, DL, ContainerVT, Op1, Op2));

[...]

llvm/lib/Target/ARM/ARMISelLowering.cpp

[...]		bool ARMTargetLowering::isShuffleMaskLegal(ArrayRef<int> M, EVT VT) const {
}		}

bool ReverseVEXT, isV_UNDEF;		bool ReverseVEXT, isV_UNDEF;
unsigned Imm, WhichResult;		unsigned Imm, WhichResult;

unsigned EltSize = VT.getScalarSizeInBits();		unsigned EltSize = VT.getScalarSizeInBits();
if (EltSize >= 32 ||		if (EltSize >= 32 ||
ShuffleVectorSDNode::isSplatMask(&M[0], VT) ||		ShuffleVectorSDNode::isSplatMask(&M[0], VT) ||
ShuffleVectorInst::isIdentityMask(M) ||		ShuffleVectorInst::isIdentityMask(M, M.size()) ||
isVREVMask(M, VT, 64) ||		isVREVMask(M, VT, 64) ||
isVREVMask(M, VT, 32) ||		isVREVMask(M, VT, 32) ||
isVREVMask(M, VT, 16))		isVREVMask(M, VT, 16))
return true;		return true;
else if (Subtarget->hasNEON() &&		else if (Subtarget->hasNEON() &&
(isVEXTMask(M, VT, ReverseVEXT, Imm) ||		(isVEXTMask(M, VT, ReverseVEXT, Imm) ||
isVTBLMask(M, VT) ||		isVTBLMask(M, VT) ||
isNEONTwoResultShuffleMask(M, VT, WhichResult, isV_UNDEF)))		isNEONTwoResultShuffleMask(M, VT, WhichResult, isV_UNDEF)))
[...]

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

[...]		static SDValue lowerBitreverseShuffle(ShuffleVectorSDNode *SVN,
const RISCVSubtarget &Subtarget) {		const RISCVSubtarget &Subtarget) {
SDLoc DL(SVN);		SDLoc DL(SVN);
MVT VT = SVN->getSimpleValueType(0);		MVT VT = SVN->getSimpleValueType(0);
SDValue V = SVN->getOperand(0);		SDValue V = SVN->getOperand(0);
unsigned NumElts = VT.getVectorNumElements();		unsigned NumElts = VT.getVectorNumElements();

assert(VT.getVectorElementType() == MVT::i1);		assert(VT.getVectorElementType() == MVT::i1);

if (!ShuffleVectorInst::isReverseMask(SVN->getMask()) ||		if (!ShuffleVectorInst::isReverseMask(SVN->getMask(),
		SVN->getMask().size()) ||
!SVN->getOperand(1).isUndef())		!SVN->getOperand(1).isUndef())
return SDValue();		return SDValue();

unsigned ViaEltSize = std::max((uint64_t)8, PowerOf2Ceil(NumElts));		unsigned ViaEltSize = std::max((uint64_t)8, PowerOf2Ceil(NumElts));
MVT ViaVT = MVT::getVectorVT(MVT::getIntegerVT(ViaEltSize), 1);		MVT ViaVT = MVT::getVectorVT(MVT::getIntegerVT(ViaEltSize), 1);
MVT ViaBitVT = MVT::getVectorVT(MVT::i1, ViaVT.getScalarSizeInBits());		MVT ViaBitVT = MVT::getVectorVT(MVT::i1, ViaVT.getScalarSizeInBits());

// If we don't have zvbb or the larger element type > ELEN, the operation will		// If we don't have zvbb or the larger element type > ELEN, the operation will
[...]

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

[...]		case TTI::SK_Select: {
// We are going to permute multiple sources and the result will be in		// We are going to permute multiple sources and the result will be in
// multiple destinations. Providing an accurate cost only for splits where		// multiple destinations. Providing an accurate cost only for splits where
// the element type remains the same.		// the element type remains the same.
if (LT.first.isValid() && LT.first != 1 &&		if (LT.first.isValid() && LT.first != 1 &&
LT.second.isFixedLengthVector() &&		LT.second.isFixedLengthVector() &&
LT.second.getVectorElementType().getSizeInBits() ==		LT.second.getVectorElementType().getSizeInBits() ==
Tp->getElementType()->getPrimitiveSizeInBits() &&		Tp->getElementType()->getPrimitiveSizeInBits() &&
LT.second.getVectorNumElements() <		LT.second.getVectorNumElements() <
cast<FixedVectorType>(Tp)->getNumElements()) {		cast<FixedVectorType>(Tp)->getNumElements() &&
		divideCeil(Mask.size(),
		cast<FixedVectorType>(Tp)->getNumElements()) ==
		static_cast<unsigned>(*LT.first.getValue())) {
unsigned NumRegs = *LT.first.getValue();		unsigned NumRegs = *LT.first.getValue();
unsigned VF = cast<FixedVectorType>(Tp)->getNumElements();		unsigned VF = cast<FixedVectorType>(Tp)->getNumElements();
unsigned SubVF = PowerOf2Ceil(VF / NumRegs);		unsigned SubVF = PowerOf2Ceil(VF / NumRegs);
auto *SubVecTy = FixedVectorType::get(Tp->getElementType(), SubVF);		auto *SubVecTy = FixedVectorType::get(Tp->getElementType(), SubVF);

InstructionCost Cost = 0;		InstructionCost Cost = 0;
for (unsigned I = 0; I < NumRegs; ++I) {		for (unsigned I = 0; I < NumRegs; ++I) {
bool IsSingleVector = true;		bool IsSingleVector = true;
[...]	InstructionCost RISCVTTIImpl::getInterleavedMemoryOpCost(
// %wide.vec = load <12 x i32>, ptr %3, align 4		// %wide.vec = load <12 x i32>, ptr %3, align 4
// %strided.vec = shufflevector %wide.vec, poison, <4 x i32> <stride mask>		// %strided.vec = shufflevector %wide.vec, poison, <4 x i32> <stride mask>
// %strided.vec1 = shufflevector %wide.vec, poison, <4 x i32> <stride mask>		// %strided.vec1 = shufflevector %wide.vec, poison, <4 x i32> <stride mask>
// %strided.vec2 = shufflevector %wide.vec, poison, <4 x i32> <stride mask>		// %strided.vec2 = shufflevector %wide.vec, poison, <4 x i32> <stride mask>
if (Opcode == Instruction::Load) {		if (Opcode == Instruction::Load) {
InstructionCost Cost = MemCost;		InstructionCost Cost = MemCost;
for (unsigned Index : Indices) {		for (unsigned Index : Indices) {
FixedVectorType *SubVecTy =		FixedVectorType *SubVecTy =
FixedVectorType::get(FVTy->getElementType(), VF);		FixedVectorType::get(FVTy->getElementType(), VF * Factor);
auto Mask = createStrideMask(Index, Factor, VF);		auto Mask = createStrideMask(Index, Factor, VF);
InstructionCost ShuffleCost =		InstructionCost ShuffleCost =
getShuffleCost(TTI::ShuffleKind::SK_PermuteSingleSrc, SubVecTy, Mask,		getShuffleCost(TTI::ShuffleKind::SK_PermuteSingleSrc, SubVecTy, Mask,
CostKind, 0, nullptr, {});		CostKind, 0, nullptr, {});
Cost += ShuffleCost;		Cost += ShuffleCost;
}		}
return Cost;		return Cost;
}		}
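
The `VF * Factor` change in the hunk above follows from the shape of a stride mask: the mask has `VF` elements, but its indices reach into the wide loaded vector. A minimal standalone sketch of that relationship, assuming a stride mask selects lane `Start + I * Stride`; the helper names `strideMaskSketch` and `minSourceElts` are hypothetical and are not LLVM APIs:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a stride-mask helper (assumption: element I of
// the mask selects lane Start + I * Stride of the wide source vector).
std::vector<int> strideMaskSketch(unsigned Start, unsigned Stride,
                                  unsigned VF) {
  assert(Stride > 0 && "stride must be positive");
  std::vector<int> Mask;
  Mask.reserve(VF);
  for (unsigned I = 0; I < VF; ++I)
    Mask.push_back(static_cast<int>(Start + I * Stride));
  return Mask;
}

// Smallest number of source elements the mask needs in order to be in range.
unsigned minSourceElts(const std::vector<int> &Mask) {
  int Max = -1;
  for (int Idx : Mask)
    Max = Idx > Max ? Idx : Max;
  return static_cast<unsigned>(Max + 1);
}
```

For `Start = 2`, `Stride = 3`, `VF = 4`, the mask has 4 elements but indexes up to lane 11, so it only makes sense against a 12-element source. That matches the `<12 x i32>` load in the comment above, and a vector type of only `VF` elements would understate the source width.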

llvm/lib/Target/X86/X86TargetTransformInfo.cpp


[...]	if (LegalVT.isVector() &&
copy(Mask, NormalizedMask.begin());		copy(Mask, NormalizedMask.begin());
unsigned PrevSrcReg = 0;		unsigned PrevSrcReg = 0;
ArrayRef<int> PrevRegMask;		ArrayRef<int> PrevRegMask;
InstructionCost Cost = 0;		InstructionCost Cost = 0;
processShuffleMasks(		processShuffleMasks(
NormalizedMask, NumOfSrcRegs, NumOfDestRegs, NumOfDestRegs, []() {},		NormalizedMask, NumOfSrcRegs, NumOfDestRegs, NumOfDestRegs, []() {},
[this, SingleOpTy, CostKind, &PrevSrcReg, &PrevRegMask,		[this, SingleOpTy, CostKind, &PrevSrcReg, &PrevRegMask,
&Cost](ArrayRef<int> RegMask, unsigned SrcReg, unsigned DestReg) {		&Cost](ArrayRef<int> RegMask, unsigned SrcReg, unsigned DestReg) {
if (!ShuffleVectorInst::isIdentityMask(RegMask)) {		if (!ShuffleVectorInst::isIdentityMask(RegMask, RegMask.size())) {
// Check if the previous register can be just copied to the next		// Check if the previous register can be just copied to the next
// one.		// one.
if (PrevRegMask.empty() || PrevSrcReg != SrcReg ||		if (PrevRegMask.empty() || PrevSrcReg != SrcReg ||
PrevRegMask != RegMask)		PrevRegMask != RegMask)
Cost += getShuffleCost(TTI::SK_PermuteSingleSrc, SingleOpTy,		Cost += getShuffleCost(TTI::SK_PermuteSingleSrc, SingleOpTy,
RegMask, CostKind, 0, nullptr);		RegMask, CostKind, 0, nullptr);
else		else
// Just a copy of previous destination register.		// Just a copy of previous destination register.

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

[...]	static Instruction *foldSelectShuffleOfSelectShuffle(ShuffleVectorInst &Shuf) {
// If the mask chooses from the earlier shuffle, the other mask value is		// If the mask chooses from the earlier shuffle, the other mask value is
// transferred to the combined select shuffle:		// transferred to the combined select shuffle:
// shuf X, (shuf X, Y, M1), M --> shuf X, Y, M'		// shuf X, (shuf X, Y, M1), M --> shuf X, Y, M'
SmallVector<int, 16> NewMask(NumElts);		SmallVector<int, 16> NewMask(NumElts);
for (unsigned i = 0; i != NumElts; ++i)		for (unsigned i = 0; i != NumElts; ++i)
NewMask[i] = Mask[i] < (signed)NumElts ? Mask[i] : Mask1[i];		NewMask[i] = Mask[i] < (signed)NumElts ? Mask[i] : Mask1[i];

// A select mask with undef elements might look like an identity mask.		// A select mask with undef elements might look like an identity mask.
assert((ShuffleVectorInst::isSelectMask(NewMask) ||		assert((ShuffleVectorInst::isSelectMask(NewMask, NumElts) ||
ShuffleVectorInst::isIdentityMask(NewMask)) &&		ShuffleVectorInst::isIdentityMask(NewMask, NumElts)) &&
"Unexpected shuffle mask");		"Unexpected shuffle mask");
return new ShuffleVectorInst(X, Y, NewMask);		return new ShuffleVectorInst(X, Y, NewMask);
}		}

static Instruction *foldSelectShuffleWith1Binop(ShuffleVectorInst &Shuf) {		static Instruction *foldSelectShuffleWith1Binop(ShuffleVectorInst &Shuf) {
assert(Shuf.isSelect() && "Must have select-equivalent shuffle");		assert(Shuf.isSelect() && "Must have select-equivalent shuffle");

// Are we shuffling together some value and that same value after it has been		// Are we shuffling together some value and that same value after it has been

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp


[...]	static void reorderOrder(SmallVectorImpl<unsigned> &Order, ArrayRef<int> Mask) {
SmallVector<int> MaskOrder;		SmallVector<int> MaskOrder;
if (Order.empty()) {		if (Order.empty()) {
MaskOrder.resize(Mask.size());		MaskOrder.resize(Mask.size());
std::iota(MaskOrder.begin(), MaskOrder.end(), 0);		std::iota(MaskOrder.begin(), MaskOrder.end(), 0);
} else {		} else {
inversePermutation(Order, MaskOrder);		inversePermutation(Order, MaskOrder);
}		}
reorderReuses(MaskOrder, Mask);		reorderReuses(MaskOrder, Mask);
if (ShuffleVectorInst::isIdentityMask(MaskOrder)) {		if (ShuffleVectorInst::isIdentityMask(MaskOrder, MaskOrder.size())) {
Order.clear();		Order.clear();
return;		return;
}		}
Order.assign(Mask.size(), Mask.size());		Order.assign(Mask.size(), Mask.size());
for (unsigned I = 0, E = Mask.size(); I < E; ++I)		for (unsigned I = 0, E = Mask.size(); I < E; ++I)
if (MaskOrder[I] != PoisonMaskElem)		if (MaskOrder[I] != PoisonMaskElem)
Order[MaskOrder[I]] = I;		Order[MaskOrder[I]] = I;
fixupOrderingIndices(Order);		fixupOrderingIndices(Order);
[...]	BoUpSLP::getReorderingData(const TreeEntry &TE, bool TopToBottom) {
return std::nullopt;		return std::nullopt;
}		}

/// Checks if the given mask is a "clustered" mask with the same clusters of		/// Checks if the given mask is a "clustered" mask with the same clusters of
/// size \p Sz, which are not identity submasks.		/// size \p Sz, which are not identity submasks.
static bool isRepeatedNonIdentityClusteredMask(ArrayRef<int> Mask,		static bool isRepeatedNonIdentityClusteredMask(ArrayRef<int> Mask,
unsigned Sz) {		unsigned Sz) {
ArrayRef<int> FirstCluster = Mask.slice(0, Sz);		ArrayRef<int> FirstCluster = Mask.slice(0, Sz);
if (ShuffleVectorInst::isIdentityMask(FirstCluster))		if (ShuffleVectorInst::isIdentityMask(FirstCluster, Sz))
return false;		return false;
for (unsigned I = Sz, E = Mask.size(); I < E; I += Sz) {		for (unsigned I = Sz, E = Mask.size(); I < E; I += Sz) {
ArrayRef<int> Cluster = Mask.slice(I, Sz);		ArrayRef<int> Cluster = Mask.slice(I, Sz);
if (Cluster != FirstCluster)		if (Cluster != FirstCluster)
return false;		return false;
}		}
return true;		return true;
}		}
[...]	protected:
/// Checks if the mask is an identity mask.		/// Checks if the mask is an identity mask.
/// \param IsStrict if is true the function returns false if mask size does		/// \param IsStrict if is true the function returns false if mask size does
/// not match vector size.		/// not match vector size.
static bool isIdentityMask(ArrayRef<int> Mask, const FixedVectorType *VecTy,		static bool isIdentityMask(ArrayRef<int> Mask, const FixedVectorType *VecTy,
bool IsStrict) {		bool IsStrict) {
int Limit = Mask.size();		int Limit = Mask.size();
int VF = VecTy->getNumElements();		int VF = VecTy->getNumElements();
return (VF == Limit || !IsStrict) &&		return (VF == Limit || !IsStrict) &&
all_of(Mask, [Limit](int Idx) { return Idx < Limit; }) &&		ShuffleVectorInst::isIdentityMask(Mask, VF);
ShuffleVectorInst::isIdentityMask(Mask);
}		}

/// Tries to combine 2 different masks into single one.		/// Tries to combine 2 different masks into single one.
/// \param LocalVF Vector length of the permuted input vector. \p Mask may		/// \param LocalVF Vector length of the permuted input vector. \p Mask may
/// change the size of the vector, \p LocalVF is the original size of the		/// change the size of the vector, \p LocalVF is the original size of the
/// shuffled vector.		/// shuffled vector.
static void combineMasks(unsigned LocalVF, SmallVectorImpl<int> &Mask,		static void combineMasks(unsigned LocalVF, SmallVectorImpl<int> &Mask,
ArrayRef<int> ExtMask) {		ArrayRef<int> ExtMask) {
unsigned VF = Mask.size();		unsigned VF = Mask.size();
SmallVector<int> NewMask(ExtMask.size(), PoisonMaskElem);		SmallVector<int> NewMask(ExtMask.size(), PoisonMaskElem);
		RKSimon: Maybe split this into a series of separate if () { return true/false; } to make it easier to grok?
		ABataev: Done
for (int I = 0, Sz = ExtMask.size(); I < Sz; ++I) {		for (int I = 0, Sz = ExtMask.size(); I < Sz; ++I) {
if (ExtMask[I] == PoisonMaskElem)		if (ExtMask[I] == PoisonMaskElem)
continue;		continue;
int MaskedIdx = Mask[ExtMask[I] % VF];		int MaskedIdx = Mask[ExtMask[I] % VF];
NewMask[I] =		NewMask[I] =
MaskedIdx == PoisonMaskElem ? PoisonMaskElem : MaskedIdx % LocalVF;		MaskedIdx == PoisonMaskElem ? PoisonMaskElem : MaskedIdx % LocalVF;
}		}
Mask.swap(NewMask);		Mask.swap(NewMask);
[...]	while (auto *SV = dyn_cast<ShuffleVectorInst>(Op)) {
if (!SVTy)		if (!SVTy)
break;		break;
// Remember the identity or broadcast mask, if it is not a resizing		// Remember the identity or broadcast mask, if it is not a resizing
// shuffle. If no better candidates are found, this Op and Mask will be		// shuffle. If no better candidates are found, this Op and Mask will be
// used in the final shuffle.		// used in the final shuffle.
if (isIdentityMask(Mask, SVTy, /*IsStrict=*/false)) {		if (isIdentityMask(Mask, SVTy, /*IsStrict=*/false)) {
if (!IdentityOp || !SinglePermute ||		if (!IdentityOp || !SinglePermute ||
(isIdentityMask(Mask, SVTy, /*IsStrict=*/true) &&		(isIdentityMask(Mask, SVTy, /*IsStrict=*/true) &&
!ShuffleVectorInst::isZeroEltSplatMask(IdentityMask))) {		!ShuffleVectorInst::isZeroEltSplatMask(IdentityMask,
		IdentityMask.size()))) {
IdentityOp = SV;		IdentityOp = SV;
// Store current mask in the IdentityMask so later we did not lost		// Store current mask in the IdentityMask so later we did not lost
// this info if IdentityOp is selected as the best candidate for the		// this info if IdentityOp is selected as the best candidate for the
// permutation.		// permutation.
IdentityMask.assign(Mask);		IdentityMask.assign(Mask);
}		}
}		}
// Remember the broadcast mask. If no better candidates are found, this Op		// Remember the broadcast mask. If no better candidates are found, this Op
[...]	while (auto *SV = dyn_cast<ShuffleVectorInst>(Op)) {
Mask.swap(ShuffleMask);		Mask.swap(ShuffleMask);
if (IsOp2Undef)		if (IsOp2Undef)
Op = SV->getOperand(0);		Op = SV->getOperand(0);
else		else
Op = SV->getOperand(1);		Op = SV->getOperand(1);
}		}
if (auto *OpTy = dyn_cast<FixedVectorType>(Op->getType());		if (auto *OpTy = dyn_cast<FixedVectorType>(Op->getType());
!OpTy || !isIdentityMask(Mask, OpTy, SinglePermute) ||		!OpTy || !isIdentityMask(Mask, OpTy, SinglePermute) ||
ShuffleVectorInst::isZeroEltSplatMask(Mask)) {		ShuffleVectorInst::isZeroEltSplatMask(Mask, Mask.size())) {
if (IdentityOp) {		if (IdentityOp) {
V = IdentityOp;		V = IdentityOp;
assert(Mask.size() == IdentityMask.size() &&		assert(Mask.size() == IdentityMask.size() &&
"Expected masks of same sizes.");		"Expected masks of same sizes.");
// Clear known poison elements.		// Clear known poison elements.
for (auto [I, Idx] : enumerate(Mask))		for (auto [I, Idx] : enumerate(Mask))
if (Idx == PoisonMaskElem)		if (Idx == PoisonMaskElem)
IdentityMask[I] = PoisonMaskElem;		IdentityMask[I] = PoisonMaskElem;
Mask.swap(IdentityMask);		Mask.swap(IdentityMask);
auto *Shuffle = dyn_cast<ShuffleVectorInst>(V);		auto *Shuffle = dyn_cast<ShuffleVectorInst>(V);
return SinglePermute &&		return SinglePermute &&
(isIdentityMask(Mask, cast<FixedVectorType>(V->getType()),		(isIdentityMask(Mask, cast<FixedVectorType>(V->getType()),
/*IsStrict=*/true) ||		/*IsStrict=*/true) ||
(Shuffle && Mask.size() == Shuffle->getShuffleMask().size() &&		(Shuffle && Mask.size() == Shuffle->getShuffleMask().size() &&
Shuffle->isZeroEltSplat() &&		Shuffle->isZeroEltSplat() &&
ShuffleVectorInst::isZeroEltSplatMask(Mask)));		ShuffleVectorInst::isZeroEltSplatMask(Mask, Mask.size())));
}		}
V = Op;		V = Op;
return false;		return false;
}		}
V = Op;		V = Op;
return true;		return true;
}		}

[...]	if (V2 &&
.getKnownMinValue());		.getKnownMinValue());
for (int I = 0, E = Mask.size(); I < E; ++I) {		for (int I = 0, E = Mask.size(); I < E; ++I) {
if (CombinedMask2[I] != PoisonMaskElem) {		if (CombinedMask2[I] != PoisonMaskElem) {
assert(CombinedMask1[I] == PoisonMaskElem &&		assert(CombinedMask1[I] == PoisonMaskElem &&
"Expected undefined mask element");		"Expected undefined mask element");
CombinedMask1[I] = CombinedMask2[I] + (Op1 == Op2 ? 0 : VF);		CombinedMask1[I] = CombinedMask2[I] + (Op1 == Op2 ? 0 : VF);
}		}
}		}
const int Limit = CombinedMask1.size() * 2;		if (Op1 == Op2 &&
if (Op1 == Op2 && Limit == 2 * VF &&		(ShuffleVectorInst::isIdentityMask(CombinedMask1, VF) ||
all_of(CombinedMask1, [=](int Idx) { return Idx < Limit; }) &&		(ShuffleVectorInst::isZeroEltSplatMask(CombinedMask1, VF) &&
(ShuffleVectorInst::isIdentityMask(CombinedMask1) ||
(ShuffleVectorInst::isZeroEltSplatMask(CombinedMask1) &&
isa<ShuffleVectorInst>(Op1) &&		isa<ShuffleVectorInst>(Op1) &&
cast<ShuffleVectorInst>(Op1)->getShuffleMask() ==		cast<ShuffleVectorInst>(Op1)->getShuffleMask() ==
ArrayRef(CombinedMask1))))		ArrayRef(CombinedMask1))))
return Builder.createIdentity(Op1);		return Builder.createIdentity(Op1);
return Builder.createShuffleVector(		return Builder.createShuffleVector(
Op1, Op1 == Op2 ? PoisonValue::get(Op1->getType()) : Op2,		Op1, Op1 == Op2 ? PoisonValue::get(Op1->getType()) : Op2,
CombinedMask1);		CombinedMask1);
}		}
[...]	return GatherCost +
? TTI::TCC_Free		? TTI::TCC_Free
: R.getGatherCost(Gathers, !Root && VL.equals(Gathers)));		: R.getGatherCost(Gathers, !Root && VL.equals(Gathers)));
};		};

/// Compute the cost of creating a vector of type \p VecTy containing the		/// Compute the cost of creating a vector of type \p VecTy containing the
/// extracted values from \p VL.		/// extracted values from \p VL.
InstructionCost computeExtractCost(ArrayRef<Value *> VL, ArrayRef<int> Mask,		InstructionCost computeExtractCost(ArrayRef<Value *> VL, ArrayRef<int> Mask,
TTI::ShuffleKind ShuffleKind) {		TTI::ShuffleKind ShuffleKind) {
auto *VecTy = FixedVectorType::get(VL.front()->getType(), VL.size());		auto *VecTy = cast<FixedVectorType>(
		cast<ExtractElementInst>(*find_if(VL, [](Value *V) {
		return isa<ExtractElementInst>(V);
		}))->getVectorOperandType());
unsigned NumOfParts = TTI.getNumberOfParts(VecTy);		unsigned NumOfParts = TTI.getNumberOfParts(VecTy);

if (ShuffleKind != TargetTransformInfo::SK_PermuteSingleSrc ||		if (ShuffleKind != TargetTransformInfo::SK_PermuteSingleSrc ||
!NumOfParts || VecTy->getNumElements() < NumOfParts)		!NumOfParts || VecTy->getNumElements() < NumOfParts)
return TTI.getShuffleCost(ShuffleKind, VecTy, Mask);		return TTI.getShuffleCost(ShuffleKind, VecTy, Mask);

bool AllConsecutive = true;		bool AllConsecutive = true;
unsigned EltsPerVector = VecTy->getNumElements() / NumOfParts;		unsigned EltsPerVector = VecTy->getNumElements() / NumOfParts;
[...]	InstructionCost computeExtractCost(ArrayRef<Value *> VL, ArrayRef<int> Mask,
}		}
return Cost;		return Cost;
}		}

class ShuffleCostBuilder {		class ShuffleCostBuilder {
const TargetTransformInfo &TTI;		const TargetTransformInfo &TTI;

static bool isEmptyOrIdentity(ArrayRef<int> Mask, unsigned VF) {		static bool isEmptyOrIdentity(ArrayRef<int> Mask, unsigned VF) {
int Limit = 2 * VF;
return Mask.empty() ||		return Mask.empty() ||
(VF == Mask.size() &&		(VF == Mask.size() &&
all_of(Mask, [Limit](int Idx) { return Idx < Limit; }) &&		ShuffleVectorInst::isIdentityMask(Mask, VF));
ShuffleVectorInst::isIdentityMask(Mask));
}		}

public:		public:
ShuffleCostBuilder(const TargetTransformInfo &TTI) : TTI(TTI) {}		ShuffleCostBuilder(const TargetTransformInfo &TTI) : TTI(TTI) {}
~ShuffleCostBuilder() = default;		~ShuffleCostBuilder() = default;
InstructionCost createShuffleVector(Value *V1, Value *,		InstructionCost createShuffleVector(Value *V1, Value *,
ArrayRef<int> Mask) const {		ArrayRef<int> Mask) const {
// Empty mask or identity mask are free.		// Empty mask or identity mask are free.
[...]	if (Action) {
V = Constant::getNullValue(FixedVectorType::get(		V = Constant::getNullValue(FixedVectorType::get(
Vec.get<const TreeEntry *>()->Scalars.front()->getType(),		Vec.get<const TreeEntry *>()->Scalars.front()->getType(),
CommonMask.size()));		CommonMask.size()));
Action(V, CommonMask);		Action(V, CommonMask);
}		}
::addMask(CommonMask, ExtMask, /*ExtendingManyInputs=*/true);		::addMask(CommonMask, ExtMask, /*ExtendingManyInputs=*/true);
if (CommonMask.empty())		if (CommonMask.empty())
return Cost;		return Cost;
int Limit = CommonMask.size() * 2;		if (ShuffleVectorInst::isIdentityMask(CommonMask, CommonMask.size()))
if (all_of(CommonMask, [=](int Idx) { return Idx < Limit; }) &&
ShuffleVectorInst::isIdentityMask(CommonMask))
return Cost;		return Cost;
return Cost +		return Cost +
createShuffle(InVectors.front(),		createShuffle(InVectors.front(),
InVectors.size() == 2 ? InVectors.back() : nullptr,		InVectors.size() == 2 ? InVectors.back() : nullptr,
CommonMask);		CommonMask);
}		}

~ShuffleCostEstimator() {		~ShuffleCostEstimator() {
[...]	if (E->getOpcode() == Instruction::Store) {
copy(E->ReorderIndices, NewMask.begin());		copy(E->ReorderIndices, NewMask.begin());
} else {		} else {
inversePermutation(E->ReorderIndices, NewMask);		inversePermutation(E->ReorderIndices, NewMask);
}		}
::addMask(Mask, NewMask);		::addMask(Mask, NewMask);
}		}
if (NeedToShuffleReuses)		if (NeedToShuffleReuses)
::addMask(Mask, E->ReuseShuffleIndices);		::addMask(Mask, E->ReuseShuffleIndices);
if (!Mask.empty() && !ShuffleVectorInst::isIdentityMask(Mask))		if (!Mask.empty() && !ShuffleVectorInst::isIdentityMask(Mask, Mask.size()))
CommonCost =		CommonCost =
TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FinalVecTy, Mask);		TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FinalVecTy, Mask);
assert((E->State == TreeEntry::Vectorize ||		assert((E->State == TreeEntry::Vectorize ||
E->State == TreeEntry::ScatterVectorize) &&		E->State == TreeEntry::ScatterVectorize) &&
"Unhandled state");		"Unhandled state");
assert(E->getOpcode() &&		assert(E->getOpcode() &&
((allSameType(VL) && allSameBlock(VL)) ||		((allSameType(VL) && allSameBlock(VL)) ||
(E->getOpcode() == Instruction::GetElementPtr &&		(E->getOpcode() == Instruction::GetElementPtr &&
[...]	if (const TreeEntry *OpTE = getTreeEntry(VL0->getOperand(0))) {
} else if (OpTE->State == TreeEntry::Vectorize &&		} else if (OpTE->State == TreeEntry::Vectorize &&
OpTE->getOpcode() == Instruction::Load &&		OpTE->getOpcode() == Instruction::Load &&
!OpTE->isAltShuffle()) {		!OpTE->isAltShuffle()) {
if (OpTE->ReorderIndices.empty()) {		if (OpTE->ReorderIndices.empty()) {
CCH = TTI::CastContextHint::Normal;		CCH = TTI::CastContextHint::Normal;
} else {		} else {
SmallVector<int> Mask;		SmallVector<int> Mask;
inversePermutation(OpTE->ReorderIndices, Mask);		inversePermutation(OpTE->ReorderIndices, Mask);
if (ShuffleVectorInst::isReverseMask(Mask))		if (ShuffleVectorInst::isReverseMask(Mask, Mask.size()))
CCH = TTI::CastContextHint::Reversed;		CCH = TTI::CastContextHint::Reversed;
}		}
}		}
} else {		} else {
InstructionsState SrcState = getSameOpcode(E->getOperand(0), *TLI);		InstructionsState SrcState = getSameOpcode(E->getOperand(0), *TLI);
if (SrcState.getOpcode() == Instruction::Load && !SrcState.isAltShuffle())		if (SrcState.getOpcode() == Instruction::Load && !SrcState.isAltShuffle())
CCH = TTI::CastContextHint::GatherScatter;		CCH = TTI::CastContextHint::GatherScatter;
}		}
[...]	InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {
Cost += SpillCost + ExtractCost;		Cost += SpillCost + ExtractCost;
auto &&ResizeToVF = [this, &Cost](const TreeEntry *TE, ArrayRef<int> Mask,		auto &&ResizeToVF = [this, &Cost](const TreeEntry *TE, ArrayRef<int> Mask,
bool) {		bool) {
InstructionCost C = 0;		InstructionCost C = 0;
unsigned VF = Mask.size();		unsigned VF = Mask.size();
unsigned VecVF = TE->getVectorFactor();		unsigned VecVF = TE->getVectorFactor();
if (VF != VecVF &&		if (VF != VecVF &&
(any_of(Mask, [VF](int Idx) { return Idx >= static_cast<int>(VF); }) ||		(any_of(Mask, [VF](int Idx) { return Idx >= static_cast<int>(VF); }) ||
(all_of(Mask,		!ShuffleVectorInst::isIdentityMask(Mask, VF))) {
[VF](int Idx) { return Idx < 2 * static_cast<int>(VF); }) &&
!ShuffleVectorInst::isIdentityMask(Mask)))) {
SmallVector<int> OrigMask(VecVF, PoisonMaskElem);		SmallVector<int> OrigMask(VecVF, PoisonMaskElem);
std::copy(Mask.begin(), std::next(Mask.begin(), std::min(VF, VecVF)),		std::copy(Mask.begin(), std::next(Mask.begin(), std::min(VF, VecVF)),
OrigMask.begin());		OrigMask.begin());
C = TTI->getShuffleCost(		C = TTI->getShuffleCost(
TTI::SK_PermuteSingleSrc,		TTI::SK_PermuteSingleSrc,
FixedVectorType::get(TE->getMainOp()->getType(), VecVF), OrigMask);		FixedVectorType::get(TE->getMainOp()->getType(), VecVF), OrigMask);
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "SLP: Adding cost " << C		dbgs() << "SLP: Adding cost " << C
[...]	auto *FTy = FixedVectorType::get(
cast<VectorType>(FirstUsers[I].first->getType())->getElementType(), VF);		cast<VectorType>(FirstUsers[I].first->getType())->getElementType(), VF);
auto Vector = ShuffleMasks[I].takeVector();		auto Vector = ShuffleMasks[I].takeVector();
auto &&EstimateShufflesCost = [this, FTy,		auto &&EstimateShufflesCost = [this, FTy,
&Cost](ArrayRef<int> Mask,		&Cost](ArrayRef<int> Mask,
ArrayRef<const TreeEntry *> TEs) {		ArrayRef<const TreeEntry *> TEs) {
assert((TEs.size() == 1 || TEs.size() == 2) &&		assert((TEs.size() == 1 || TEs.size() == 2) &&
"Expected exactly 1 or 2 tree entries.");		"Expected exactly 1 or 2 tree entries.");
if (TEs.size() == 1) {		if (TEs.size() == 1) {
int Limit = 2 * Mask.size();		if (!ShuffleVectorInst::isIdentityMask(Mask, Mask.size())) {
if (!all_of(Mask, [Limit](int Idx) { return Idx < Limit; }) ||
!ShuffleVectorInst::isIdentityMask(Mask)) {
InstructionCost C =		InstructionCost C =
TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FTy, Mask);		TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, FTy, Mask);
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C		LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
<< " for final shuffle of insertelement "		<< " for final shuffle of insertelement "
"external users.\n";		"external users.\n";
TEs.front()->dump();		TEs.front()->dump();
dbgs() << "SLP: Current total cost = " << Cost << "\n");		dbgs() << "SLP: Current total cost = " << Cost << "\n");
Cost += C;		Cost += C;
[...]	public:
}		}
/// Creates permutation of the single vector operand with the given mask, if		/// Creates permutation of the single vector operand with the given mask, if
/// it is not identity mask.		/// it is not identity mask.
Value *createShuffleVector(Value *V1, ArrayRef<int> Mask) {		Value *createShuffleVector(Value *V1, ArrayRef<int> Mask) {
if (Mask.empty())		if (Mask.empty())
return V1;		return V1;
unsigned VF = Mask.size();		unsigned VF = Mask.size();
unsigned LocalVF = cast<FixedVectorType>(V1->getType())->getNumElements();		unsigned LocalVF = cast<FixedVectorType>(V1->getType())->getNumElements();
if (VF == LocalVF && ShuffleVectorInst::isIdentityMask(Mask))		if (VF == LocalVF && ShuffleVectorInst::isIdentityMask(Mask, VF))
return V1;		return V1;
Value *Vec = Builder.CreateShuffleVector(V1, Mask);		Value *Vec = Builder.CreateShuffleVector(V1, Mask);
if (auto *I = dyn_cast<Instruction>(Vec)) {		if (auto *I = dyn_cast<Instruction>(Vec)) {
GatherShuffleExtractSeq.insert(I);		GatherShuffleExtractSeq.insert(I);
CSEBlocks.insert(I->getParent());		CSEBlocks.insert(I->getParent());
}		}
return Vec;		return Vec;
}		}
[...]	auto *It =
return find_if(TE->UserTreeIndices, [=](const EdgeInfo &EI) {		return find_if(TE->UserTreeIndices, [=](const EdgeInfo &EI) {
return EI.UserTE == UserTE && EI.EdgeIdx != EdgeIdx;		return EI.UserTE == UserTE && EI.EdgeIdx != EdgeIdx;
}) != TE->UserTreeIndices.end();		}) != TE->UserTreeIndices.end();
});		});
if (It == VectorizableTree.end())		if (It == VectorizableTree.end())
return false;		return false;
unsigned I =		unsigned I =
*find_if_not(Mask, [](int Idx) { return Idx == PoisonMaskElem; });		*find_if_not(Mask, [](int Idx) { return Idx == PoisonMaskElem; });
int Sz = Mask.size();		if (ShuffleVectorInst::isIdentityMask(Mask, Mask.size()))
if (all_of(Mask, [Sz](int Idx) { return Idx < 2 * Sz; }) &&
ShuffleVectorInst::isIdentityMask(Mask))
std::iota(Mask.begin(), Mask.end(), 0);		std::iota(Mask.begin(), Mask.end(), 0);
else		else
std::fill(Mask.begin(), Mask.end(), I);		std::fill(Mask.begin(), Mask.end(), I);
return true;		return true;
};		};
BVTy ShuffleBuilder(Params...);		BVTy ShuffleBuilder(Params...);
ResTy Res = ResTy();		ResTy Res = ResTy();
SmallVector<int> Mask;		SmallVector<int> Mask;
[...]	if (ExtractShuffle || GatherShuffle) {
int MSz = Mask.size();		int MSz = Mask.size();
// Try to build constant vector and shuffle with it only if currently we		// Try to build constant vector and shuffle with it only if currently we
// have a single permutation and more than 1 scalar constants.		// have a single permutation and more than 1 scalar constants.
bool IsSingleShuffle = !ExtractShuffle || !GatherShuffle;		bool IsSingleShuffle = !ExtractShuffle || !GatherShuffle;
bool IsIdentityShuffle =		bool IsIdentityShuffle =
(ExtractShuffle.value_or(TTI::SK_PermuteTwoSrc) ==		(ExtractShuffle.value_or(TTI::SK_PermuteTwoSrc) ==
TTI::SK_PermuteSingleSrc &&		TTI::SK_PermuteSingleSrc &&
none_of(ExtractMask, [&](int I) { return I >= EMSz; }) &&		none_of(ExtractMask, [&](int I) { return I >= EMSz; }) &&
ShuffleVectorInst::isIdentityMask(ExtractMask)) ||		ShuffleVectorInst::isIdentityMask(ExtractMask, EMSz)) ||
(GatherShuffle.value_or(TTI::SK_PermuteTwoSrc) ==		(GatherShuffle.value_or(TTI::SK_PermuteTwoSrc) ==
TTI::SK_PermuteSingleSrc &&		TTI::SK_PermuteSingleSrc &&
none_of(Mask, [&](int I) { return I >= MSz; }) &&		none_of(Mask, [&](int I) { return I >= MSz; }) &&
ShuffleVectorInst::isIdentityMask(Mask));		ShuffleVectorInst::isIdentityMask(Mask, MSz));
bool EnoughConstsForShuffle =		bool EnoughConstsForShuffle =
IsSingleShuffle &&		IsSingleShuffle &&
(none_of(GatheredScalars,		(none_of(GatheredScalars,
[](Value *V) {		[](Value *V) {
return isa<UndefValue>(V) && !isa<PoisonValue>(V);		return isa<UndefValue>(V) && !isa<PoisonValue>(V);
}) ||		}) ||
any_of(GatheredScalars,		any_of(GatheredScalars,
[](Value *V) {		[](Value *V) {
[...]	case Instruction::InsertElement: {
}		}
SmallBitVector UseMask =		SmallBitVector UseMask =
buildUseMask(NumElts, InsertMask, UseMask::UndefsAsMask);		buildUseMask(NumElts, InsertMask, UseMask::UndefsAsMask);
SmallBitVector IsFirstUndef =		SmallBitVector IsFirstUndef =
isUndefVector(FirstInsert->getOperand(0), UseMask);		isUndefVector(FirstInsert->getOperand(0), UseMask);
if ((!IsIdentity || Offset != 0 || !IsFirstUndef.all()) &&		if ((!IsIdentity || Offset != 0 || !IsFirstUndef.all()) &&
NumElts != NumScalars) {		NumElts != NumScalars) {
if (IsFirstUndef.all()) {		if (IsFirstUndef.all()) {
if (!ShuffleVectorInst::isIdentityMask(InsertMask)) {		if (!ShuffleVectorInst::isIdentityMask(InsertMask, NumElts)) {
SmallBitVector IsFirstPoison =		SmallBitVector IsFirstPoison =
isUndefVector<true>(FirstInsert->getOperand(0), UseMask);		isUndefVector<true>(FirstInsert->getOperand(0), UseMask);
if (!IsFirstPoison.all()) {		if (!IsFirstPoison.all()) {
for (unsigned I = 0; I < NumElts; I++) {		for (unsigned I = 0; I < NumElts; I++) {
if (InsertMask[I] == PoisonMaskElem && !IsFirstPoison.test(I))		if (InsertMask[I] == PoisonMaskElem && !IsFirstPoison.test(I))
InsertMask[I] = I + NumElts;		InsertMask[I] = I + NumElts;
}		}
}		}
[...]	Value *NewInst = performExtractsShuffleAction<Value>(
ArrayRef<Value *> Vals) {		ArrayRef<Value *> Vals) {
assert((Vals.size() == 1 || Vals.size() == 2) &&		assert((Vals.size() == 1 || Vals.size() == 2) &&
"Expected exactly 1 or 2 input values.");		"Expected exactly 1 or 2 input values.");
if (Vals.size() == 1) {		if (Vals.size() == 1) {
// Do not create shuffle if the mask is a simple identity		// Do not create shuffle if the mask is a simple identity
// non-resizing mask.		// non-resizing mask.
if (Mask.size() != cast<FixedVectorType>(Vals.front()->getType())		if (Mask.size() != cast<FixedVectorType>(Vals.front()->getType())
->getNumElements() ||		->getNumElements() ||
!ShuffleVectorInst::isIdentityMask(Mask))		!ShuffleVectorInst::isIdentityMask(Mask, Mask.size()))
return CreateShuffle(Vals.front(), nullptr, Mask);		return CreateShuffle(Vals.front(), nullptr, Mask);
return Vals.front();		return Vals.front();
}		}
return CreateShuffle(Vals.front() ? Vals.front()		return CreateShuffle(Vals.front() ? Vals.front()
: FirstInsert->getOperand(0),		: FirstInsert->getOperand(0),
Vals.back(), Mask);		Vals.back(), Mask);
});		});
auto It = ShuffledInserts[I].InsertElements.rbegin();		auto It = ShuffledInserts[I].InsertElements.rbegin();
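
The call sites above repeatedly drop a hand-written `all_of(... Idx < Limit ...)` guard once the source element count is passed in explicitly. A minimal standalone sketch of why that works, assuming the check rejects resizing masks and masks that mix both operands; `isIdentityMaskSketch` and `PoisonElem` are hypothetical names, and this is an illustration rather than the implementation from this patch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int PoisonElem = -1; // stand-in for a poison/undef mask element

// Sketch of an identity check that knows the source width. Element I may be
// poison, I (first operand), or I + NumSrcElts (second operand), but the mask
// must not mix the two operands and must not resize the vector.
bool isIdentityMaskSketch(const std::vector<int> &Mask, int NumSrcElts) {
  assert(NumSrcElts >= 0 && "source element count must be non-negative");
  if (Mask.size() != static_cast<std::size_t>(NumSrcElts))
    return false; // a resizing shuffle is never an identity of the source
  bool UsesFirst = false, UsesSecond = false;
  for (int I = 0; I < NumSrcElts; ++I) {
    int Idx = Mask[I];
    if (Idx == PoisonElem)
      continue;
    if (Idx == I)
      UsesFirst = true;
    else if (Idx == I + NumSrcElts)
      UsesSecond = true;
    else
      return false;
  }
  return !(UsesFirst && UsesSecond);
}
```

Under these assumptions, `{0, 1, 2, 3}` is an identity for a 4-element source but not for an 8-element one, where the same mask extracts the low half. That distinction cannot be recovered from `Mask.size()` alone.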

llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll

[...]	loop:
br i1 %done, label %exit, label %loop		br i1 %done, label %exit, label %loop
exit:		exit:
ret void		ret void
}		}

define void @combine_load_factor2_i64(ptr noalias %p, ptr noalias %q) {		define void @combine_load_factor2_i64(ptr noalias %p, ptr noalias %q) {
; CHECK-LABEL: @combine_load_factor2_i64(		; CHECK-LABEL: @combine_load_factor2_i64(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[TMP0]], 1
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[P:%.*]], i64 [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i64, ptr [[TMP2]], i32 0
; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <8 x i64>, ptr [[TMP3]], align 4
; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <8 x i64> [[WIDE_VEC]], <8 x i64> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
; CHECK-NEXT: [[STRIDED_VEC1:%.*]] = shufflevector <8 x i64> [[WIDE_VEC]], <8 x i64> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i64> [[STRIDED_VEC]], [[STRIDED_VEC1]]
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i64, ptr [[Q:%.*]], i64 [[TMP0]]
; CHECK-NEXT: [[TMP6:%.*]] = getelementptr i64, ptr [[TMP5]], i32 0
; CHECK-NEXT: store <4 x i64> [[TMP4]], ptr [[TMP6]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, 1024
; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]		; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:		; CHECK: loop:
; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[NEXTI:%.*]], [[LOOP]] ]		; CHECK-NEXT: [[I:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[NEXTI:%.*]], [[LOOP]] ]
; CHECK-NEXT: [[OFFSET0:%.*]] = shl i64 [[I]], 1		; CHECK-NEXT: [[OFFSET0:%.*]] = shl i64 [[I]], 1
; CHECK-NEXT: [[Q0:%.*]] = getelementptr i64, ptr [[P]], i64 [[OFFSET0]]		; CHECK-NEXT: [[Q0:%.*]] = getelementptr i64, ptr [[P:%.*]], i64 [[OFFSET0]]
; CHECK-NEXT: [[X0:%.*]] = load i64, ptr [[Q0]], align 4		; CHECK-NEXT: [[X0:%.*]] = load i64, ptr [[Q0]], align 4
; CHECK-NEXT: [[OFFSET1:%.*]] = add i64 [[OFFSET0]], 1		; CHECK-NEXT: [[OFFSET1:%.*]] = add i64 [[OFFSET0]], 1
; CHECK-NEXT: [[Q1:%.*]] = getelementptr i64, ptr [[P]], i64 [[OFFSET1]]		; CHECK-NEXT: [[Q1:%.*]] = getelementptr i64, ptr [[P]], i64 [[OFFSET1]]
; CHECK-NEXT: [[X1:%.*]] = load i64, ptr [[Q1]], align 4		; CHECK-NEXT: [[X1:%.*]] = load i64, ptr [[Q1]], align 4
; CHECK-NEXT: [[RES:%.*]] = add i64 [[X0]], [[X1]]		; CHECK-NEXT: [[RES:%.*]] = add i64 [[X0]], [[X1]]
; CHECK-NEXT: [[DST:%.*]] = getelementptr i64, ptr [[Q]], i64 [[I]]		; CHECK-NEXT: [[DST:%.*]] = getelementptr i64, ptr [[Q:%.*]], i64 [[I]]
; CHECK-NEXT: store i64 [[RES]], ptr [[DST]], align 4		; CHECK-NEXT: store i64 [[RES]], ptr [[DST]], align 4
; CHECK-NEXT: [[NEXTI]] = add i64 [[I]], 1		; CHECK-NEXT: [[NEXTI]] = add i64 [[I]], 1
; CHECK-NEXT: [[DONE:%.*]] = icmp eq i64 [[NEXTI]], 1024		; CHECK-NEXT: [[DONE:%.*]] = icmp eq i64 [[NEXTI]], 1024
; CHECK-NEXT: br i1 [[DONE]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP9:![0-9]+]]		; CHECK-NEXT: br i1 [[DONE]], label [[EXIT:%.*]], label [[LOOP]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
br label %loop		br label %loop
loop:		loop:
%i = phi i64 [0, %entry], [%nexti, %loop]		%i = phi i64 [0, %entry], [%nexti, %loop]


llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat-inseltpoison.ll

	; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> poison, i16 [[ADD_0]], i64 0			; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> poison, i16 [[ADD_0]], i64 0
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3			; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3
	; GFX7-NEXT: ret <4 x i16> [[INS_3]]			; GFX7-NEXT: ret <4 x i16> [[INS_3]]
	;			;
	; GFX8-LABEL: @uadd_sat_v4i16(			; GFX8-LABEL: @uadd_sat_v4i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <4 x i16> [[ARG0:%.*]], i64 2
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[ARG0_3:%.*]] = extractelement <4 x i16> [[ARG0]], i64 3
				; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <4 x i16> [[ARG1:%.*]], i64 2
				; GFX8-NEXT: [[ARG1_3:%.*]] = extractelement <4 x i16> [[ARG1]], i64 3
				; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
				; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])
	; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> poison, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
	; GFX8-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> poison, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[ADD_3:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_3]], i16 [[ARG1_3]])
	; GFX8-NEXT: [[TMP5:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP3]], <2 x i16> [[TMP4]])			; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
	; GFX8-NEXT: [[INS_31:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>			; GFX8-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[TMP3]], i16 [[ADD_2]], i64 2
	; GFX8-NEXT: ret <4 x i16> [[INS_31]]			; GFX8-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3
				; GFX8-NEXT: ret <4 x i16> [[INS_3]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <4 x i16> %arg0, i64 0			%arg0.0 = extractelement <4 x i16> %arg0, i64 0
	%arg0.1 = extractelement <4 x i16> %arg0, i64 1			%arg0.1 = extractelement <4 x i16> %arg0, i64 1
	%arg0.2 = extractelement <4 x i16> %arg0, i64 2			%arg0.2 = extractelement <4 x i16> %arg0, i64 2
	%arg0.3 = extractelement <4 x i16> %arg0, i64 3			%arg0.3 = extractelement <4 x i16> %arg0, i64 3
	%arg1.0 = extractelement <4 x i16> %arg1, i64 0			%arg1.0 = extractelement <4 x i16> %arg1, i64 0
	%arg1.1 = extractelement <4 x i16> %arg1, i64 1			%arg1.1 = extractelement <4 x i16> %arg1, i64 1

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat.ll

	; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> undef, i16 [[ADD_0]], i64 0			; GFX7-NEXT: [[INS_0:%.*]] = insertelement <4 x i16> undef, i16 [[ADD_0]], i64 0
	; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1			; GFX7-NEXT: [[INS_1:%.*]] = insertelement <4 x i16> [[INS_0]], i16 [[ADD_1]], i64 1
	; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2			; GFX7-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[INS_1]], i16 [[ADD_2]], i64 2
	; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3			; GFX7-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3
	; GFX7-NEXT: ret <4 x i16> [[INS_3]]			; GFX7-NEXT: ret <4 x i16> [[INS_3]]
	;			;
	; GFX8-LABEL: @uadd_sat_v4i16(			; GFX8-LABEL: @uadd_sat_v4i16(
	; GFX8-NEXT: bb:			; GFX8-NEXT: bb:
	; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[ARG0_2:%.*]] = extractelement <4 x i16> [[ARG0:%.*]], i64 2
	; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1:%.*]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>			; GFX8-NEXT: [[ARG0_3:%.*]] = extractelement <4 x i16> [[ARG0]], i64 3
				; GFX8-NEXT: [[ARG1_2:%.*]] = extractelement <4 x i16> [[ARG1:%.*]], i64 2
				; GFX8-NEXT: [[ARG1_3:%.*]] = extractelement <4 x i16> [[ARG1]], i64 3
				; GFX8-NEXT: [[TMP0:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
				; GFX8-NEXT: [[TMP1:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> poison, <2 x i32> <i32 0, i32 1>
	; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; GFX8-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])
	; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <4 x i16> [[ARG0]], <4 x i16> poison, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
	; GFX8-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[ARG1]], <4 x i16> poison, <2 x i32> <i32 2, i32 3>			; GFX8-NEXT: [[ADD_3:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_3]], i16 [[ARG1_3]])
	; GFX8-NEXT: [[TMP5:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP3]], <2 x i16> [[TMP4]])			; GFX8-NEXT: [[TMP3:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
	; GFX8-NEXT: [[INS_31:%.*]] = shufflevector <2 x i16> [[TMP2]], <2 x i16> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>			; GFX8-NEXT: [[INS_2:%.*]] = insertelement <4 x i16> [[TMP3]], i16 [[ADD_2]], i64 2
	; GFX8-NEXT: ret <4 x i16> [[INS_31]]			; GFX8-NEXT: [[INS_3:%.*]] = insertelement <4 x i16> [[INS_2]], i16 [[ADD_3]], i64 3
				; GFX8-NEXT: ret <4 x i16> [[INS_3]]
	;			;
	bb:			bb:
	%arg0.0 = extractelement <4 x i16> %arg0, i64 0			%arg0.0 = extractelement <4 x i16> %arg0, i64 0
	%arg0.1 = extractelement <4 x i16> %arg0, i64 1			%arg0.1 = extractelement <4 x i16> %arg0, i64 1
	%arg0.2 = extractelement <4 x i16> %arg0, i64 2			%arg0.2 = extractelement <4 x i16> %arg0, i64 2
	%arg0.3 = extractelement <4 x i16> %arg0, i64 3			%arg0.3 = extractelement <4 x i16> %arg0, i64 3
	%arg1.0 = extractelement <4 x i16> %arg1, i64 0			%arg1.0 = extractelement <4 x i16> %arg1, i64 0
	%arg1.1 = extractelement <4 x i16> %arg1, i64 1			%arg1.1 = extractelement <4 x i16> %arg1, i64 1

llvm/test/Transforms/SLPVectorizer/AMDGPU/crash_extract_subvector_cost.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes=slp-vectorizer %s | FileCheck %s			; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -passes=slp-vectorizer %s | FileCheck %s

	define <2 x i16> @uadd_sat_v9i16_combine_vi16(<9 x i16> %arg0, <9 x i16> %arg1) {			define <2 x i16> @uadd_sat_v9i16_combine_vi16(<9 x i16> %arg0, <9 x i16> %arg1) {
	; CHECK-LABEL: @uadd_sat_v9i16_combine_vi16(			; CHECK-LABEL: @uadd_sat_v9i16_combine_vi16(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <9 x i16> [[ARG0:%.*]], <9 x i16> poison, <2 x i32> <i32 poison, i32 8>			; CHECK-NEXT: [[ARG0_1:%.*]] = extractelement <9 x i16> undef, i64 7
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <9 x i16> [[ARG1:%.*]], <9 x i16> poison, <2 x i32> <i32 7, i32 8>			; CHECK-NEXT: [[ARG0_2:%.*]] = extractelement <9 x i16> [[ARG0:%.*]], i64 8
	; CHECK-NEXT: [[TMP2:%.*]] = call <2 x i16> @llvm.uadd.sat.v2i16(<2 x i16> [[TMP0]], <2 x i16> [[TMP1]])			; CHECK-NEXT: [[ARG1_1:%.*]] = extractelement <9 x i16> [[ARG1:%.*]], i64 7
	; CHECK-NEXT: ret <2 x i16> [[TMP2]]			; CHECK-NEXT: [[ARG1_2:%.*]] = extractelement <9 x i16> [[ARG1]], i64 8
				; CHECK-NEXT: [[ADD_1:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_1]], i16 [[ARG1_1]])
				; CHECK-NEXT: [[ADD_2:%.*]] = call i16 @llvm.uadd.sat.i16(i16 [[ARG0_2]], i16 [[ARG1_2]])
				; CHECK-NEXT: [[INS_1:%.*]] = insertelement <2 x i16> undef, i16 [[ADD_1]], i64 0
				; CHECK-NEXT: [[INS_2:%.*]] = insertelement <2 x i16> [[INS_1]], i16 [[ADD_2]], i64 1
				; CHECK-NEXT: ret <2 x i16> [[INS_2]]
	;			;
	bb:			bb:
	%arg0.1 = extractelement <9 x i16> undef, i64 7			%arg0.1 = extractelement <9 x i16> undef, i64 7
	%arg0.2 = extractelement <9 x i16> %arg0, i64 8			%arg0.2 = extractelement <9 x i16> %arg0, i64 8
	%arg1.1 = extractelement <9 x i16> %arg1, i64 7			%arg1.1 = extractelement <9 x i16> %arg1, i64 7
	%arg1.2 = extractelement <9 x i16> %arg1, i64 8			%arg1.2 = extractelement <9 x i16> %arg1, i64 8
	%add.1 = call i16 @llvm.uadd.sat.i16(i16 %arg0.1, i16 %arg1.1)			%add.1 = call i16 @llvm.uadd.sat.i16(i16 %arg0.1, i16 %arg1.1)
	%add.2 = call i16 @llvm.uadd.sat.i16(i16 %arg0.2, i16 %arg1.2)			%add.2 = call i16 @llvm.uadd.sat.i16(i16 %arg0.2, i16 %arg1.2)
	%ins.1 = insertelement <2 x i16> undef, i16 %add.1, i64 0			%ins.1 = insertelement <2 x i16> undef, i16 %add.1, i64 0
	%ins.2 = insertelement <2 x i16> %ins.1, i16 %add.2, i64 1			%ins.2 = insertelement <2 x i16> %ins.1, i16 %add.2, i64 1
	ret <2 x i16> %ins.2			ret <2 x i16> %ins.2
	}			}

	declare i16 @llvm.uadd.sat.i16(i16, i16) #0			declare i16 @llvm.uadd.sat.i16(i16, i16) #0
	attributes #0 = { nounwind readnone speculatable willreturn }			attributes #0 = { nounwind readnone speculatable willreturn }

llvm/test/Transforms/SLPVectorizer/AMDGPU/phi-result-use-order.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -passes=slp-vectorizer -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 < %s | FileCheck %s		; RUN: opt -passes=slp-vectorizer -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 < %s | FileCheck %s

define <4 x half> @phis(i1 %cmp1, <4 x half> %in1, <4 x half> %in2) {		define <4 x half> @phis(i1 %cmp1, <4 x half> %in1, <4 x half> %in2) {
; CHECK-LABEL: @phis(		; CHECK-LABEL: @phis(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x half> [[IN1:%.*]], <4 x half> poison, <2 x i32> <i32 0, i32 1>		; CHECK-NEXT: [[A2:%.*]] = extractelement <4 x half> [[IN1:%.*]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x half> [[IN1]], <4 x half> poison, <2 x i32> <i32 2, i32 3>		; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x half> [[IN1]], i64 3
		; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x half> [[IN1]], <4 x half> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: br i1 [[CMP1:%.*]], label [[BB1:%.*]], label [[BB0:%.*]]		; CHECK-NEXT: br i1 [[CMP1:%.*]], label [[BB1:%.*]], label [[BB0:%.*]]
; CHECK: bb0:		; CHECK: bb0:
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x half> [[IN2:%.*]], <4 x half> poison, <2 x i32> <i32 0, i32 1>		; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x half> [[IN2:%.*]], i64 2
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x half> [[IN2]], <4 x half> poison, <2 x i32> <i32 2, i32 3>		; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x half> [[IN2]], i64 3
		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x half> [[IN2]], <4 x half> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: br label [[BB1]]		; CHECK-NEXT: br label [[BB1]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x half> [ [[TMP0]], [[ENTRY:%.*]] ], [ [[TMP2]], [[BB0]] ]		; CHECK-NEXT: [[C2:%.*]] = phi half [ [[A2]], [[ENTRY:%.*]] ], [ [[B2]], [[BB0]] ]
; CHECK-NEXT: [[TMP5:%.*]] = phi <2 x half> [ [[TMP1]], [[ENTRY]] ], [ [[TMP3]], [[BB0]] ]		; CHECK-NEXT: [[C3:%.*]] = phi half [ [[A3]], [[ENTRY]] ], [ [[B3]], [[BB0]] ]
; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x half> [[TMP4]], <2 x half> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>		; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x half> [ [[TMP0]], [[ENTRY]] ], [ [[TMP1]], [[BB0]] ]
; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x half> [[TMP5]], <2 x half> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x half> [[TMP2]], <2 x half> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x half> [[TMP4]], <2 x half> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEXT: [[O2:%.*]] = insertelement <4 x half> [[TMP3]], half [[C2]], i64 2
; CHECK-NEXT: ret <4 x half> [[TMP8]]		; CHECK-NEXT: [[O3:%.*]] = insertelement <4 x half> [[O2]], half [[C3]], i64 3
		; CHECK-NEXT: ret <4 x half> [[O3]]
;		;
entry:		entry:
%a0 = extractelement <4 x half> %in1, i64 0		%a0 = extractelement <4 x half> %in1, i64 0
%a1 = extractelement <4 x half> %in1, i64 1		%a1 = extractelement <4 x half> %in1, i64 1
%a2 = extractelement <4 x half> %in1, i64 2		%a2 = extractelement <4 x half> %in1, i64 2
%a3 = extractelement <4 x half> %in1, i64 3		%a3 = extractelement <4 x half> %in1, i64 3
br i1 %cmp1, label %bb1, label %bb0		br i1 %cmp1, label %bb1, label %bb0

bb1:
%o2 = insertelement <4 x half> %o1, half %c2, i64 2		%o2 = insertelement <4 x half> %o1, half %c2, i64 2
%o3 = insertelement <4 x half> %o2, half %c3, i64 3		%o3 = insertelement <4 x half> %o2, half %c3, i64 3
ret <4 x half> %o3		ret <4 x half> %o3
}		}

define <4 x half> @phis_reverse(i1 %cmp1, <4 x half> %in1, <4 x half> %in2) {		define <4 x half> @phis_reverse(i1 %cmp1, <4 x half> %in1, <4 x half> %in2) {
; CHECK-LABEL: @phis_reverse(		; CHECK-LABEL: @phis_reverse(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x half> [[IN1:%.*]], <4 x half> poison, <2 x i32> <i32 2, i32 3>		; CHECK-NEXT: [[A2:%.*]] = extractelement <4 x half> [[IN1:%.*]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x half> [[IN1]], <4 x half> poison, <2 x i32> <i32 0, i32 1>		; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x half> [[IN1]], i64 3
		; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <4 x half> [[IN1]], <4 x half> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: br i1 [[CMP1:%.*]], label [[BB1:%.*]], label [[BB0:%.*]]		; CHECK-NEXT: br i1 [[CMP1:%.*]], label [[BB1:%.*]], label [[BB0:%.*]]
; CHECK: bb0:		; CHECK: bb0:
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x half> [[IN2:%.*]], <4 x half> poison, <2 x i32> <i32 2, i32 3>		; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x half> [[IN2:%.*]], i64 2
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x half> [[IN2]], <4 x half> poison, <2 x i32> <i32 0, i32 1>		; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x half> [[IN2]], i64 3
		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x half> [[IN2]], <4 x half> poison, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: br label [[BB1]]		; CHECK-NEXT: br label [[BB1]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x half> [ [[TMP0]], [[ENTRY:%.*]] ], [ [[TMP2]], [[BB0]] ]		; CHECK-NEXT: [[C3:%.*]] = phi half [ [[A3]], [[ENTRY:%.*]] ], [ [[B3]], [[BB0]] ]
; CHECK-NEXT: [[TMP5:%.*]] = phi <2 x half> [ [[TMP1]], [[ENTRY]] ], [ [[TMP3]], [[BB0]] ]		; CHECK-NEXT: [[C2:%.*]] = phi half [ [[A2]], [[ENTRY]] ], [ [[B2]], [[BB0]] ]
; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x half> [[TMP5]], <2 x half> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>		; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x half> [ [[TMP0]], [[ENTRY]] ], [ [[TMP1]], [[BB0]] ]
; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x half> [[TMP4]], <2 x half> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x half> [[TMP2]], <2 x half> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x half> [[TMP6]], <4 x half> [[TMP7]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>		; CHECK-NEXT: [[O2:%.*]] = insertelement <4 x half> [[TMP3]], half [[C2]], i64 2
; CHECK-NEXT: ret <4 x half> [[TMP8]]		; CHECK-NEXT: [[O3:%.*]] = insertelement <4 x half> [[O2]], half [[C3]], i64 3
		; CHECK-NEXT: ret <4 x half> [[O3]]
;		;
entry:		entry:
%a0 = extractelement <4 x half> %in1, i64 0		%a0 = extractelement <4 x half> %in1, i64 0
%a1 = extractelement <4 x half> %in1, i64 1		%a1 = extractelement <4 x half> %in1, i64 1
%a2 = extractelement <4 x half> %in1, i64 2		%a2 = extractelement <4 x half> %in1, i64 2
%a3 = extractelement <4 x half> %in1, i64 3		%a3 = extractelement <4 x half> %in1, i64 3
br i1 %cmp1, label %bb1, label %bb0		br i1 %cmp1, label %bb1, label %bb0


llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll

	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])			; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])
	; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @expf(float [[VECEXT_1]])			; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @expf(float [[VECEXT_1]])
	; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP3]])			; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @expf(float [[VECEXT_2]])
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>			; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
	; CHECK-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; CHECK-NEXT: ret <4 x float> [[VECINS_31]]			; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @expf(float [[VECEXT_3]])
				; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
				; CHECK-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	; DEFAULT-LABEL: define <4 x float> @exp_4x			; DEFAULT-LABEL: define <4 x float> @exp_4x
	; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] {			; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] {
	; DEFAULT-NEXT: entry:			; DEFAULT-NEXT: entry:
	; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16			; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
	; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])			; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])
	; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @expf(float [[VECEXT_1]])			; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @expf(float [[VECEXT_1]])
	; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; DEFAULT-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; DEFAULT-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP3]])			; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @expf(float [[VECEXT_2]])
	; DEFAULT-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>			; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
	; DEFAULT-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; DEFAULT-NEXT: ret <4 x float> [[VECINS_31]]			; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @expf(float [[VECEXT_3]])
				; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
				; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, ptr %a, align 16			%0 = load <4 x float>, ptr %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @expf(float %vecext)			%1 = tail call fast float @expf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @expf(float %vecext.1)			%2 = tail call fast float @expf(float %vecext.1)
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.exp.f32(float [[VECEXT]])			; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.exp.f32(float [[VECEXT]])
	; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.exp.f32(float [[VECEXT_1]])			; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.exp.f32(float [[VECEXT_1]])
	; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP3]])			; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.exp.f32(float [[VECEXT_2]])
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>			; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
	; CHECK-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; CHECK-NEXT: ret <4 x float> [[VECINS_31]]			; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.exp.f32(float [[VECEXT_3]])
				; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
				; CHECK-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	; DEFAULT-LABEL: define <4 x float> @int_exp_4x			; DEFAULT-LABEL: define <4 x float> @int_exp_4x
	; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] {			; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] {
	; DEFAULT-NEXT: entry:			; DEFAULT-NEXT: entry:
	; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16			; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
	; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.exp.f32(float [[VECEXT]])			; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.exp.f32(float [[VECEXT]])
	; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.exp.f32(float [[VECEXT_1]])			; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.exp.f32(float [[VECEXT_1]])
	; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; DEFAULT-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; DEFAULT-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP3]])			; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.exp.f32(float [[VECEXT_2]])
	; DEFAULT-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>			; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
	; DEFAULT-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; DEFAULT-NEXT: ret <4 x float> [[VECINS_31]]			; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.exp.f32(float [[VECEXT_3]])
				; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
				; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, ptr %a, align 16			%0 = load <4 x float>, ptr %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.exp.f32(float %vecext)			%1 = tail call fast float @llvm.exp.f32(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @llvm.exp.f32(float %vecext.1)			%2 = tail call fast float @llvm.exp.f32(float %vecext.1)
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])			; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])
	; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @logf(float [[VECEXT_1]])			; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @logf(float [[VECEXT_1]])
	; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP3]])			; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @logf(float [[VECEXT_2]])
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>			; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
	; CHECK-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; CHECK-NEXT: ret <4 x float> [[VECINS_31]]			; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @logf(float [[VECEXT_3]])
				; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
				; CHECK-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	; DEFAULT-LABEL: define <4 x float> @log_4x			; DEFAULT-LABEL: define <4 x float> @log_4x
	; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] {			; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] {
	; DEFAULT-NEXT: entry:			; DEFAULT-NEXT: entry:
	; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16			; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
	; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])			; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])
	; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @logf(float [[VECEXT_1]])			; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @logf(float [[VECEXT_1]])
	; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; DEFAULT-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; DEFAULT-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP3]])			; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @logf(float [[VECEXT_2]])
	; DEFAULT-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>			; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
	; DEFAULT-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; DEFAULT-NEXT: ret <4 x float> [[VECINS_31]]			; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @logf(float [[VECEXT_3]])
				; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
				; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, ptr %a, align 16			%0 = load <4 x float>, ptr %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @logf(float %vecext)			%1 = tail call fast float @logf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @logf(float %vecext.1)			%2 = tail call fast float @logf(float %vecext.1)
	...
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.log.f32(float [[VECEXT]])			; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.log.f32(float [[VECEXT]])
	; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.log.f32(float [[VECEXT_1]])			; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.log.f32(float [[VECEXT_1]])
	; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP3]])			; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.log.f32(float [[VECEXT_2]])
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>			; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
	; CHECK-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; CHECK-NEXT: ret <4 x float> [[VECINS_31]]			; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.log.f32(float [[VECEXT_3]])
				; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
				; CHECK-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	; DEFAULT-LABEL: define <4 x float> @int_log_4x			; DEFAULT-LABEL: define <4 x float> @int_log_4x
	; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] {			; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] {
	; DEFAULT-NEXT: entry:			; DEFAULT-NEXT: entry:
	; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16			; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
	; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.log.f32(float [[VECEXT]])			; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.log.f32(float [[VECEXT]])
	; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.log.f32(float [[VECEXT_1]])			; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.log.f32(float [[VECEXT_1]])
	; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; DEFAULT-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; DEFAULT-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP3]])			; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.log.f32(float [[VECEXT_2]])
	; DEFAULT-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>			; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
	; DEFAULT-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; DEFAULT-NEXT: ret <4 x float> [[VECINS_31]]			; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.log.f32(float [[VECEXT_3]])
				; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
				; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, ptr %a, align 16			%0 = load <4 x float>, ptr %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.log.f32(float %vecext)			%1 = tail call fast float @llvm.log.f32(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @llvm.log.f32(float %vecext.1)			%2 = tail call fast float @llvm.log.f32(float %vecext.1)
	...
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])			; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])
	; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @sinf(float [[VECEXT_1]])			; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @sinf(float [[VECEXT_1]])
	; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])			; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @sinf(float [[VECEXT_2]])
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>			; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
	; CHECK-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; CHECK-NEXT: ret <4 x float> [[VECINS_31]]			; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @sinf(float [[VECEXT_3]])
				; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
				; CHECK-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	; DEFAULT-LABEL: define <4 x float> @sin_4x			; DEFAULT-LABEL: define <4 x float> @sin_4x
	; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] {			; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] {
	; DEFAULT-NEXT: entry:			; DEFAULT-NEXT: entry:
	; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16			; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
	; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])			; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])
	; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @sinf(float [[VECEXT_1]])			; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @sinf(float [[VECEXT_1]])
	; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; DEFAULT-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; DEFAULT-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])			; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @sinf(float [[VECEXT_2]])
	; DEFAULT-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>			; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
	; DEFAULT-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; DEFAULT-NEXT: ret <4 x float> [[VECINS_31]]			; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @sinf(float [[VECEXT_3]])
				; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
				; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, ptr %a, align 16			%0 = load <4 x float>, ptr %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @sinf(float %vecext)			%1 = tail call fast float @sinf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @sinf(float %vecext.1)			%2 = tail call fast float @sinf(float %vecext.1)
	...
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
	; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])			; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])
	; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_1]])			; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_1]])
	; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])			; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_2]])
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>			; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
	; CHECK-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; CHECK-NEXT: ret <4 x float> [[VECINS_31]]			; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_3]])
				; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
				; CHECK-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	; DEFAULT-LABEL: define <4 x float> @int_sin_4x			; DEFAULT-LABEL: define <4 x float> @int_sin_4x
	; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] {			; DEFAULT-SAME: (ptr [[A:%.*]]) #[[ATTR1]] {
	; DEFAULT-NEXT: entry:			; DEFAULT-NEXT: entry:
	; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16			; DEFAULT-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
	; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; DEFAULT-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])			; DEFAULT-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])
	; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; DEFAULT-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; DEFAULT-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_1]])			; DEFAULT-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_1]])
	; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1			; DEFAULT-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; DEFAULT-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; DEFAULT-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; DEFAULT-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])			; DEFAULT-NEXT: [[TMP3:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_2]])
	; DEFAULT-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>			; DEFAULT-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
	; DEFAULT-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; DEFAULT-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; DEFAULT-NEXT: ret <4 x float> [[VECINS_31]]			; DEFAULT-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_3]])
				; DEFAULT-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
				; DEFAULT-NEXT: ret <4 x float> [[VECINS_3]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, ptr %a, align 16			%0 = load <4 x float>, ptr %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.sin.f32(float %vecext)			%1 = tail call fast float @llvm.sin.f32(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @llvm.sin.f32(float %vecext.1)			%2 = tail call fast float @llvm.sin.f32(float %vecext.1)

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -mtriple=x86_64-unknown -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=SLM			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=SLM
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX2
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX2
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX2

	define <8 x float> @ceil_floor(<8 x float> %a) {			define <8 x float> @ceil_floor(<8 x float> %a) {
	; SSE-LABEL: @ceil_floor(			; SSE-LABEL: @ceil_floor(
	; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0			; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0
	; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3			; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3
	; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>			; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; SSE-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; SSE-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
	...
	; SLM-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>			; SLM-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>			; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>
	; SLM-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>			; SLM-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; SLM-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; SLM-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: ret <8 x float> [[R71]]			; SLM-NEXT: ret <8 x float> [[R71]]
	;			;
	; AVX-LABEL: @ceil_floor(			; AVX-LABEL: @ceil_floor(
	; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0			; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0
				; AVX-NEXT: [[A1:%.*]] = extractelement <8 x float> [[A]], i64 1
				; AVX-NEXT: [[A2:%.*]] = extractelement <8 x float> [[A]], i64 2
	; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3			; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3
	; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>			; AVX-NEXT: [[AB1:%.*]] = call float @llvm.floor.f32(float [[A1]])
	; AVX-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; AVX-NEXT: [[AB2:%.*]] = call float @llvm.floor.f32(float [[A2]])
	; AVX-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; AVX-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; AVX-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>			; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
	; AVX-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; AVX-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP1]])
	; AVX-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>			; AVX-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
	; AVX-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; AVX-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP3]])
	; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i64 0			; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i64 0
	; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>			; AVX-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[AB1]], i64 1
	; AVX-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>			; AVX-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[AB2]], i64 2
	; AVX-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3			; AVX-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i64 3
	; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>			; AVX-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>			; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>
	; AVX-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>			; AVX-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; AVX-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; AVX-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; AVX-NEXT: ret <8 x float> [[R71]]			; AVX-NEXT: ret <8 x float> [[R71]]
	;			;
				; AVX2-LABEL: @ceil_floor(
				; AVX2-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0
				; AVX2-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3
				; AVX2-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
				; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
				; AVX2-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
				; AVX2-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
				; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
				; AVX2-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])
				; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
				; AVX2-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
				; AVX2-NEXT: [[R0:%.*]] = insertelement <8 x float> poison, float [[AB0]], i64 0
				; AVX2-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
				; AVX2-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
				; AVX2-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3
				; AVX2-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
				; AVX2-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>
				; AVX2-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
				; AVX2-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
				; AVX2-NEXT: ret <8 x float> [[R71]]
				;
	%a0 = extractelement <8 x float> %a, i32 0			%a0 = extractelement <8 x float> %a, i32 0
	%a1 = extractelement <8 x float> %a, i32 1			%a1 = extractelement <8 x float> %a, i32 1
	%a2 = extractelement <8 x float> %a, i32 2			%a2 = extractelement <8 x float> %a, i32 2
	%a3 = extractelement <8 x float> %a, i32 3			%a3 = extractelement <8 x float> %a, i32 3
	%a4 = extractelement <8 x float> %a, i32 4			%a4 = extractelement <8 x float> %a, i32 4
	%a5 = extractelement <8 x float> %a, i32 5			%a5 = extractelement <8 x float> %a, i32 5
	%a6 = extractelement <8 x float> %a, i32 6			%a6 = extractelement <8 x float> %a, i32 6
	%a7 = extractelement <8 x float> %a, i32 7			%a7 = extractelement <8 x float> %a, i32 7
	[21 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=SSE			; RUN: opt < %s -mtriple=x86_64-unknown -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=SLM			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=SLM
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX2
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX2
	; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX			; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -passes=slp-vectorizer,instcombine -S | FileCheck %s --check-prefixes=AVX2

	define <8 x float> @ceil_floor(<8 x float> %a) {			define <8 x float> @ceil_floor(<8 x float> %a) {
	; SSE-LABEL: @ceil_floor(			; SSE-LABEL: @ceil_floor(
	; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0			; SSE-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0
	; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3			; SSE-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3
	; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; SSE-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>			; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
	; SSE-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; SSE-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
	[30 lines not shown]
	; SLM-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>			; SLM-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>			; SLM-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>
	; SLM-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>			; SLM-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; SLM-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; SLM-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; SLM-NEXT: ret <8 x float> [[R71]]			; SLM-NEXT: ret <8 x float> [[R71]]
	;			;
	; AVX-LABEL: @ceil_floor(			; AVX-LABEL: @ceil_floor(
	; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0			; AVX-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0
				; AVX-NEXT: [[A1:%.*]] = extractelement <8 x float> [[A]], i64 1
				; AVX-NEXT: [[A2:%.*]] = extractelement <8 x float> [[A]], i64 2
	; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3			; AVX-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3
	; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])			; AVX-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
	; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>			; AVX-NEXT: [[AB1:%.*]] = call float @llvm.floor.f32(float [[A1]])
	; AVX-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])			; AVX-NEXT: [[AB2:%.*]] = call float @llvm.floor.f32(float [[A2]])
	; AVX-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])			; AVX-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
	; AVX-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>			; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
	; AVX-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])			; AVX-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP1]])
	; AVX-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>			; AVX-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
	; AVX-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])			; AVX-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP3]])
	; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i64 0			; AVX-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i64 0
	; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>			; AVX-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[AB1]], i64 1
	; AVX-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>			; AVX-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[AB2]], i64 2
	; AVX-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3			; AVX-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i64 3
	; AVX-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>			; AVX-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>			; AVX-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>
	; AVX-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>			; AVX-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
	; AVX-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>			; AVX-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
	; AVX-NEXT: ret <8 x float> [[R71]]			; AVX-NEXT: ret <8 x float> [[R71]]
	;			;
				; AVX2-LABEL: @ceil_floor(
				; AVX2-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i64 0
				; AVX2-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i64 3
				; AVX2-NEXT: [[AB0:%.*]] = call float @llvm.ceil.f32(float [[A0]])
				; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
				; AVX2-NEXT: [[TMP2:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP1]])
				; AVX2-NEXT: [[AB3:%.*]] = call float @llvm.ceil.f32(float [[A3]])
				; AVX2-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 5>
				; AVX2-NEXT: [[TMP4:%.*]] = call <2 x float> @llvm.ceil.v2f32(<2 x float> [[TMP3]])
				; AVX2-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 6, i32 7>
				; AVX2-NEXT: [[TMP6:%.*]] = call <2 x float> @llvm.floor.v2f32(<2 x float> [[TMP5]])
				; AVX2-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i64 0
				; AVX2-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
				; AVX2-NEXT: [[R23:%.*]] = shufflevector <8 x float> [[R0]], <8 x float> [[TMP7]], <8 x i32> <i32 0, i32 8, i32 9, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
				; AVX2-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R23]], float [[AB3]], i64 3
				; AVX2-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
				; AVX2-NEXT: [[R52:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>
				; AVX2-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
				; AVX2-NEXT: [[R71:%.*]] = shufflevector <8 x float> [[R52]], <8 x float> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
				; AVX2-NEXT: ret <8 x float> [[R71]]
				;
	%a0 = extractelement <8 x float> %a, i32 0			%a0 = extractelement <8 x float> %a, i32 0
	%a1 = extractelement <8 x float> %a, i32 1			%a1 = extractelement <8 x float> %a, i32 1
	%a2 = extractelement <8 x float> %a, i32 2			%a2 = extractelement <8 x float> %a, i32 2
	%a3 = extractelement <8 x float> %a, i32 3			%a3 = extractelement <8 x float> %a, i32 3
	%a4 = extractelement <8 x float> %a, i32 4			%a4 = extractelement <8 x float> %a, i32 4
	%a5 = extractelement <8 x float> %a, i32 5			%a5 = extractelement <8 x float> %a, i32 5
	%a6 = extractelement <8 x float> %a, i32 6			%a6 = extractelement <8 x float> %a, i32 6
	%a7 = extractelement <8 x float> %a, i32 7			%a7 = extractelement <8 x float> %a, i32 7
	[21 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/hadd-inseltpoison.ll

[161 lines not shown]
}		}

;		;
; 256-bit vectors		; 256-bit vectors
;		;

define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {
; SSE-LABEL: @test_v4f64(		; SSE-LABEL: @test_v4f64(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SSE-NEXT: [[A0:%.*]] = extractelement <4 x double> [[A:%.*]], i64 0
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; SSE-NEXT: [[A1:%.*]] = extractelement <4 x double> [[A]], i64 1
; SSE-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[A2:%.*]] = extractelement <4 x double> [[A]], i64 2
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>		; SSE-NEXT: [[A3:%.*]] = extractelement <4 x double> [[A]], i64 3
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>		; SSE-NEXT: [[B0:%.*]] = extractelement <4 x double> [[B:%.*]], i64 0
; SSE-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP4]], [[TMP5]]		; SSE-NEXT: [[B1:%.*]] = extractelement <4 x double> [[B]], i64 1
; SSE-NEXT: [[R031:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[B2:%.*]] = extractelement <4 x double> [[B]], i64 2
; SSE-NEXT: ret <4 x double> [[R031]]		; SSE-NEXT: [[B3:%.*]] = extractelement <4 x double> [[B]], i64 3
		; SSE-NEXT: [[R0:%.*]] = fadd double [[A0]], [[A1]]
		; SSE-NEXT: [[R1:%.*]] = fadd double [[B0]], [[B1]]
		; SSE-NEXT: [[R2:%.*]] = fadd double [[A2]], [[A3]]
		; SSE-NEXT: [[R3:%.*]] = fadd double [[B2]], [[B3]]
		; SSE-NEXT: [[R00:%.*]] = insertelement <4 x double> poison, double [[R0]], i64 0
		; SSE-NEXT: [[R01:%.*]] = insertelement <4 x double> [[R00]], double [[R1]], i64 1
		; SSE-NEXT: [[R02:%.*]] = insertelement <4 x double> [[R01]], double [[R2]], i64 2
		; SSE-NEXT: [[R03:%.*]] = insertelement <4 x double> [[R02]], double [[R3]], i64 3
		; SSE-NEXT: ret <4 x double> [[R03]]
;		;
; SLM-LABEL: @test_v4f64(		; SLM-LABEL: @test_v4f64(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SLM-NEXT: [[A0:%.*]] = extractelement <4 x double> [[A:%.*]], i64 0
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; SLM-NEXT: [[A1:%.*]] = extractelement <4 x double> [[A]], i64 1
; SLM-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[A2:%.*]] = extractelement <4 x double> [[A]], i64 2
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>		; SLM-NEXT: [[A3:%.*]] = extractelement <4 x double> [[A]], i64 3
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>		; SLM-NEXT: [[B0:%.*]] = extractelement <4 x double> [[B:%.*]], i64 0
; SLM-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP4]], [[TMP5]]		; SLM-NEXT: [[B1:%.*]] = extractelement <4 x double> [[B]], i64 1
; SLM-NEXT: [[R031:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[B2:%.*]] = extractelement <4 x double> [[B]], i64 2
; SLM-NEXT: ret <4 x double> [[R031]]		; SLM-NEXT: [[B3:%.*]] = extractelement <4 x double> [[B]], i64 3
		; SLM-NEXT: [[R0:%.*]] = fadd double [[A0]], [[A1]]
		; SLM-NEXT: [[R1:%.*]] = fadd double [[B0]], [[B1]]
		; SLM-NEXT: [[R2:%.*]] = fadd double [[A2]], [[A3]]
		; SLM-NEXT: [[R3:%.*]] = fadd double [[B2]], [[B3]]
		; SLM-NEXT: [[R00:%.*]] = insertelement <4 x double> poison, double [[R0]], i64 0
		; SLM-NEXT: [[R01:%.*]] = insertelement <4 x double> [[R00]], double [[R1]], i64 1
		; SLM-NEXT: [[R02:%.*]] = insertelement <4 x double> [[R01]], double [[R2]], i64 2
		; SLM-NEXT: [[R03:%.*]] = insertelement <4 x double> [[R02]], double [[R3]], i64 3
		; SLM-NEXT: ret <4 x double> [[R03]]
;		;
; AVX-LABEL: @test_v4f64(		; AVX-LABEL: @test_v4f64(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
; AVX-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[TMP1]], [[TMP2]]		; AVX-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <4 x double> [[TMP3]]		; AVX-NEXT: ret <4 x double> [[TMP3]]
;		;
%a0 = extractelement <4 x double> %a, i32 0		%a0 = extractelement <4 x double> %a, i32 0
[12 lines not shown]
%r01 = insertelement <4 x double> %r00, double %r1, i32 1		%r01 = insertelement <4 x double> %r00, double %r1, i32 1
%r02 = insertelement <4 x double> %r01, double %r2, i32 2		%r02 = insertelement <4 x double> %r01, double %r2, i32 2
%r03 = insertelement <4 x double> %r02, double %r3, i32 3		%r03 = insertelement <4 x double> %r02, double %r3, i32 3
ret <4 x double> %r03		ret <4 x double> %r03
}		}

; PR50392		; PR50392
define <4 x double> @test_v4f64_partial_swizzle(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test_v4f64_partial_swizzle(<4 x double> %a, <4 x double> %b) {
; CHECK-LABEL: @test_v4f64_partial_swizzle(		; SSE-LABEL: @test_v4f64_partial_swizzle(
; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x double> [[B:%.*]], i64 2		; SSE-NEXT: [[A0:%.*]] = extractelement <4 x double> [[A:%.*]], i64 0
; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x double> [[B]], i64 3		; SSE-NEXT: [[A1:%.*]] = extractelement <4 x double> [[A]], i64 1
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B]], <2 x i32> <i32 0, i32 4>		; SSE-NEXT: [[R0:%.*]] = fadd double [[A0]], [[A1]]
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[B:%.*]], <4 x double> poison, <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[B]], <4 x double> poison, <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[R3:%.*]] = fadd double [[B2]], [[B3]]		; SSE-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 poison, i32 1, i32 poison>		; SSE-NEXT: [[R00:%.*]] = insertelement <4 x double> poison, double [[R0]], i64 0
; CHECK-NEXT: [[R03:%.*]] = insertelement <4 x double> [[TMP4]], double [[R3]], i64 3		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
; CHECK-NEXT: ret <4 x double> [[R03]]		; SSE-NEXT: [[R031:%.*]] = shufflevector <4 x double> [[R00]], <4 x double> [[TMP4]], <4 x i32> <i32 0, i32 poison, i32 4, i32 5>
		; SSE-NEXT: ret <4 x double> [[R031]]
		;
		; SLM-LABEL: @test_v4f64_partial_swizzle(
		; SLM-NEXT: [[A0:%.*]] = extractelement <4 x double> [[A:%.*]], i64 0
		; SLM-NEXT: [[A1:%.*]] = extractelement <4 x double> [[A]], i64 1
		; SLM-NEXT: [[R0:%.*]] = fadd double [[A0]], [[A1]]
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[B:%.*]], <4 x double> poison, <2 x i32> <i32 1, i32 2>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[B]], <4 x double> poison, <2 x i32> <i32 0, i32 3>
		; SLM-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]
		; SLM-NEXT: [[R00:%.*]] = insertelement <4 x double> poison, double [[R0]], i64 0
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
		; SLM-NEXT: [[R031:%.*]] = shufflevector <4 x double> [[R00]], <4 x double> [[TMP4]], <4 x i32> <i32 0, i32 poison, i32 4, i32 5>
		; SLM-NEXT: ret <4 x double> [[R031]]
;		;
%a0 = extractelement <4 x double> %a, i64 0		%a0 = extractelement <4 x double> %a, i64 0
%a1 = extractelement <4 x double> %a, i64 1		%a1 = extractelement <4 x double> %a, i64 1
%b0 = extractelement <4 x double> %b, i64 0		%b0 = extractelement <4 x double> %b, i64 0
%b1 = extractelement <4 x double> %b, i64 1		%b1 = extractelement <4 x double> %b, i64 1
%b2 = extractelement <4 x double> %b, i32 2		%b2 = extractelement <4 x double> %b, i32 2
%b3 = extractelement <4 x double> %b, i32 3		%b3 = extractelement <4 x double> %b, i32 3
%r0 = fadd double %a0, %a1		%r0 = fadd double %a0, %a1
%r2 = fadd double %b0, %b1		%r2 = fadd double %b0, %b1
%r3 = fadd double %b2, %b3		%r3 = fadd double %b2, %b3
%r00 = insertelement <4 x double> poison, double %r0, i32 0		%r00 = insertelement <4 x double> poison, double %r0, i32 0
%r02 = insertelement <4 x double> %r00, double %r2, i32 2		%r02 = insertelement <4 x double> %r00, double %r2, i32 2
%r03 = insertelement <4 x double> %r02, double %r3, i32 3		%r03 = insertelement <4 x double> %r02, double %r3, i32 3
ret <4 x double> %r03		ret <4 x double> %r03
}		}

define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {		define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {
; SSE-LABEL: @test_v8f32(		; SSE-LABEL: @test_v8f32(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; SSE-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]
; SSE-NEXT: ret <8 x float> [[TMP3]]		; SSE-NEXT: ret <8 x float> [[TMP3]]
;		;
; SLM-LABEL: @test_v8f32(		; SLM-LABEL: @test_v8f32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 0, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[B:%.*]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>		; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[B]], <8 x float> poison, <2 x i32> <i32 0, i32 3>
; SLM-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[TMP4]], [[TMP5]]		; SLM-NEXT: [[TMP6:%.*]] = fadd <2 x float> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[R071:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 5, i32 6>
		; SLM-NEXT: [[TMP8:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 7>
		; SLM-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[TMP7]], [[TMP8]]
		; SLM-NEXT: [[TMP10:%.*]] = shufflevector <8 x float> [[B]], <8 x float> poison, <2 x i32> <i32 5, i32 6>
		; SLM-NEXT: [[TMP11:%.*]] = shufflevector <8 x float> [[B]], <8 x float> poison, <2 x i32> <i32 4, i32 7>
		; SLM-NEXT: [[TMP12:%.*]] = fadd <2 x float> [[TMP10]], [[TMP11]]
		; SLM-NEXT: [[R033:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
		; SLM-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SLM-NEXT: [[R052:%.*]] = shufflevector <8 x float> [[R033]], <8 x float> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>
		; SLM-NEXT: [[TMP14:%.*]] = shufflevector <2 x float> [[TMP12]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SLM-NEXT: [[R071:%.*]] = shufflevector <8 x float> [[R052]], <8 x float> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; SLM-NEXT: ret <8 x float> [[R071]]		; SLM-NEXT: ret <8 x float> [[R071]]
;		;
; AVX-LABEL: @test_v8f32(		; AVX-LABEL: @test_v8f32(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; AVX-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]		; AVX-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <8 x float> [[TMP3]]		; AVX-NEXT: ret <8 x float> [[TMP3]]
;		;
[97 lines not shown]
%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5		%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5
%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6		%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6
%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7		%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7
ret <8 x i32> %r07		ret <8 x i32> %r07
}		}

define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {		define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {
; SSE-LABEL: @test_v16i16(		; SSE-LABEL: @test_v16i16(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22>		; SSE-NEXT: [[B0:%.*]] = extractelement <16 x i16> [[B:%.*]], i64 0
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>		; SSE-NEXT: [[B1:%.*]] = extractelement <16 x i16> [[B]], i64 1
; SSE-NEXT: [[TMP3:%.*]] = add <8 x i16> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[B2:%.*]] = extractelement <16 x i16> [[B]], i64 2
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>		; SSE-NEXT: [[B3:%.*]] = extractelement <16 x i16> [[B]], i64 3
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>		; SSE-NEXT: [[B4:%.*]] = extractelement <16 x i16> [[B]], i64 4
; SSE-NEXT: [[TMP6:%.*]] = add <8 x i16> [[TMP4]], [[TMP5]]		; SSE-NEXT: [[B5:%.*]] = extractelement <16 x i16> [[B]], i64 5
; SSE-NEXT: [[RV151:%.*]] = shufflevector <8 x i16> [[TMP3]], <8 x i16> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; SSE-NEXT: [[B6:%.*]] = extractelement <16 x i16> [[B]], i64 6
; SSE-NEXT: ret <16 x i16> [[RV151]]		; SSE-NEXT: [[B7:%.*]] = extractelement <16 x i16> [[B]], i64 7
		; SSE-NEXT: [[B8:%.*]] = extractelement <16 x i16> [[B]], i64 8
		; SSE-NEXT: [[B9:%.*]] = extractelement <16 x i16> [[B]], i64 9
		; SSE-NEXT: [[B10:%.*]] = extractelement <16 x i16> [[B]], i64 10
		; SSE-NEXT: [[B11:%.*]] = extractelement <16 x i16> [[B]], i64 11
		; SSE-NEXT: [[B12:%.*]] = extractelement <16 x i16> [[B]], i64 12
		; SSE-NEXT: [[B13:%.*]] = extractelement <16 x i16> [[B]], i64 13
		; SSE-NEXT: [[B14:%.*]] = extractelement <16 x i16> [[B]], i64 14
		; SSE-NEXT: [[B15:%.*]] = extractelement <16 x i16> [[B]], i64 15
		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
		; SSE-NEXT: [[TMP3:%.*]] = add <4 x i16> [[TMP1]], [[TMP2]]
		; SSE-NEXT: [[R4:%.*]] = add i16 [[B0]], [[B1]]
		; SSE-NEXT: [[R5:%.*]] = add i16 [[B2]], [[B3]]
		; SSE-NEXT: [[R6:%.*]] = add i16 [[B4]], [[B5]]
		; SSE-NEXT: [[R7:%.*]] = add i16 [[B6]], [[B7]]
		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> poison, <4 x i32> <i32 8, i32 10, i32 12, i32 14>
		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> poison, <4 x i32> <i32 9, i32 11, i32 13, i32 15>
		; SSE-NEXT: [[TMP6:%.*]] = add <4 x i16> [[TMP4]], [[TMP5]]
		; SSE-NEXT: [[R12:%.*]] = add i16 [[B8]], [[B9]]
		; SSE-NEXT: [[R13:%.*]] = add i16 [[B10]], [[B11]]
		; SSE-NEXT: [[R14:%.*]] = add i16 [[B12]], [[B13]]
		; SSE-NEXT: [[R15:%.*]] = add i16 [[B14]], [[B15]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <4 x i16> [[TMP3]], <4 x i16> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SSE-NEXT: [[RV4:%.*]] = insertelement <16 x i16> [[TMP7]], i16 [[R4]], i64 4
		; SSE-NEXT: [[RV5:%.*]] = insertelement <16 x i16> [[RV4]], i16 [[R5]], i64 5
		; SSE-NEXT: [[RV6:%.*]] = insertelement <16 x i16> [[RV5]], i16 [[R6]], i64 6
		; SSE-NEXT: [[RV7:%.*]] = insertelement <16 x i16> [[RV6]], i16 [[R7]], i64 7
		; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i16> [[TMP6]], <4 x i16> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SSE-NEXT: [[RV111:%.*]] = shufflevector <16 x i16> [[RV7]], <16 x i16> [[TMP8]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 poison, i32 poison, i32 poison, i32 poison>
		; SSE-NEXT: [[RV12:%.*]] = insertelement <16 x i16> [[RV111]], i16 [[R12]], i64 12
		; SSE-NEXT: [[RV13:%.*]] = insertelement <16 x i16> [[RV12]], i16 [[R13]], i64 13
		; SSE-NEXT: [[RV14:%.*]] = insertelement <16 x i16> [[RV13]], i16 [[R14]], i64 14
		; SSE-NEXT: [[RV15:%.*]] = insertelement <16 x i16> [[RV14]], i16 [[R15]], i64 15
		; SSE-NEXT: ret <16 x i16> [[RV15]]
;		;
; SLM-LABEL: @test_v16i16(		; SLM-LABEL: @test_v16i16(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
; SLM-NEXT: [[TMP3:%.*]] = add <16 x i16> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = add <16 x i16> [[TMP1]], [[TMP2]]
; SLM-NEXT: ret <16 x i16> [[TMP3]]		; SLM-NEXT: ret <16 x i16> [[TMP3]]
;		;
; AVX-LABEL: @test_v16i16(		; AVX-LABEL: @test_v16i16(
[71 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/hadd.ll

}		}

;		;
; 256-bit vectors		; 256-bit vectors
;		;

define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {
; SSE-LABEL: @test_v4f64(		; SSE-LABEL: @test_v4f64(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SSE-NEXT: [[A0:%.*]] = extractelement <4 x double> [[A:%.*]], i64 0
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; SSE-NEXT: [[A1:%.*]] = extractelement <4 x double> [[A]], i64 1
; SSE-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[A2:%.*]] = extractelement <4 x double> [[A]], i64 2
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>		; SSE-NEXT: [[A3:%.*]] = extractelement <4 x double> [[A]], i64 3
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>		; SSE-NEXT: [[B0:%.*]] = extractelement <4 x double> [[B:%.*]], i64 0
; SSE-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP4]], [[TMP5]]		; SSE-NEXT: [[B1:%.*]] = extractelement <4 x double> [[B]], i64 1
; SSE-NEXT: [[R031:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[B2:%.*]] = extractelement <4 x double> [[B]], i64 2
; SSE-NEXT: ret <4 x double> [[R031]]		; SSE-NEXT: [[B3:%.*]] = extractelement <4 x double> [[B]], i64 3
		; SSE-NEXT: [[R0:%.*]] = fadd double [[A0]], [[A1]]
		; SSE-NEXT: [[R1:%.*]] = fadd double [[B0]], [[B1]]
		; SSE-NEXT: [[R2:%.*]] = fadd double [[A2]], [[A3]]
		; SSE-NEXT: [[R3:%.*]] = fadd double [[B2]], [[B3]]
		; SSE-NEXT: [[R00:%.*]] = insertelement <4 x double> undef, double [[R0]], i64 0
		; SSE-NEXT: [[R01:%.*]] = insertelement <4 x double> [[R00]], double [[R1]], i64 1
		; SSE-NEXT: [[R02:%.*]] = insertelement <4 x double> [[R01]], double [[R2]], i64 2
		; SSE-NEXT: [[R03:%.*]] = insertelement <4 x double> [[R02]], double [[R3]], i64 3
		; SSE-NEXT: ret <4 x double> [[R03]]
;		;
; SLM-LABEL: @test_v4f64(		; SLM-LABEL: @test_v4f64(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SLM-NEXT: [[A0:%.*]] = extractelement <4 x double> [[A:%.*]], i64 0
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; SLM-NEXT: [[A1:%.*]] = extractelement <4 x double> [[A]], i64 1
; SLM-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[A2:%.*]] = extractelement <4 x double> [[A]], i64 2
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>		; SLM-NEXT: [[A3:%.*]] = extractelement <4 x double> [[A]], i64 3
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>		; SLM-NEXT: [[B0:%.*]] = extractelement <4 x double> [[B:%.*]], i64 0
; SLM-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP4]], [[TMP5]]		; SLM-NEXT: [[B1:%.*]] = extractelement <4 x double> [[B]], i64 1
; SLM-NEXT: [[R031:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[B2:%.*]] = extractelement <4 x double> [[B]], i64 2
; SLM-NEXT: ret <4 x double> [[R031]]		; SLM-NEXT: [[B3:%.*]] = extractelement <4 x double> [[B]], i64 3
		; SLM-NEXT: [[R0:%.*]] = fadd double [[A0]], [[A1]]
		; SLM-NEXT: [[R1:%.*]] = fadd double [[B0]], [[B1]]
		; SLM-NEXT: [[R2:%.*]] = fadd double [[A2]], [[A3]]
		; SLM-NEXT: [[R3:%.*]] = fadd double [[B2]], [[B3]]
		; SLM-NEXT: [[R00:%.*]] = insertelement <4 x double> undef, double [[R0]], i64 0
		; SLM-NEXT: [[R01:%.*]] = insertelement <4 x double> [[R00]], double [[R1]], i64 1
		; SLM-NEXT: [[R02:%.*]] = insertelement <4 x double> [[R01]], double [[R2]], i64 2
		; SLM-NEXT: [[R03:%.*]] = insertelement <4 x double> [[R02]], double [[R3]], i64 3
		; SLM-NEXT: ret <4 x double> [[R03]]
;		;
; AVX-LABEL: @test_v4f64(		; AVX-LABEL: @test_v4f64(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
; AVX-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[TMP1]], [[TMP2]]		; AVX-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <4 x double> [[TMP3]]		; AVX-NEXT: ret <4 x double> [[TMP3]]
;		;
%a0 = extractelement <4 x double> %a, i32 0		%a0 = extractelement <4 x double> %a, i32 0
; ... (12 lines not shown) ...
%r01 = insertelement <4 x double> %r00, double %r1, i32 1		%r01 = insertelement <4 x double> %r00, double %r1, i32 1
%r02 = insertelement <4 x double> %r01, double %r2, i32 2		%r02 = insertelement <4 x double> %r01, double %r2, i32 2
%r03 = insertelement <4 x double> %r02, double %r3, i32 3		%r03 = insertelement <4 x double> %r02, double %r3, i32 3
ret <4 x double> %r03		ret <4 x double> %r03
}		}

; PR50392		; PR50392
define <4 x double> @test_v4f64_partial_swizzle(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test_v4f64_partial_swizzle(<4 x double> %a, <4 x double> %b) {
; CHECK-LABEL: @test_v4f64_partial_swizzle(		; SSE-LABEL: @test_v4f64_partial_swizzle(
; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x double> [[B:%.*]], i64 2		; SSE-NEXT: [[A0:%.*]] = extractelement <4 x double> [[A:%.*]], i64 0
; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x double> [[B]], i64 3		; SSE-NEXT: [[A1:%.*]] = extractelement <4 x double> [[A]], i64 1
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B]], <2 x i32> <i32 0, i32 4>		; SSE-NEXT: [[R0:%.*]] = fadd double [[A0]], [[A1]]
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[B:%.*]], <4 x double> poison, <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[B]], <4 x double> poison, <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[R3:%.*]] = fadd double [[B2]], [[B3]]		; SSE-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> undef, <4 x i32> <i32 0, i32 poison, i32 1, i32 poison>		; SSE-NEXT: [[R00:%.*]] = insertelement <4 x double> undef, double [[R0]], i64 0
; CHECK-NEXT: [[R03:%.*]] = insertelement <4 x double> [[TMP4]], double [[R3]], i64 3		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
; CHECK-NEXT: ret <4 x double> [[R03]]		; SSE-NEXT: [[R031:%.*]] = shufflevector <4 x double> [[R00]], <4 x double> [[TMP4]], <4 x i32> <i32 0, i32 poison, i32 4, i32 5>
		; SSE-NEXT: ret <4 x double> [[R031]]
		;
		; SLM-LABEL: @test_v4f64_partial_swizzle(
		; SLM-NEXT: [[A0:%.*]] = extractelement <4 x double> [[A:%.*]], i64 0
		; SLM-NEXT: [[A1:%.*]] = extractelement <4 x double> [[A]], i64 1
		; SLM-NEXT: [[R0:%.*]] = fadd double [[A0]], [[A1]]
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[B:%.*]], <4 x double> poison, <2 x i32> <i32 1, i32 2>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[B]], <4 x double> poison, <2 x i32> <i32 0, i32 3>
		; SLM-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]
		; SLM-NEXT: [[R00:%.*]] = insertelement <4 x double> undef, double [[R0]], i64 0
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
		; SLM-NEXT: [[R031:%.*]] = shufflevector <4 x double> [[R00]], <4 x double> [[TMP4]], <4 x i32> <i32 0, i32 poison, i32 4, i32 5>
		; SLM-NEXT: ret <4 x double> [[R031]]
;		;
%a0 = extractelement <4 x double> %a, i64 0		%a0 = extractelement <4 x double> %a, i64 0
%a1 = extractelement <4 x double> %a, i64 1		%a1 = extractelement <4 x double> %a, i64 1
%b0 = extractelement <4 x double> %b, i64 0		%b0 = extractelement <4 x double> %b, i64 0
%b1 = extractelement <4 x double> %b, i64 1		%b1 = extractelement <4 x double> %b, i64 1
%b2 = extractelement <4 x double> %b, i32 2		%b2 = extractelement <4 x double> %b, i32 2
%b3 = extractelement <4 x double> %b, i32 3		%b3 = extractelement <4 x double> %b, i32 3
%r0 = fadd double %a0, %a1		%r0 = fadd double %a0, %a1
%r2 = fadd double %b0, %b1		%r2 = fadd double %b0, %b1
%r3 = fadd double %b2, %b3		%r3 = fadd double %b2, %b3
%r00 = insertelement <4 x double> undef, double %r0, i32 0		%r00 = insertelement <4 x double> undef, double %r0, i32 0
%r02 = insertelement <4 x double> %r00, double %r2, i32 2		%r02 = insertelement <4 x double> %r00, double %r2, i32 2
%r03 = insertelement <4 x double> %r02, double %r3, i32 3		%r03 = insertelement <4 x double> %r02, double %r3, i32 3
ret <4 x double> %r03		ret <4 x double> %r03
}		}

define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {		define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {
; SSE-LABEL: @test_v8f32(		; SSE-LABEL: @test_v8f32(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; SSE-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]
; SSE-NEXT: ret <8 x float> [[TMP3]]		; SSE-NEXT: ret <8 x float> [[TMP3]]
;		;
; SLM-LABEL: @test_v8f32(		; SLM-LABEL: @test_v8f32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 0, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[B:%.*]], <8 x float> poison, <2 x i32> <i32 1, i32 2>
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>		; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[B]], <8 x float> poison, <2 x i32> <i32 0, i32 3>
; SLM-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[TMP4]], [[TMP5]]		; SLM-NEXT: [[TMP6:%.*]] = fadd <2 x float> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[R071:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 5, i32 6>
		; SLM-NEXT: [[TMP8:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 7>
		; SLM-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[TMP7]], [[TMP8]]
		; SLM-NEXT: [[TMP10:%.*]] = shufflevector <8 x float> [[B]], <8 x float> poison, <2 x i32> <i32 5, i32 6>
		; SLM-NEXT: [[TMP11:%.*]] = shufflevector <8 x float> [[B]], <8 x float> poison, <2 x i32> <i32 4, i32 7>
		; SLM-NEXT: [[TMP12:%.*]] = fadd <2 x float> [[TMP10]], [[TMP11]]
		; SLM-NEXT: [[R033:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
		; SLM-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SLM-NEXT: [[R052:%.*]] = shufflevector <8 x float> [[R033]], <8 x float> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>
		; SLM-NEXT: [[TMP14:%.*]] = shufflevector <2 x float> [[TMP12]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SLM-NEXT: [[R071:%.*]] = shufflevector <8 x float> [[R052]], <8 x float> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; SLM-NEXT: ret <8 x float> [[R071]]		; SLM-NEXT: ret <8 x float> [[R071]]
;		;
; AVX-LABEL: @test_v8f32(		; AVX-LABEL: @test_v8f32(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; AVX-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]		; AVX-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <8 x float> [[TMP3]]		; AVX-NEXT: ret <8 x float> [[TMP3]]
;		;
; ... (97 lines not shown) ...
%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5		%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5
%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6		%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6
%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7		%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7
ret <8 x i32> %r07		ret <8 x i32> %r07
}		}

define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {		define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {
; SSE-LABEL: @test_v16i16(		; SSE-LABEL: @test_v16i16(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22>		; SSE-NEXT: [[B0:%.*]] = extractelement <16 x i16> [[B:%.*]], i64 0
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>		; SSE-NEXT: [[B1:%.*]] = extractelement <16 x i16> [[B]], i64 1
; SSE-NEXT: [[TMP3:%.*]] = add <8 x i16> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[B2:%.*]] = extractelement <16 x i16> [[B]], i64 2
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>		; SSE-NEXT: [[B3:%.*]] = extractelement <16 x i16> [[B]], i64 3
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>		; SSE-NEXT: [[B4:%.*]] = extractelement <16 x i16> [[B]], i64 4
; SSE-NEXT: [[TMP6:%.*]] = add <8 x i16> [[TMP4]], [[TMP5]]		; SSE-NEXT: [[B5:%.*]] = extractelement <16 x i16> [[B]], i64 5
; SSE-NEXT: [[RV151:%.*]] = shufflevector <8 x i16> [[TMP3]], <8 x i16> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; SSE-NEXT: [[B6:%.*]] = extractelement <16 x i16> [[B]], i64 6
; SSE-NEXT: ret <16 x i16> [[RV151]]		; SSE-NEXT: [[B7:%.*]] = extractelement <16 x i16> [[B]], i64 7
		; SSE-NEXT: [[B8:%.*]] = extractelement <16 x i16> [[B]], i64 8
		; SSE-NEXT: [[B9:%.*]] = extractelement <16 x i16> [[B]], i64 9
		; SSE-NEXT: [[B10:%.*]] = extractelement <16 x i16> [[B]], i64 10
		; SSE-NEXT: [[B11:%.*]] = extractelement <16 x i16> [[B]], i64 11
		; SSE-NEXT: [[B12:%.*]] = extractelement <16 x i16> [[B]], i64 12
		; SSE-NEXT: [[B13:%.*]] = extractelement <16 x i16> [[B]], i64 13
		; SSE-NEXT: [[B14:%.*]] = extractelement <16 x i16> [[B]], i64 14
		; SSE-NEXT: [[B15:%.*]] = extractelement <16 x i16> [[B]], i64 15
		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
		; SSE-NEXT: [[TMP3:%.*]] = add <4 x i16> [[TMP1]], [[TMP2]]
		; SSE-NEXT: [[R4:%.*]] = add i16 [[B0]], [[B1]]
		; SSE-NEXT: [[R5:%.*]] = add i16 [[B2]], [[B3]]
		; SSE-NEXT: [[R6:%.*]] = add i16 [[B4]], [[B5]]
		; SSE-NEXT: [[R7:%.*]] = add i16 [[B6]], [[B7]]
		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> poison, <4 x i32> <i32 8, i32 10, i32 12, i32 14>
		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> poison, <4 x i32> <i32 9, i32 11, i32 13, i32 15>
		; SSE-NEXT: [[TMP6:%.*]] = add <4 x i16> [[TMP4]], [[TMP5]]
		; SSE-NEXT: [[R12:%.*]] = add i16 [[B8]], [[B9]]
		; SSE-NEXT: [[R13:%.*]] = add i16 [[B10]], [[B11]]
		; SSE-NEXT: [[R14:%.*]] = add i16 [[B12]], [[B13]]
		; SSE-NEXT: [[R15:%.*]] = add i16 [[B14]], [[B15]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <4 x i16> [[TMP3]], <4 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SSE-NEXT: [[RV4:%.*]] = insertelement <16 x i16> [[TMP7]], i16 [[R4]], i64 4
		; SSE-NEXT: [[RV5:%.*]] = insertelement <16 x i16> [[RV4]], i16 [[R5]], i64 5
		; SSE-NEXT: [[RV6:%.*]] = insertelement <16 x i16> [[RV5]], i16 [[R6]], i64 6
		; SSE-NEXT: [[RV7:%.*]] = insertelement <16 x i16> [[RV6]], i16 [[R7]], i64 7
		; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i16> [[TMP6]], <4 x i16> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SSE-NEXT: [[RV111:%.*]] = shufflevector <16 x i16> [[RV7]], <16 x i16> [[TMP8]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 poison, i32 poison, i32 poison, i32 poison>
		; SSE-NEXT: [[RV12:%.*]] = insertelement <16 x i16> [[RV111]], i16 [[R12]], i64 12
		; SSE-NEXT: [[RV13:%.*]] = insertelement <16 x i16> [[RV12]], i16 [[R13]], i64 13
		; SSE-NEXT: [[RV14:%.*]] = insertelement <16 x i16> [[RV13]], i16 [[R14]], i64 14
		; SSE-NEXT: [[RV15:%.*]] = insertelement <16 x i16> [[RV14]], i16 [[R15]], i64 15
		; SSE-NEXT: ret <16 x i16> [[RV15]]
;		;
; SLM-LABEL: @test_v16i16(		; SLM-LABEL: @test_v16i16(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
; SLM-NEXT: [[TMP3:%.*]] = add <16 x i16> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = add <16 x i16> [[TMP1]], [[TMP2]]
; SLM-NEXT: ret <16 x i16> [[TMP3]]		; SLM-NEXT: ret <16 x i16> [[TMP3]]
;		;
; AVX-LABEL: @test_v16i16(		; AVX-LABEL: @test_v16i16(

llvm/test/Transforms/SLPVectorizer/X86/hsub-inseltpoison.ll

}		}

;		;
; 256-bit vectors		; 256-bit vectors
;		;

define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {
; SSE-LABEL: @test_v4f64(		; SSE-LABEL: @test_v4f64(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SSE-NEXT: [[A0:%.*]] = extractelement <4 x double> [[A:%.*]], i64 0
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; SSE-NEXT: [[A1:%.*]] = extractelement <4 x double> [[A]], i64 1
; SSE-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[A2:%.*]] = extractelement <4 x double> [[A]], i64 2
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>		; SSE-NEXT: [[A3:%.*]] = extractelement <4 x double> [[A]], i64 3
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>		; SSE-NEXT: [[B0:%.*]] = extractelement <4 x double> [[B:%.*]], i64 0
; SSE-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP4]], [[TMP5]]		; SSE-NEXT: [[B1:%.*]] = extractelement <4 x double> [[B]], i64 1
; SSE-NEXT: [[R031:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[B2:%.*]] = extractelement <4 x double> [[B]], i64 2
; SSE-NEXT: ret <4 x double> [[R031]]		; SSE-NEXT: [[B3:%.*]] = extractelement <4 x double> [[B]], i64 3
		; SSE-NEXT: [[R0:%.*]] = fsub double [[A0]], [[A1]]
		; SSE-NEXT: [[R1:%.*]] = fsub double [[B0]], [[B1]]
		; SSE-NEXT: [[R2:%.*]] = fsub double [[A2]], [[A3]]
		; SSE-NEXT: [[R3:%.*]] = fsub double [[B2]], [[B3]]
		; SSE-NEXT: [[R00:%.*]] = insertelement <4 x double> poison, double [[R0]], i64 0
		; SSE-NEXT: [[R01:%.*]] = insertelement <4 x double> [[R00]], double [[R1]], i64 1
		; SSE-NEXT: [[R02:%.*]] = insertelement <4 x double> [[R01]], double [[R2]], i64 2
		; SSE-NEXT: [[R03:%.*]] = insertelement <4 x double> [[R02]], double [[R3]], i64 3
		; SSE-NEXT: ret <4 x double> [[R03]]
;		;
; SLM-LABEL: @test_v4f64(		; SLM-LABEL: @test_v4f64(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SLM-NEXT: [[A0:%.*]] = extractelement <4 x double> [[A:%.*]], i64 0
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; SLM-NEXT: [[A1:%.*]] = extractelement <4 x double> [[A]], i64 1
; SLM-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[A2:%.*]] = extractelement <4 x double> [[A]], i64 2
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>		; SLM-NEXT: [[A3:%.*]] = extractelement <4 x double> [[A]], i64 3
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>		; SLM-NEXT: [[B0:%.*]] = extractelement <4 x double> [[B:%.*]], i64 0
; SLM-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP4]], [[TMP5]]		; SLM-NEXT: [[B1:%.*]] = extractelement <4 x double> [[B]], i64 1
; SLM-NEXT: [[R031:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[B2:%.*]] = extractelement <4 x double> [[B]], i64 2
; SLM-NEXT: ret <4 x double> [[R031]]		; SLM-NEXT: [[B3:%.*]] = extractelement <4 x double> [[B]], i64 3
		; SLM-NEXT: [[R0:%.*]] = fsub double [[A0]], [[A1]]
		; SLM-NEXT: [[R1:%.*]] = fsub double [[B0]], [[B1]]
		; SLM-NEXT: [[R2:%.*]] = fsub double [[A2]], [[A3]]
		; SLM-NEXT: [[R3:%.*]] = fsub double [[B2]], [[B3]]
		; SLM-NEXT: [[R00:%.*]] = insertelement <4 x double> poison, double [[R0]], i64 0
		; SLM-NEXT: [[R01:%.*]] = insertelement <4 x double> [[R00]], double [[R1]], i64 1
		; SLM-NEXT: [[R02:%.*]] = insertelement <4 x double> [[R01]], double [[R2]], i64 2
		; SLM-NEXT: [[R03:%.*]] = insertelement <4 x double> [[R02]], double [[R3]], i64 3
		; SLM-NEXT: ret <4 x double> [[R03]]
;		;
; AVX-LABEL: @test_v4f64(		; AVX-LABEL: @test_v4f64(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
; AVX-NEXT: [[TMP3:%.*]] = fsub <4 x double> [[TMP1]], [[TMP2]]		; AVX-NEXT: [[TMP3:%.*]] = fsub <4 x double> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <4 x double> [[TMP3]]		; AVX-NEXT: ret <4 x double> [[TMP3]]
;		;
%a0 = extractelement <4 x double> %a, i32 0		%a0 = extractelement <4 x double> %a, i32 0
; ... (18 lines not shown) ...
define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {		define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {
; SSE-LABEL: @test_v8f32(		; SSE-LABEL: @test_v8f32(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; SSE-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]
; SSE-NEXT: ret <8 x float> [[TMP3]]		; SSE-NEXT: ret <8 x float> [[TMP3]]
;		;
; SLM-LABEL: @test_v8f32(		; SLM-LABEL: @test_v8f32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <2 x i32> <i32 0, i32 2>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = fsub <4 x float> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = fsub <2 x float> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[B:%.*]], <8 x float> poison, <2 x i32> <i32 0, i32 2>
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>		; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[B]], <8 x float> poison, <2 x i32> <i32 1, i32 3>
; SLM-NEXT: [[TMP6:%.*]] = fsub <4 x float> [[TMP4]], [[TMP5]]		; SLM-NEXT: [[TMP6:%.*]] = fsub <2 x float> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[R071:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 6>
		; SLM-NEXT: [[TMP8:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 5, i32 7>
		; SLM-NEXT: [[TMP9:%.*]] = fsub <2 x float> [[TMP7]], [[TMP8]]
		; SLM-NEXT: [[TMP10:%.*]] = shufflevector <8 x float> [[B]], <8 x float> poison, <2 x i32> <i32 4, i32 6>
		; SLM-NEXT: [[TMP11:%.*]] = shufflevector <8 x float> [[B]], <8 x float> poison, <2 x i32> <i32 5, i32 7>
		; SLM-NEXT: [[TMP12:%.*]] = fsub <2 x float> [[TMP10]], [[TMP11]]
		; SLM-NEXT: [[R033:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
		; SLM-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SLM-NEXT: [[R052:%.*]] = shufflevector <8 x float> [[R033]], <8 x float> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>
		; SLM-NEXT: [[TMP14:%.*]] = shufflevector <2 x float> [[TMP12]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SLM-NEXT: [[R071:%.*]] = shufflevector <8 x float> [[R052]], <8 x float> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; SLM-NEXT: ret <8 x float> [[R071]]		; SLM-NEXT: ret <8 x float> [[R071]]
;		;
; AVX-LABEL: @test_v8f32(		; AVX-LABEL: @test_v8f32(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; AVX-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]		; AVX-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <8 x float> [[TMP3]]		; AVX-NEXT: ret <8 x float> [[TMP3]]
;		;
; ... (97 lines not shown) ...
%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5		%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5
%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6		%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6
%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7		%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7
ret <8 x i32> %r07		ret <8 x i32> %r07
}		}

define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {		define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {
; SSE-LABEL: @test_v16i16(		; SSE-LABEL: @test_v16i16(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22>		; SSE-NEXT: [[B0:%.*]] = extractelement <16 x i16> [[B:%.*]], i64 0
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>		; SSE-NEXT: [[B1:%.*]] = extractelement <16 x i16> [[B]], i64 1
; SSE-NEXT: [[TMP3:%.*]] = sub <8 x i16> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[B2:%.*]] = extractelement <16 x i16> [[B]], i64 2
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>		; SSE-NEXT: [[B3:%.*]] = extractelement <16 x i16> [[B]], i64 3
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>		; SSE-NEXT: [[B4:%.*]] = extractelement <16 x i16> [[B]], i64 4
; SSE-NEXT: [[TMP6:%.*]] = sub <8 x i16> [[TMP4]], [[TMP5]]		; SSE-NEXT: [[B5:%.*]] = extractelement <16 x i16> [[B]], i64 5
; SSE-NEXT: [[RV151:%.*]] = shufflevector <8 x i16> [[TMP3]], <8 x i16> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; SSE-NEXT: [[B6:%.*]] = extractelement <16 x i16> [[B]], i64 6
; SSE-NEXT: ret <16 x i16> [[RV151]]		; SSE-NEXT: [[B7:%.*]] = extractelement <16 x i16> [[B]], i64 7
		; SSE-NEXT: [[B8:%.*]] = extractelement <16 x i16> [[B]], i64 8
		; SSE-NEXT: [[B9:%.*]] = extractelement <16 x i16> [[B]], i64 9
		; SSE-NEXT: [[B10:%.*]] = extractelement <16 x i16> [[B]], i64 10
		; SSE-NEXT: [[B11:%.*]] = extractelement <16 x i16> [[B]], i64 11
		; SSE-NEXT: [[B12:%.*]] = extractelement <16 x i16> [[B]], i64 12
		; SSE-NEXT: [[B13:%.*]] = extractelement <16 x i16> [[B]], i64 13
		; SSE-NEXT: [[B14:%.*]] = extractelement <16 x i16> [[B]], i64 14
		; SSE-NEXT: [[B15:%.*]] = extractelement <16 x i16> [[B]], i64 15
		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
		; SSE-NEXT: [[TMP3:%.*]] = sub <4 x i16> [[TMP1]], [[TMP2]]
		; SSE-NEXT: [[R4:%.*]] = sub i16 [[B0]], [[B1]]
		; SSE-NEXT: [[R5:%.*]] = sub i16 [[B2]], [[B3]]
		; SSE-NEXT: [[R6:%.*]] = sub i16 [[B4]], [[B5]]
		; SSE-NEXT: [[R7:%.*]] = sub i16 [[B6]], [[B7]]
		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> poison, <4 x i32> <i32 8, i32 10, i32 12, i32 14>
		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> poison, <4 x i32> <i32 9, i32 11, i32 13, i32 15>
		; SSE-NEXT: [[TMP6:%.*]] = sub <4 x i16> [[TMP4]], [[TMP5]]
		; SSE-NEXT: [[R12:%.*]] = sub i16 [[B8]], [[B9]]
		; SSE-NEXT: [[R13:%.*]] = sub i16 [[B10]], [[B11]]
		; SSE-NEXT: [[R14:%.*]] = sub i16 [[B12]], [[B13]]
		; SSE-NEXT: [[R15:%.*]] = sub i16 [[B14]], [[B15]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <4 x i16> [[TMP3]], <4 x i16> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SSE-NEXT: [[RV4:%.*]] = insertelement <16 x i16> [[TMP7]], i16 [[R4]], i64 4
		; SSE-NEXT: [[RV5:%.*]] = insertelement <16 x i16> [[RV4]], i16 [[R5]], i64 5
		; SSE-NEXT: [[RV6:%.*]] = insertelement <16 x i16> [[RV5]], i16 [[R6]], i64 6
		; SSE-NEXT: [[RV7:%.*]] = insertelement <16 x i16> [[RV6]], i16 [[R7]], i64 7
		; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i16> [[TMP6]], <4 x i16> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SSE-NEXT: [[RV111:%.*]] = shufflevector <16 x i16> [[RV7]], <16 x i16> [[TMP8]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 poison, i32 poison, i32 poison, i32 poison>
		; SSE-NEXT: [[RV12:%.*]] = insertelement <16 x i16> [[RV111]], i16 [[R12]], i64 12
		; SSE-NEXT: [[RV13:%.*]] = insertelement <16 x i16> [[RV12]], i16 [[R13]], i64 13
		; SSE-NEXT: [[RV14:%.*]] = insertelement <16 x i16> [[RV13]], i16 [[R14]], i64 14
		; SSE-NEXT: [[RV15:%.*]] = insertelement <16 x i16> [[RV14]], i16 [[R15]], i64 15
		; SSE-NEXT: ret <16 x i16> [[RV15]]
;		;
; SLM-LABEL: @test_v16i16(		; SLM-LABEL: @test_v16i16(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
; SLM-NEXT: [[TMP3:%.*]] = sub <16 x i16> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = sub <16 x i16> [[TMP1]], [[TMP2]]
; SLM-NEXT: ret <16 x i16> [[TMP3]]		; SLM-NEXT: ret <16 x i16> [[TMP3]]
;		;
; AVX-LABEL: @test_v16i16(		; AVX-LABEL: @test_v16i16(

llvm/test/Transforms/SLPVectorizer/X86/hsub.ll

}		}

;		;
; 256-bit vectors		; 256-bit vectors
;		;

define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {
; SSE-LABEL: @test_v4f64(		; SSE-LABEL: @test_v4f64(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SSE-NEXT: [[A0:%.*]] = extractelement <4 x double> [[A:%.*]], i64 0
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; SSE-NEXT: [[A1:%.*]] = extractelement <4 x double> [[A]], i64 1
; SSE-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[A2:%.*]] = extractelement <4 x double> [[A]], i64 2
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>		; SSE-NEXT: [[A3:%.*]] = extractelement <4 x double> [[A]], i64 3
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>		; SSE-NEXT: [[B0:%.*]] = extractelement <4 x double> [[B:%.*]], i64 0
; SSE-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP4]], [[TMP5]]		; SSE-NEXT: [[B1:%.*]] = extractelement <4 x double> [[B]], i64 1
; SSE-NEXT: [[R031:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[B2:%.*]] = extractelement <4 x double> [[B]], i64 2
; SSE-NEXT: ret <4 x double> [[R031]]		; SSE-NEXT: [[B3:%.*]] = extractelement <4 x double> [[B]], i64 3
		; SSE-NEXT: [[R0:%.*]] = fsub double [[A0]], [[A1]]
		; SSE-NEXT: [[R1:%.*]] = fsub double [[B0]], [[B1]]
		; SSE-NEXT: [[R2:%.*]] = fsub double [[A2]], [[A3]]
		; SSE-NEXT: [[R3:%.*]] = fsub double [[B2]], [[B3]]
		; SSE-NEXT: [[R00:%.*]] = insertelement <4 x double> undef, double [[R0]], i64 0
		; SSE-NEXT: [[R01:%.*]] = insertelement <4 x double> [[R00]], double [[R1]], i64 1
		; SSE-NEXT: [[R02:%.*]] = insertelement <4 x double> [[R01]], double [[R2]], i64 2
		; SSE-NEXT: [[R03:%.*]] = insertelement <4 x double> [[R02]], double [[R3]], i64 3
		; SSE-NEXT: ret <4 x double> [[R03]]
;		;
; SLM-LABEL: @test_v4f64(		; SLM-LABEL: @test_v4f64(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SLM-NEXT: [[A0:%.*]] = extractelement <4 x double> [[A:%.*]], i64 0
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; SLM-NEXT: [[A1:%.*]] = extractelement <4 x double> [[A]], i64 1
; SLM-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[A2:%.*]] = extractelement <4 x double> [[A]], i64 2
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>		; SLM-NEXT: [[A3:%.*]] = extractelement <4 x double> [[A]], i64 3
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>		; SLM-NEXT: [[B0:%.*]] = extractelement <4 x double> [[B:%.*]], i64 0
; SLM-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP4]], [[TMP5]]		; SLM-NEXT: [[B1:%.*]] = extractelement <4 x double> [[B]], i64 1
; SLM-NEXT: [[R031:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[B2:%.*]] = extractelement <4 x double> [[B]], i64 2
; SLM-NEXT: ret <4 x double> [[R031]]		; SLM-NEXT: [[B3:%.*]] = extractelement <4 x double> [[B]], i64 3
		; SLM-NEXT: [[R0:%.*]] = fsub double [[A0]], [[A1]]
		; SLM-NEXT: [[R1:%.*]] = fsub double [[B0]], [[B1]]
		; SLM-NEXT: [[R2:%.*]] = fsub double [[A2]], [[A3]]
		; SLM-NEXT: [[R3:%.*]] = fsub double [[B2]], [[B3]]
		; SLM-NEXT: [[R00:%.*]] = insertelement <4 x double> undef, double [[R0]], i64 0
		; SLM-NEXT: [[R01:%.*]] = insertelement <4 x double> [[R00]], double [[R1]], i64 1
		; SLM-NEXT: [[R02:%.*]] = insertelement <4 x double> [[R01]], double [[R2]], i64 2
		; SLM-NEXT: [[R03:%.*]] = insertelement <4 x double> [[R02]], double [[R3]], i64 3
		; SLM-NEXT: ret <4 x double> [[R03]]
;		;
; AVX-LABEL: @test_v4f64(		; AVX-LABEL: @test_v4f64(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
; AVX-NEXT: [[TMP3:%.*]] = fsub <4 x double> [[TMP1]], [[TMP2]]		; AVX-NEXT: [[TMP3:%.*]] = fsub <4 x double> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <4 x double> [[TMP3]]		; AVX-NEXT: ret <4 x double> [[TMP3]]
;		;
%a0 = extractelement <4 x double> %a, i32 0		%a0 = extractelement <4 x double> %a, i32 0
define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {		define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {
; SSE-LABEL: @test_v8f32(		; SSE-LABEL: @test_v8f32(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; SSE-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]
; SSE-NEXT: ret <8 x float> [[TMP3]]		; SSE-NEXT: ret <8 x float> [[TMP3]]
;		;
; SLM-LABEL: @test_v8f32(		; SLM-LABEL: @test_v8f32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <2 x i32> <i32 0, i32 2>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 1, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = fsub <4 x float> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = fsub <2 x float> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[B:%.*]], <8 x float> poison, <2 x i32> <i32 0, i32 2>
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>		; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[B]], <8 x float> poison, <2 x i32> <i32 1, i32 3>
; SLM-NEXT: [[TMP6:%.*]] = fsub <4 x float> [[TMP4]], [[TMP5]]		; SLM-NEXT: [[TMP6:%.*]] = fsub <2 x float> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[R071:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 4, i32 6>
		; SLM-NEXT: [[TMP8:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <2 x i32> <i32 5, i32 7>
		; SLM-NEXT: [[TMP9:%.*]] = fsub <2 x float> [[TMP7]], [[TMP8]]
		; SLM-NEXT: [[TMP10:%.*]] = shufflevector <8 x float> [[B]], <8 x float> poison, <2 x i32> <i32 4, i32 6>
		; SLM-NEXT: [[TMP11:%.*]] = shufflevector <8 x float> [[B]], <8 x float> poison, <2 x i32> <i32 5, i32 7>
		; SLM-NEXT: [[TMP12:%.*]] = fsub <2 x float> [[TMP10]], [[TMP11]]
		; SLM-NEXT: [[R033:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
		; SLM-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP9]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SLM-NEXT: [[R052:%.*]] = shufflevector <8 x float> [[R033]], <8 x float> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>
		; SLM-NEXT: [[TMP14:%.*]] = shufflevector <2 x float> [[TMP12]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SLM-NEXT: [[R071:%.*]] = shufflevector <8 x float> [[R052]], <8 x float> [[TMP14]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>
; SLM-NEXT: ret <8 x float> [[R071]]		; SLM-NEXT: ret <8 x float> [[R071]]
;		;
; AVX-LABEL: @test_v8f32(		; AVX-LABEL: @test_v8f32(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; AVX-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]		; AVX-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <8 x float> [[TMP3]]		; AVX-NEXT: ret <8 x float> [[TMP3]]
;		;
%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5		%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5
%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6		%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6
%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7		%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7
ret <8 x i32> %r07		ret <8 x i32> %r07
}		}

define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {		define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {
; SSE-LABEL: @test_v16i16(		; SSE-LABEL: @test_v16i16(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22>		; SSE-NEXT: [[B0:%.*]] = extractelement <16 x i16> [[B:%.*]], i64 0
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>		; SSE-NEXT: [[B1:%.*]] = extractelement <16 x i16> [[B]], i64 1
; SSE-NEXT: [[TMP3:%.*]] = sub <8 x i16> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[B2:%.*]] = extractelement <16 x i16> [[B]], i64 2
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>		; SSE-NEXT: [[B3:%.*]] = extractelement <16 x i16> [[B]], i64 3
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>		; SSE-NEXT: [[B4:%.*]] = extractelement <16 x i16> [[B]], i64 4
; SSE-NEXT: [[TMP6:%.*]] = sub <8 x i16> [[TMP4]], [[TMP5]]		; SSE-NEXT: [[B5:%.*]] = extractelement <16 x i16> [[B]], i64 5
; SSE-NEXT: [[RV151:%.*]] = shufflevector <8 x i16> [[TMP3]], <8 x i16> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; SSE-NEXT: [[B6:%.*]] = extractelement <16 x i16> [[B]], i64 6
; SSE-NEXT: ret <16 x i16> [[RV151]]		; SSE-NEXT: [[B7:%.*]] = extractelement <16 x i16> [[B]], i64 7
		; SSE-NEXT: [[B8:%.*]] = extractelement <16 x i16> [[B]], i64 8
		; SSE-NEXT: [[B9:%.*]] = extractelement <16 x i16> [[B]], i64 9
		; SSE-NEXT: [[B10:%.*]] = extractelement <16 x i16> [[B]], i64 10
		; SSE-NEXT: [[B11:%.*]] = extractelement <16 x i16> [[B]], i64 11
		; SSE-NEXT: [[B12:%.*]] = extractelement <16 x i16> [[B]], i64 12
		; SSE-NEXT: [[B13:%.*]] = extractelement <16 x i16> [[B]], i64 13
		; SSE-NEXT: [[B14:%.*]] = extractelement <16 x i16> [[B]], i64 14
		; SSE-NEXT: [[B15:%.*]] = extractelement <16 x i16> [[B]], i64 15
		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
		; SSE-NEXT: [[TMP3:%.*]] = sub <4 x i16> [[TMP1]], [[TMP2]]
		; SSE-NEXT: [[R4:%.*]] = sub i16 [[B0]], [[B1]]
		; SSE-NEXT: [[R5:%.*]] = sub i16 [[B2]], [[B3]]
		; SSE-NEXT: [[R6:%.*]] = sub i16 [[B4]], [[B5]]
		; SSE-NEXT: [[R7:%.*]] = sub i16 [[B6]], [[B7]]
		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> poison, <4 x i32> <i32 8, i32 10, i32 12, i32 14>
		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> poison, <4 x i32> <i32 9, i32 11, i32 13, i32 15>
		; SSE-NEXT: [[TMP6:%.*]] = sub <4 x i16> [[TMP4]], [[TMP5]]
		; SSE-NEXT: [[R12:%.*]] = sub i16 [[B8]], [[B9]]
		; SSE-NEXT: [[R13:%.*]] = sub i16 [[B10]], [[B11]]
		; SSE-NEXT: [[R14:%.*]] = sub i16 [[B12]], [[B13]]
		; SSE-NEXT: [[R15:%.*]] = sub i16 [[B14]], [[B15]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <4 x i16> [[TMP3]], <4 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SSE-NEXT: [[RV4:%.*]] = insertelement <16 x i16> [[TMP7]], i16 [[R4]], i64 4
		; SSE-NEXT: [[RV5:%.*]] = insertelement <16 x i16> [[RV4]], i16 [[R5]], i64 5
		; SSE-NEXT: [[RV6:%.*]] = insertelement <16 x i16> [[RV5]], i16 [[R6]], i64 6
		; SSE-NEXT: [[RV7:%.*]] = insertelement <16 x i16> [[RV6]], i16 [[R7]], i64 7
		; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i16> [[TMP6]], <4 x i16> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
		; SSE-NEXT: [[RV111:%.*]] = shufflevector <16 x i16> [[RV7]], <16 x i16> [[TMP8]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 poison, i32 poison, i32 poison, i32 poison>
		; SSE-NEXT: [[RV12:%.*]] = insertelement <16 x i16> [[RV111]], i16 [[R12]], i64 12
		; SSE-NEXT: [[RV13:%.*]] = insertelement <16 x i16> [[RV12]], i16 [[R13]], i64 13
		; SSE-NEXT: [[RV14:%.*]] = insertelement <16 x i16> [[RV13]], i16 [[R14]], i64 14
		; SSE-NEXT: [[RV15:%.*]] = insertelement <16 x i16> [[RV14]], i16 [[R15]], i64 15
		; SSE-NEXT: ret <16 x i16> [[RV15]]
;		;
; SLM-LABEL: @test_v16i16(		; SLM-LABEL: @test_v16i16(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
; SLM-NEXT: [[TMP3:%.*]] = sub <16 x i16> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = sub <16 x i16> [[TMP1]], [[TMP2]]
; SLM-NEXT: ret <16 x i16> [[TMP3]]		; SLM-NEXT: ret <16 x i16> [[TMP3]]
;		;
; AVX-LABEL: @test_v16i16(		; AVX-LABEL: @test_v16i16(

llvm/test/Transforms/SLPVectorizer/X86/reduction-transpose.ll

	; SSE2-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])			; SSE2-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])
	; SSE2-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP0]])			; SSE2-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP0]])
	; SSE2-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP2]], [[TMP3]]			; SSE2-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP2]], [[TMP3]]
	; SSE2-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]			; SSE2-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]
	; SSE2-NEXT: ret i32 [[OP_RDX1]]			; SSE2-NEXT: ret i32 [[OP_RDX1]]
	;			;
	; SSE42-LABEL: @reduce_and4(			; SSE42-LABEL: @reduce_and4(
	; SSE42-NEXT: entry:			; SSE42-NEXT: entry:
	; SSE42-NEXT: [[TMP0:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V4:%.*]])			; SSE42-NEXT: [[TMP0:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 5, i32 4, i32 6, i32 7>
	; SSE42-NEXT: [[TMP1:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V3:%.*]])			; SSE42-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V4:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 5, i32 4, i32 6, i32 7>
	; SSE42-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP0]], [[TMP1]]			; SSE42-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])
	; SSE42-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V2:%.*]])			; SSE42-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP0]])
	; SSE42-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP2]]			; SSE42-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP2]], [[TMP3]]
	; SSE42-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V1:%.*]])			; SSE42-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]
	; SSE42-NEXT: [[OP_RDX2:%.*]] = and i32 [[OP_RDX1]], [[TMP3]]			; SSE42-NEXT: ret i32 [[OP_RDX1]]
	; SSE42-NEXT: [[OP_RDX3:%.*]] = and i32 [[OP_RDX2]], [[ACC:%.*]]
	; SSE42-NEXT: ret i32 [[OP_RDX3]]
	;			;
	; AVX-LABEL: @reduce_and4(			; AVX-LABEL: @reduce_and4(
	; AVX-NEXT: entry:			; AVX-NEXT: entry:
	; AVX-NEXT: [[TMP0:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 5, i32 4, i32 6, i32 7>			; AVX-NEXT: [[TMP0:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 5, i32 4, i32 6, i32 7>
	; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V4:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 5, i32 4, i32 6, i32 7>			; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V4:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 5, i32 4, i32 6, i32 7>
	; AVX-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])			; AVX-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])
	; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP0]])			; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP0]])
	; AVX-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP2]], [[TMP3]]			; AVX-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP2]], [[TMP3]]
	; SSE2-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V4:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>			; SSE2-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V4:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>
	; SSE2-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP2]])			; SSE2-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP2]])
	; SSE2-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])			; SSE2-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])
	; SSE2-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP3]], [[TMP4]]			; SSE2-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP3]], [[TMP4]]
	; SSE2-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]			; SSE2-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]
	; SSE2-NEXT: ret i32 [[OP_RDX1]]			; SSE2-NEXT: ret i32 [[OP_RDX1]]
	;			;
	; SSE42-LABEL: @reduce_and4_transpose(			; SSE42-LABEL: @reduce_and4_transpose(
	; SSE42-NEXT: [[TMP1:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V4:%.*]])			; SSE42-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>
	; SSE42-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V3:%.*]])			; SSE42-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V4:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>
	; SSE42-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP1]], [[TMP2]]			; SSE42-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP2]])
	; SSE42-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V2:%.*]])			; SSE42-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])
	; SSE42-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP3]]			; SSE42-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP3]], [[TMP4]]
	; SSE42-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[V1:%.*]])			; SSE42-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]
	; SSE42-NEXT: [[OP_RDX2:%.*]] = and i32 [[OP_RDX1]], [[TMP4]]			; SSE42-NEXT: ret i32 [[OP_RDX1]]
	; SSE42-NEXT: [[OP_RDX3:%.*]] = and i32 [[OP_RDX2]], [[ACC:%.*]]
	; SSE42-NEXT: ret i32 [[OP_RDX3]]
	;			;
	; AVX-LABEL: @reduce_and4_transpose(			; AVX-LABEL: @reduce_and4_transpose(
	; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>			; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>
	; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V4:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>			; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V4:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>
	; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP2]])			; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP2]])
	; AVX-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])			; AVX-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP1]])
	; AVX-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP3]], [[TMP4]]			; AVX-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP3]], [[TMP4]]
	; AVX-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]			; AVX-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[ACC:%.*]]

llvm/unittests/IR/InstructionsTest.cpp

TEST(InstructionsTest, ShuffleMaskQueries) {
Constant *C2 = ConstantInt::get(Int32Ty, 2);		Constant *C2 = ConstantInt::get(Int32Ty, 2);
Constant *C3 = ConstantInt::get(Int32Ty, 3);		Constant *C3 = ConstantInt::get(Int32Ty, 3);
Constant *C4 = ConstantInt::get(Int32Ty, 4);		Constant *C4 = ConstantInt::get(Int32Ty, 4);
Constant *C5 = ConstantInt::get(Int32Ty, 5);		Constant *C5 = ConstantInt::get(Int32Ty, 5);
Constant *C6 = ConstantInt::get(Int32Ty, 6);		Constant *C6 = ConstantInt::get(Int32Ty, 6);
Constant *C7 = ConstantInt::get(Int32Ty, 7);		Constant *C7 = ConstantInt::get(Int32Ty, 7);

Constant *Identity = ConstantVector::get({C0, CU, C2, C3, C4});		Constant *Identity = ConstantVector::get({C0, CU, C2, C3, C4});
EXPECT_TRUE(ShuffleVectorInst::isIdentityMask(Identity));		EXPECT_TRUE(ShuffleVectorInst::isIdentityMask(
EXPECT_FALSE(ShuffleVectorInst::isSelectMask(Identity)); // identity is distinguished from select		Identity, cast<FixedVectorType>(Identity->getType())->getNumElements()));
EXPECT_FALSE(ShuffleVectorInst::isReverseMask(Identity));		EXPECT_FALSE(ShuffleVectorInst::isSelectMask(
EXPECT_TRUE(ShuffleVectorInst::isSingleSourceMask(Identity)); // identity is always single source		Identity,
EXPECT_FALSE(ShuffleVectorInst::isZeroEltSplatMask(Identity));		cast<FixedVectorType>(Identity->getType())
EXPECT_FALSE(ShuffleVectorInst::isTransposeMask(Identity));		->getNumElements())); // identity is distinguished from select
		EXPECT_FALSE(ShuffleVectorInst::isReverseMask(
		Identity, cast<FixedVectorType>(Identity->getType())->getNumElements()));
		EXPECT_TRUE(ShuffleVectorInst::isSingleSourceMask(
		Identity, cast<FixedVectorType>(Identity->getType())
		->getNumElements())); // identity is always single source
		EXPECT_FALSE(ShuffleVectorInst::isZeroEltSplatMask(
		Identity, cast<FixedVectorType>(Identity->getType())->getNumElements()));
		EXPECT_FALSE(ShuffleVectorInst::isTransposeMask(
		Identity, cast<FixedVectorType>(Identity->getType())->getNumElements()));

Constant *Select = ConstantVector::get({CU, C1, C5});		Constant *Select = ConstantVector::get({CU, C1, C5});
EXPECT_FALSE(ShuffleVectorInst::isIdentityMask(Select));		EXPECT_FALSE(ShuffleVectorInst::isIdentityMask(
EXPECT_TRUE(ShuffleVectorInst::isSelectMask(Select));		Select, cast<FixedVectorType>(Select->getType())->getNumElements()));
EXPECT_FALSE(ShuffleVectorInst::isReverseMask(Select));		EXPECT_TRUE(ShuffleVectorInst::isSelectMask(
EXPECT_FALSE(ShuffleVectorInst::isSingleSourceMask(Select));		Select, cast<FixedVectorType>(Select->getType())->getNumElements()));
EXPECT_FALSE(ShuffleVectorInst::isZeroEltSplatMask(Select));		EXPECT_FALSE(ShuffleVectorInst::isReverseMask(
EXPECT_FALSE(ShuffleVectorInst::isTransposeMask(Select));		Select, cast<FixedVectorType>(Select->getType())->getNumElements()));
		EXPECT_FALSE(ShuffleVectorInst::isSingleSourceMask(
		Select, cast<FixedVectorType>(Select->getType())->getNumElements()));
		EXPECT_FALSE(ShuffleVectorInst::isZeroEltSplatMask(
		Select, cast<FixedVectorType>(Select->getType())->getNumElements()));
		EXPECT_FALSE(ShuffleVectorInst::isTransposeMask(
		Select, cast<FixedVectorType>(Select->getType())->getNumElements()));

   Constant *Reverse = ConstantVector::get({C3, C2, C1, CU});
-  EXPECT_FALSE(ShuffleVectorInst::isIdentityMask(Reverse));
-  EXPECT_FALSE(ShuffleVectorInst::isSelectMask(Reverse));
-  EXPECT_TRUE(ShuffleVectorInst::isReverseMask(Reverse));
-  EXPECT_TRUE(ShuffleVectorInst::isSingleSourceMask(Reverse)); // reverse is always single source
-  EXPECT_FALSE(ShuffleVectorInst::isZeroEltSplatMask(Reverse));
-  EXPECT_FALSE(ShuffleVectorInst::isTransposeMask(Reverse));
+  EXPECT_FALSE(ShuffleVectorInst::isIdentityMask(
+      Reverse, cast<FixedVectorType>(Reverse->getType())->getNumElements()));
+  EXPECT_FALSE(ShuffleVectorInst::isSelectMask(
+      Reverse, cast<FixedVectorType>(Reverse->getType())->getNumElements()));
+  EXPECT_TRUE(ShuffleVectorInst::isReverseMask(
+      Reverse, cast<FixedVectorType>(Reverse->getType())->getNumElements()));
+  EXPECT_TRUE(ShuffleVectorInst::isSingleSourceMask(
+      Reverse, cast<FixedVectorType>(Reverse->getType())
+                   ->getNumElements())); // reverse is always single source
+  EXPECT_FALSE(ShuffleVectorInst::isZeroEltSplatMask(
+      Reverse, cast<FixedVectorType>(Reverse->getType())->getNumElements()));
+  EXPECT_FALSE(ShuffleVectorInst::isTransposeMask(
+      Reverse, cast<FixedVectorType>(Reverse->getType())->getNumElements()));

   Constant *SingleSource = ConstantVector::get({C2, C2, C0, CU});
-  EXPECT_FALSE(ShuffleVectorInst::isIdentityMask(SingleSource));
-  EXPECT_FALSE(ShuffleVectorInst::isSelectMask(SingleSource));
-  EXPECT_FALSE(ShuffleVectorInst::isReverseMask(SingleSource));
-  EXPECT_TRUE(ShuffleVectorInst::isSingleSourceMask(SingleSource));
-  EXPECT_FALSE(ShuffleVectorInst::isZeroEltSplatMask(SingleSource));
-  EXPECT_FALSE(ShuffleVectorInst::isTransposeMask(SingleSource));
+  EXPECT_FALSE(ShuffleVectorInst::isIdentityMask(
+      SingleSource,
+      cast<FixedVectorType>(SingleSource->getType())->getNumElements()));
+  EXPECT_FALSE(ShuffleVectorInst::isSelectMask(
+      SingleSource,
+      cast<FixedVectorType>(SingleSource->getType())->getNumElements()));
+  EXPECT_FALSE(ShuffleVectorInst::isReverseMask(
+      SingleSource,
+      cast<FixedVectorType>(SingleSource->getType())->getNumElements()));
+  EXPECT_TRUE(ShuffleVectorInst::isSingleSourceMask(
+      SingleSource,
+      cast<FixedVectorType>(SingleSource->getType())->getNumElements()));
+  EXPECT_FALSE(ShuffleVectorInst::isZeroEltSplatMask(
+      SingleSource,
+      cast<FixedVectorType>(SingleSource->getType())->getNumElements()));
+  EXPECT_FALSE(ShuffleVectorInst::isTransposeMask(
+      SingleSource,
+      cast<FixedVectorType>(SingleSource->getType())->getNumElements()));

   Constant *ZeroEltSplat = ConstantVector::get({C0, C0, CU, C0});
-  EXPECT_FALSE(ShuffleVectorInst::isIdentityMask(ZeroEltSplat));
-  EXPECT_FALSE(ShuffleVectorInst::isSelectMask(ZeroEltSplat));
-  EXPECT_FALSE(ShuffleVectorInst::isReverseMask(ZeroEltSplat));
-  EXPECT_TRUE(ShuffleVectorInst::isSingleSourceMask(ZeroEltSplat)); // 0-splat is always single source
-  EXPECT_TRUE(ShuffleVectorInst::isZeroEltSplatMask(ZeroEltSplat));
-  EXPECT_FALSE(ShuffleVectorInst::isTransposeMask(ZeroEltSplat));
+  EXPECT_FALSE(ShuffleVectorInst::isIdentityMask(
+      ZeroEltSplat,
+      cast<FixedVectorType>(ZeroEltSplat->getType())->getNumElements()));
+  EXPECT_FALSE(ShuffleVectorInst::isSelectMask(
+      ZeroEltSplat,
+      cast<FixedVectorType>(ZeroEltSplat->getType())->getNumElements()));
+  EXPECT_FALSE(ShuffleVectorInst::isReverseMask(
+      ZeroEltSplat,
+      cast<FixedVectorType>(ZeroEltSplat->getType())->getNumElements()));
+  EXPECT_TRUE(ShuffleVectorInst::isSingleSourceMask(
+      ZeroEltSplat, cast<FixedVectorType>(ZeroEltSplat->getType())
+                        ->getNumElements())); // 0-splat is always single source
+  EXPECT_TRUE(ShuffleVectorInst::isZeroEltSplatMask(
+      ZeroEltSplat,
+      cast<FixedVectorType>(ZeroEltSplat->getType())->getNumElements()));
+  EXPECT_FALSE(ShuffleVectorInst::isTransposeMask(
+      ZeroEltSplat,
+      cast<FixedVectorType>(ZeroEltSplat->getType())->getNumElements()));

   Constant *Transpose = ConstantVector::get({C0, C4, C2, C6});
-  EXPECT_FALSE(ShuffleVectorInst::isIdentityMask(Transpose));
-  EXPECT_FALSE(ShuffleVectorInst::isSelectMask(Transpose));
-  EXPECT_FALSE(ShuffleVectorInst::isReverseMask(Transpose));
-  EXPECT_FALSE(ShuffleVectorInst::isSingleSourceMask(Transpose));
-  EXPECT_FALSE(ShuffleVectorInst::isZeroEltSplatMask(Transpose));
-  EXPECT_TRUE(ShuffleVectorInst::isTransposeMask(Transpose));
+  EXPECT_FALSE(ShuffleVectorInst::isIdentityMask(
+      Transpose,
+      cast<FixedVectorType>(Transpose->getType())->getNumElements()));
+  EXPECT_FALSE(ShuffleVectorInst::isSelectMask(
+      Transpose,
+      cast<FixedVectorType>(Transpose->getType())->getNumElements()));
+  EXPECT_FALSE(ShuffleVectorInst::isReverseMask(
+      Transpose,
+      cast<FixedVectorType>(Transpose->getType())->getNumElements()));
+  EXPECT_FALSE(ShuffleVectorInst::isSingleSourceMask(
+      Transpose,
+      cast<FixedVectorType>(Transpose->getType())->getNumElements()));
+  EXPECT_FALSE(ShuffleVectorInst::isZeroEltSplatMask(
+      Transpose,
+      cast<FixedVectorType>(Transpose->getType())->getNumElements()));
+  EXPECT_TRUE(ShuffleVectorInst::isTransposeMask(
+      Transpose,
+      cast<FixedVectorType>(Transpose->getType())->getNumElements()));

   // More tests to make sure the logic is/stays correct...
-  EXPECT_TRUE(ShuffleVectorInst::isIdentityMask(ConstantVector::get({CU, C1, CU, C3})));
-  EXPECT_TRUE(ShuffleVectorInst::isIdentityMask(ConstantVector::get({C4, CU, C6, CU})));
+  EXPECT_TRUE(ShuffleVectorInst::isIdentityMask(
+      ConstantVector::get({CU, C1, CU, C3}), 4));
+  EXPECT_TRUE(ShuffleVectorInst::isIdentityMask(
+      ConstantVector::get({C4, CU, C6, CU}), 4));

-  EXPECT_TRUE(ShuffleVectorInst::isSelectMask(ConstantVector::get({C4, C1, C6, CU})));
-  EXPECT_TRUE(ShuffleVectorInst::isSelectMask(ConstantVector::get({CU, C1, C6, C3})));
+  EXPECT_TRUE(ShuffleVectorInst::isSelectMask(
+      ConstantVector::get({C4, C1, C6, CU}), 4));
+  EXPECT_TRUE(ShuffleVectorInst::isSelectMask(
+      ConstantVector::get({CU, C1, C6, C3}), 4));

-  EXPECT_TRUE(ShuffleVectorInst::isReverseMask(ConstantVector::get({C7, C6, CU, C4})));
-  EXPECT_TRUE(ShuffleVectorInst::isReverseMask(ConstantVector::get({C3, CU, C1, CU})));
+  EXPECT_TRUE(ShuffleVectorInst::isReverseMask(
+      ConstantVector::get({C7, C6, CU, C4}), 4));
+  EXPECT_TRUE(ShuffleVectorInst::isReverseMask(
+      ConstantVector::get({C3, CU, C1, CU}), 4));

-  EXPECT_TRUE(ShuffleVectorInst::isSingleSourceMask(ConstantVector::get({C7, C5, CU, C7})));
-  EXPECT_TRUE(ShuffleVectorInst::isSingleSourceMask(ConstantVector::get({C3, C0, CU, C3})));
+  EXPECT_TRUE(ShuffleVectorInst::isSingleSourceMask(
+      ConstantVector::get({C7, C5, CU, C7}), 4));
+  EXPECT_TRUE(ShuffleVectorInst::isSingleSourceMask(
+      ConstantVector::get({C3, C0, CU, C3}), 4));

-  EXPECT_TRUE(ShuffleVectorInst::isZeroEltSplatMask(ConstantVector::get({C4, CU, CU, C4})));
-  EXPECT_TRUE(ShuffleVectorInst::isZeroEltSplatMask(ConstantVector::get({CU, C0, CU, C0})));
+  EXPECT_TRUE(ShuffleVectorInst::isZeroEltSplatMask(
+      ConstantVector::get({C4, CU, CU, C4}), 4));
+  EXPECT_TRUE(ShuffleVectorInst::isZeroEltSplatMask(
+      ConstantVector::get({CU, C0, CU, C0}), 4));

-  EXPECT_TRUE(ShuffleVectorInst::isTransposeMask(ConstantVector::get({C1, C5, C3, C7})));
-  EXPECT_TRUE(ShuffleVectorInst::isTransposeMask(ConstantVector::get({C1, C3})));
+  EXPECT_TRUE(ShuffleVectorInst::isTransposeMask(
+      ConstantVector::get({C1, C5, C3, C7}), 4));
+  EXPECT_TRUE(
+      ShuffleVectorInst::isTransposeMask(ConstantVector::get({C1, C3}), 2));

   // Nothing special about the values here - just re-using inputs to reduce code.
   Constant *V0 = ConstantVector::get({C0, C1, C2, C3});
   Constant *V1 = ConstantVector::get({C3, C2, C1, C0});

   // Identity with undef elts.
   ShuffleVectorInst *Id1 = new ShuffleVectorInst(V0, V1,
                                                  ConstantVector::get({C0, C1, CU, CU}));
[629 lines not shown]

llvm/unittests/IR/ShuffleVectorInstTest.cpp

 //===- llvm/unittest/IR/ShuffleVectorInstTest.cpp - Shuffle unit tests ----===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//

 #include "llvm/IR/Instructions.h"
 #include "gtest/gtest.h"

 using namespace llvm;

 namespace {

Inline comment from RKSimon: All of these tests need support for length changing shuffles

 TEST(ShuffleVectorInst, isIdentityMask) {
-  ASSERT_TRUE(ShuffleVectorInst::isIdentityMask({0, 1, 2, 3}));
-  ASSERT_TRUE(ShuffleVectorInst::isIdentityMask({0, 1, 2, 3, -1}));
-  ASSERT_TRUE(ShuffleVectorInst::isIdentityMask({0, 1, -1, 3}));
+  ASSERT_TRUE(ShuffleVectorInst::isIdentityMask({0, 1, 2, 3}, 4));
+  ASSERT_TRUE(ShuffleVectorInst::isIdentityMask({0, 1, 2, 3, -1}, 5));
+  ASSERT_TRUE(ShuffleVectorInst::isIdentityMask({0, 1, -1, 3}, 4));

-  ASSERT_FALSE(ShuffleVectorInst::isIdentityMask({0, 1, 2, 4}));
-  ASSERT_FALSE(ShuffleVectorInst::isIdentityMask({0, -1, 2, 4}));
+  ASSERT_FALSE(ShuffleVectorInst::isIdentityMask({0, 1, 2, 4}, 4));
+  ASSERT_FALSE(ShuffleVectorInst::isIdentityMask({0, -1, 2, 4}, 4));
 }

 TEST(ShuffleVectorInst, isSelectMask) {
-  ASSERT_TRUE(ShuffleVectorInst::isSelectMask({0, 5, 6, 3}));
+  ASSERT_TRUE(ShuffleVectorInst::isSelectMask({0, 5, 6, 3}, 4));

-  ASSERT_FALSE(ShuffleVectorInst::isSelectMask({0, 1, 2, 3}));
+  ASSERT_FALSE(ShuffleVectorInst::isSelectMask({0, 1, 2, 3}, 4));
 }

 TEST(ShuffleVectorInst, isReverseMask) {
-  ASSERT_TRUE(ShuffleVectorInst::isReverseMask({3, 2, 1, 0}));
-  ASSERT_TRUE(ShuffleVectorInst::isReverseMask({-1, -1, 1, 0}));
+  ASSERT_TRUE(ShuffleVectorInst::isReverseMask({3, 2, 1, 0}, 4));
+  ASSERT_TRUE(ShuffleVectorInst::isReverseMask({-1, -1, 1, 0}, 4));

-  ASSERT_FALSE(ShuffleVectorInst::isReverseMask({4, 3, 2, 1}));
+  ASSERT_FALSE(ShuffleVectorInst::isReverseMask({4, 3, 2, 1}, 4));
 }

 TEST(ShuffleVectorInst, isZeroEltSplatMask) {
-  ASSERT_TRUE(ShuffleVectorInst::isZeroEltSplatMask({0, 0, 0, 0}));
-  ASSERT_TRUE(ShuffleVectorInst::isZeroEltSplatMask({0, -1, 0, -1}));
+  ASSERT_TRUE(ShuffleVectorInst::isZeroEltSplatMask({0, 0, 0, 0}, 4));
+  ASSERT_TRUE(ShuffleVectorInst::isZeroEltSplatMask({0, -1, 0, -1}, 4));

-  ASSERT_FALSE(ShuffleVectorInst::isZeroEltSplatMask({1, 1, 1, 1}));
+  ASSERT_FALSE(ShuffleVectorInst::isZeroEltSplatMask({1, 1, 1, 1}, 4));
 }

 TEST(ShuffleVectorInst, isTransposeMask) {
-  ASSERT_TRUE(ShuffleVectorInst::isTransposeMask({0, 4, 2, 6}));
-  ASSERT_TRUE(ShuffleVectorInst::isTransposeMask({1, 5, 3, 7}));
+  ASSERT_TRUE(ShuffleVectorInst::isTransposeMask({0, 4, 2, 6}, 4));
+  ASSERT_TRUE(ShuffleVectorInst::isTransposeMask({1, 5, 3, 7}, 4));

-  ASSERT_FALSE(ShuffleVectorInst::isTransposeMask({2, 6, 4, 8}));
+  ASSERT_FALSE(ShuffleVectorInst::isTransposeMask({2, 6, 4, 8}, 4));
 }

 TEST(ShuffleVectorInst, isSpliceMask) {
   int Index;

-  ASSERT_TRUE(ShuffleVectorInst::isSpliceMask({0, 1, 2, 3}, Index));
+  ASSERT_TRUE(ShuffleVectorInst::isSpliceMask({0, 1, 2, 3}, 4, Index));
   ASSERT_EQ(0, Index);

-  ASSERT_TRUE(ShuffleVectorInst::isSpliceMask({1, 2, 3, 4, 5, 6, 7}, Index));
+  ASSERT_TRUE(ShuffleVectorInst::isSpliceMask({1, 2, 3, 4, 5, 6, 7}, 7, Index));
   ASSERT_EQ(1, Index);

-  ASSERT_FALSE(ShuffleVectorInst::isSpliceMask({4, 5, 6, 7}, Index));
+  ASSERT_FALSE(ShuffleVectorInst::isSpliceMask({4, 5, 6, 7}, 4, Index));
 }

 TEST(ShuffleVectorInst, isExtractSubvectorMask) {
   int Index;

   ASSERT_TRUE(
       ShuffleVectorInst::isExtractSubvectorMask({0, 1, 2, 3}, 8, Index));
   ASSERT_EQ(0, Index);
[79 lines not shown]
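
The inline comment on this file asks for coverage of length-changing shuffles, and the revision summary notes that Mask.size() does not always match the size of the permuted vector(s). As an illustration only, here is a minimal standalone sketch of why an identity check might want the source element count as a separate argument. The helper name `isIdentityMaskSketch` is hypothetical, it uses `std::vector<int>` rather than LLVM types, and it is not the code from Instructions.cpp; in particular, rejecting an all-undef mask is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical standalone helper, NOT the LLVM implementation.
// Mask entries: -1 means undef; values in [0, NumSrcElts) pick a lane of the
// first source; values in [NumSrcElts, 2 * NumSrcElts) pick from the second.
bool isIdentityMaskSketch(const std::vector<int> &Mask, int NumSrcElts) {
  assert(NumSrcElts >= 0 && "source element count must be non-negative");
  // A mask whose length differs from the source length changes the vector
  // length, so this sketch never treats it as an identity.
  if (Mask.size() != static_cast<std::size_t>(NumSrcElts))
    return false;
  bool UsesFirst = false, UsesSecond = false;
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    const int M = Mask[I];
    if (M == -1)
      continue; // undef lanes are compatible with any pattern
    const int Lane = static_cast<int>(I);
    if (M == Lane)
      UsesFirst = true;  // lane I of the first source stays in place
    else if (M == NumSrcElts + Lane)
      UsesSecond = true; // lane I of the second source stays in place
    else
      return false;
  }
  // Assumption of this sketch: exactly one source must be referenced, so an
  // all-undef mask and a mask mixing both sources are both rejected.
  return UsesFirst != UsesSecond;
}
```

With the element count passed explicitly, the mask {0, 1, 2, 3} is an identity over a 4-element source but not over an 8-element source, where the same mask instead describes a subvector extract at index 0 (the case checked by the isExtractSubvectorMask test in this file).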

Revision Contents

Diff 552154

llvm/include/llvm/CodeGen/BasicTTIImpl.h

llvm/include/llvm/IR/Instructions.h

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/lib/IR/Instructions.cpp

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/lib/Target/ARM/ARMISelLowering.cpp

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/crash_extract_subvector_cost.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/phi-result-use-order.ll

llvm/test/Transforms/SLPVectorizer/RISCV/math-function.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-calls.ll

llvm/test/Transforms/SLPVectorizer/X86/hadd-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/hadd.ll

llvm/test/Transforms/SLPVectorizer/X86/hsub-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/hsub.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction-transpose.ll

llvm/unittests/IR/InstructionsTest.cpp

llvm/unittests/IR/ShuffleVectorInstTest.cpp
