This is an archive of the discontinued LLVM Phabricator instance.

I'm not certain I've entirely following this, but it looks like its just trying to account for some weaknesses in getScalarizationOverhead()? Might we be better off trying to improve that?

In D115462#3484627, @RKSimon wrote:

I'm not certain I've entirely following this, but it looks like its just trying to account for some weaknesses in getScalarizationOverhead()? Might we be better off trying to improve that?

Actually, this is a bug fix. We're not quite correct in some cases, the patch fixesbthe cost of some of the shuffles.

Rebase

Harbormaster completed remote builds in B166087: Diff 431724.May 24 2022, 11:53 AM

LGTM

This revision is now accepted and ready to land.May 26 2022, 7:05 AM

dmgreen added a subscriber: dmgreen.May 26 2022, 8:17 AM

dmgreen added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6079	I'm not sure I understand why this would be a SK_Select. That is a bit of a X86 special as far as I understand and doesn't always correlate well to other architectures. Why is the Mask missing too? That might be enough to help avoid the regressions if it was re-added.
llvm/test/Transforms/SLPVectorizer/AArch64/loadorder.ll
1335 ↗	(On Diff #431724)	This seems worse I'm afraid - I don't think it should be keeping all these individual loads that are inserted. The insert_subvector cost should be low enough for them to be profitable to vectorize undef AArch64 - they are just a s register load.

ABataev added inline comments.May 26 2022, 8:35 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6079	It is a permuatation of 2 sub-vectors: the root of the buildvector and a subvector after the vectorization. Since it was a buildvector, the compiler selects elements from the root and corresponding elements from the resulting vector. Mask is not required, if TTI::SK_Select is used, mask is used only with SK_PermuteSingleSrc and SK_PermuteTwoSrc. But I'll check it.
llvm/test/Transforms/SLPVectorizer/AArch64/loadorder.ll
1335 ↗	(On Diff #431724)	I'll try to improve cost estimation there with insert_subvector cost.

dmgreen added inline comments.May 26 2022, 8:48 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6079	AArch64 (and most other architectures AFAIU) do not have SK_Select shuffles, so is not a lot better than SK_PermuteTwoSrc. A Mask can help to improve the cost though, if the backend can come up with something more accurate for it. I'm surprised this is not a SK_InsertSubvector with adjacent elements though - that seems like the most natural fit, unless I'm missing how this works.

reopen to address @dmgreen's comments

This revision now requires changes to proceed.May 26 2022, 9:20 AM

ABataev added inline comments.May 26 2022, 9:20 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6079	Yep, you right, it must be an InserSubvector kind, changed it to Select because some cost for InsertSubvector were not implemented.

RKSimon added inline comments.May 26 2022, 9:54 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6079	was this on x86 / aarch64 or some other target?

ABataev added inline comments.May 26 2022, 9:57 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6079	x86, IIRC.

RKSimon added inline comments.May 26 2022, 9:58 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6079	if you can email a test case I'll take a look

Address comments.

ABataev added inline comments.May 26 2022, 11:26 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6079	Some of the lit test, do not remember already. Most of the cases were fixed already, I believe.

Harbormaster completed remote builds in B166510: Diff 432337.May 26 2022, 12:08 PM

LGTM

This revision is now accepted and ready to land.May 27 2022, 4:30 AM

Thanks for the updates.

This revision was landed with ongoing or failed builds.Jun 1 2022, 11:04 AM

Closed by commit rGfd5a6ce9dcb7: [SLP]Improve shuffles cost estimation where possible. (authored by ABataev). · Explain Why

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rGfd5a6ce9dcb7: [SLP]Improve shuffles cost estimation where possible..

ABataev added a reverting change: rG73020b45407f: Revert "[SLP]Improve shuffles cost estimation where possible.".Jun 1 2022, 3:47 PM

ABataev added a commit: rG9980c9971892: [SLP]Improve shuffles cost estimation where possible..Jun 2 2022, 11:20 AM

Seems that the relanded 9980c9971892378ea82475e000de8df210a58e69 caused wasm test failures for Halide.
I'll try to find folks to give a smaller reproduce (I myself know nearly nothing about Halide), but here is the stack trace in case you notice something immediate.

# related to correctness_interleave in wasm-32-wasmrt-wasm_simd128-wasm_signext-wasm_sat_float_to_int mode

assert.h assertion failed at llvm/CodeGen/BasicTTIImpl.h:152 in llvm::InstructionCost llvm::BasicTTIImplBase<llvm::WebAssemblyTTIImpl>::getInsertSubvectorOverhead(llvm::VectorType *, int, llvm::FixedVectorType *) [T = llvm::WebAssemblyTTIImpl]: (!isa<FixedVectorType>(VTy) || (Index + NumSubElts) <= (int)cast<FixedVectorType>(VTy)->getNumElements()) && "SK_InsertSubvector index out of range"

*** Check failure stack trace: ***
    @     0x5606259259e4  absl::log_internal::LogMessage::SendToLog()
    @     0x56062592518a  absl::log_internal::LogMessage::Flush()
    @     0x560625925dc9  absl::log_internal::LogMessageFatal::~LogMessageFatal()
    @     0x560625905ce4  __assert_fail
    @     0x5606232d2f51  llvm::BasicTTIImplBase<>::getInsertSubvectorOverhead()
    @     0x5606232ce898  llvm::BasicTTIImplBase<>::getShuffleCost()
    @     0x560624ce91c5  llvm::TargetTransformInfo::getShuffleCost()
    @     0x56062477966b  llvm::slpvectorizer::BoUpSLP::getEntryCost()
    @     0x56062477ed63  llvm::slpvectorizer::BoUpSLP::getTreeCost()
    @     0x560624797c28  llvm::SLPVectorizerPass::tryToVectorizeList()
    @     0x56062479c4d8  llvm::SLPVectorizerPass::vectorizeInsertElementInst()
    @     0x56062479c67c  llvm::SLPVectorizerPass::vectorizeSimpleInstructions()
    @     0x560624793e82  llvm::SLPVectorizerPass::vectorizeChainsInBlock()
    @     0x560624791dbc  llvm::SLPVectorizerPass::runImpl()
    @     0x560624791168  llvm::SLPVectorizerPass::run()
    @     0x5606236a4b92  llvm::detail::PassModel<>::run()
    @     0x56062512c0b5  llvm::PassManager<>::run()
    @     0x560622c278b2  llvm::detail::PassModel<>::run()
    @     0x5606251300e0  llvm::ModuleToFunctionPassAdaptor::run()
    @     0x560621fc84f2  llvm::detail::PassModel<>::run()
    @     0x56062512b1de  llvm::PassManager<>::run()
    @     0x560621f9d82b  Halide::Internal::CodeGen_LLVM::optimize_module()
    @     0x560621f9ae1a  Halide::Internal::CodeGen_LLVM::finish_codegen()
    @     0x560621f9bdf5  Halide::Internal::CodeGen_LLVM::compile()
    @     0x56062250c909  Halide::Internal::WasmModuleContents::WasmModuleContents()
    @     0x5606225118f2  Halide::Internal::WasmModule::compile()
    @     0x560622155f52  Halide::Pipeline::compile_jit()
    @     0x5606221584ea  Halide::Pipeline::realize()
    @     0x560622157f21  Halide::Pipeline::realize()
    @     0x560622157701  Halide::Pipeline::realize()
    @     0x560621ef6ac5  Halide::Func::realize()
    @     0x560621ec01ad  main
    @     0x7f33b32b08d3  __libc_start_main
    @     0x560621ebe90a  _start

uabelho added a subscriber: uabelho.Jun 2 2022, 10:37 PM

I also bisected a bunch of failing assertions on ARM and aarch64 to this commit. Here's a reduced reproducer:

$ cat repro.c 
char *a;
long b;
int c() {
  int d, e = d = 0;
  for (; d < 8; d++)
    e += a[d] - a[b ^ d] - a[b] >> -a[d] >> 1;
  return e;
}
$ clang -target aarch64-linux-gnu -c repro.c -O3
clang: ../include/llvm/CodeGen/BasicTTIImpl.h:149: llvm::InstructionCost llvm::BasicTTIImplBase<T>::getInsertSubvectorOverhead(llvm::VectorType*, int, llvm::FixedVectorType*) [with T = llvm::AArch64TTIImpl]: Assertion `(!isa<FixedVectorType>(VTy) || (Index + NumSubElts) <= (int)cast<FixedVectorType>(VTy)->getNumElements()) && "SK_InsertSubvector index out of range"' failed.

MaskRay added a reverting change: rGdf0f30dc36c1: Revert "[SLP]Improve shuffles cost estimation where possible.".Jun 3 2022, 12:30 AM

ABataev added a commit: rGcac60940b771: [SLP]Improve shuffles cost estimation where possible..Jun 3 2022, 8:08 AM

Looks like cac60940b771a0685d058a5b471c84cea05fdc46 causes a miscompile again.
This can be reproduced at this commit or current head.
The commit causes https://github.com/pytorch/cpuinfo src/x86/isa.c:cpuinfo_x86_detect_isa to be miscompiled at least in -Os -fsanitize=memory -march=haswell -fsanitize-memory-param-retval -fsanitize-memory-use-after-dtor mode.

For the test test/init.cc, the address of isa soon incorrectly becomes zero:

(gdb) p &isa
$3 = (struct cpuinfo_x86_isa *) 0x0

Then the test will crash here isa.sysenter = !!(basic_info.edx & UINT32_C(0x00000800)); since the address of isa.sysenter is incorrectly considered as 0x2.

Herald added a subscriber: nlopes. · View Herald TranscriptJun 22 2022, 10:13 PM

cac60940b771a0685d058a5b471c84cea05fdc46 is not fixed by f96fbc5d96869e7c75f64dacab6e4894ed291530 [SLP]Fix a crash when insert subvector is out of range..
I am sorry as I am preparing a revert.

Here is -mllvm -print-changed -mllvm -print-module-scope output before SLPVectorizer: https://gist.github.com/MaskRay/23f0db50e136127fda1b4f83db2488da

MaskRay added a reverting change: rG1ffd2d99c29e: Revert D115462 "[SLP]Improve shuffles cost estimation where possible.".Jun 22 2022, 11:16 PM

In D115462#3603899, @MaskRay wrote:

Here is -mllvm -print-changed -mllvm -print-module-scope output before SLPVectorizer: https://gist.github.com/MaskRay/23f0db50e136127fda1b4f83db2488da

Double checked the patch and the results. The patch itself does not change the vectorization, just adjusts the cost model estimation. Some of the buildvector sequences are not profitable for re-vectorization after this patch, nothing else. Plus, the code after this patch results in the same code, being applied to the compiler without the patch, i.e. the code transformations are the same, just different vectorization.
Maybe the debug info is corrupted somehow? Or there are other effects?

In D115462#3605808, @ABataev wrote:

In D115462#3603899, @MaskRay wrote:

Here is -mllvm -print-changed -mllvm -print-module-scope output before SLPVectorizer: https://gist.github.com/MaskRay/23f0db50e136127fda1b4f83db2488da

Double checked the patch and the results. The patch itself does not change the vectorization, just adjusts the cost model estimation. Some of the buildvector sequences are not profitable for re-vectorization after this patch, nothing else. Plus, the code after this patch results in the same code, being applied to the compiler without the patch, i.e. the code transformations are the same, just different vectorization.

I have added IR before/after SLPVectorizer, with and without the two commits, and generated assembly to https://gist.github.com/MaskRay/23f0db50e136127fda1b4f83db2488da
Hope they are useful.

Maybe the debug info is corrupted somehow? Or there are other effects?

The codegen is corrupted. I provide debug info just in case it helps analyze the problem.

In D115462#3606101, @MaskRay wrote:

In D115462#3605808, @ABataev wrote:

In D115462#3603899, @MaskRay wrote:

Here is -mllvm -print-changed -mllvm -print-module-scope output before SLPVectorizer: https://gist.github.com/MaskRay/23f0db50e136127fda1b4f83db2488da

Double checked the patch and the results. The patch itself does not change the vectorization, just adjusts the cost model estimation. Some of the buildvector sequences are not profitable for re-vectorization after this patch, nothing else. Plus, the code after this patch results in the same code, being applied to the compiler without the patch, i.e. the code transformations are the same, just different vectorization.

I have added IR before/after SLPVectorizer, with and without the two commits, and generated assembly to https://gist.github.com/MaskRay/23f0db50e136127fda1b4f83db2488da
Hope they are useful.

Maybe the debug info is corrupted somehow? Or there are other effects?

The codegen is corrupted. I provide debug info just in case it helps analyze the problem.

Try to run something like:
opt -slp-vectorizer -S ./post-slp-bad.ll -o post-slp-after.ll

(opt without these patches) and compare post-slp-after.ll with post-slp-good.ll. It will be the same. The only thing the patch does is prevents some insertelement vectorization in your case, nothing else. That's why the result will be same. Most probably it just reveals the bug somewhere in the compiler, maybe in lowering.

Yes, I suspect that this change just exposed a bug in the X86 backend handling inline asm. I have added some notes on D128461 and am keeping investigation.

I more firmly believe I made a mistake. Sorry for that. You may recommit.

In D115462#3607097, @MaskRay wrote:

I more firmly believe I made a mistake. Sorry for that. You may recommit.

No problem

ABataev added a commit: rG2faacf61a50e: [SLP]Improve shuffles cost estimation where possible..Jun 24 2022, 9:30 AM

Heads up: I am seeing a clang crash on arm with this commit:

commit 2faacf61a50e7f23fd10927cbbb98c59799bfcd0
Author: Alexey Bataev <a.bataev@outlook.com>
Date: Thu Dec 9 10:34:08 2021 -0800
CommitDate: Fri Jun 24 09:28:01 2022 -0700

[SLP]Improve shuffles cost estimation where possible.

I am trying to create a reduced test case.

In D115462#3611255, @manojgupta wrote:
Heads up: I am seeing a clang crash on arm with this commit:

commit 2faacf61a50e7f23fd10927cbbb98c59799bfcd0
Author: Alexey Bataev <a.bataev@outlook.com>
Date: Thu Dec 9 10:34:08 2021 -0800
CommitDate: Fri Jun 24 09:28:01 2022 -0700
[SLP]Improve shuffles cost estimation where possible.
I am trying to create a reduced test case.

It must be fixed already, please, check trunc.

Still crashes on trunk.

C-reduce Test case:

typedef __INT64_TYPE__ int64_t;
int sbr_autocorrelate_c_x_i;
void phiautocorr_calc(int64_t );

void sbr_autocorrelate_c_x(void) {
  int(*x)[2] = sbr_autocorrelate_c_x;
  int64_t accu_re , accu_im = 0;
  for (; sbr_autocorrelate_c_x_i; sbr_autocorrelate_c_x_i++) {
    accu_re +=
        x[sbr_autocorrelate_c_x_i][0] * x[sbr_autocorrelate_c_x_i + 2][0];
    accu_re +=
        x[sbr_autocorrelate_c_x_i][1] * x[sbr_autocorrelate_c_x_i + 2][1];
    accu_im +=
        x[sbr_autocorrelate_c_x_i][0] * x[sbr_autocorrelate_c_x_i + 2][1];
    accu_im -=
        x[sbr_autocorrelate_c_x_i][1] * x[sbr_autocorrelate_c_x_i + 2][0];
  }
  phiautocorr_calc(accu_im);
  phiautocorr_calc(accu_re);
}

Crashes with
clang -Os -c test.c --target=armv7a-linux-gnueabihf -mfpu=neon -Wno-error

llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7744: llvm::Value *llvm::slpvectorizer::BoUpSLP::createBuildVector(ArrayRef<llvm::Value *>): Assertion `any_of(VectorizableTree, [VL](const std::unique_ptr<TreeEntry> &TE) { return TE->State == TreeEntry::NeedToGather && TE->isSame(VL); }) && "Non-matching gather node."' failed.

In D115462#3613388, @manojgupta wrote:
Still crashes on trunk.

C-reduce Test case:
typedef __INT64_TYPE__ int64_t;
int sbr_autocorrelate_c_x_i;
void phiautocorr_calc(int64_t );

void sbr_autocorrelate_c_x(void) {
  int(*x)[2] = sbr_autocorrelate_c_x;
  int64_t accu_re , accu_im = 0;
  for (; sbr_autocorrelate_c_x_i; sbr_autocorrelate_c_x_i++) {
    accu_re +=
        x[sbr_autocorrelate_c_x_i][0] * x[sbr_autocorrelate_c_x_i + 2][0];
    accu_re +=
        x[sbr_autocorrelate_c_x_i][1] * x[sbr_autocorrelate_c_x_i + 2][1];
    accu_im +=
        x[sbr_autocorrelate_c_x_i][0] * x[sbr_autocorrelate_c_x_i + 2][1];
    accu_im -=
        x[sbr_autocorrelate_c_x_i][1] * x[sbr_autocorrelate_c_x_i + 2][0];
  }
  phiautocorr_calc(accu_im);
  phiautocorr_calc(accu_re);
}
Crashes with
clang -Os -c test.c --target=armv7a-linux-gnueabihf -mfpu=neon -Wno-error

llvm/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:7744: llvm::Value *llvm::slpvectorizer::BoUpSLP::createBuildVector(ArrayRef<llvm::Value *>): Assertion `any_of(VectorizableTree, [VL](const std::unique_ptr<TreeEntry> &TE) { return TE->State == TreeEntry::NeedToGather && TE->isSame(VL); }) && "Non-matching gather node."' failed.

Thanks, there is PR56251 already. The patch is not a cause of the crash, it has an extra assert that reveals internal bug in SLP vectorizer. Will send a patch with a fix in a few minutes.

I saw another crash on chromium builds - not sure if this is related to the previous mentioned crash but here's a creduced repro:

struct f {
  float g;
  float h;
};
struct j {
  j() = default;
  j(float k, float l) : c(k), d(l) {}
  j operator*(j k) const { return j(k.a + c, b + k.d); }
  float a = 1.0f;
  float b = 0.0f;
  float c = 0.0f;
  float d = 1.0f;
  float e = 0.0f;
};
struct m {
  j n() const;
  f o;
  j p;
};
j m::n() const {
  if (o.g || o.h)
    return j();
  j a;
  return p * a;
}

build with

clang -cc1 -O2 -vectorize-slp -emit-llvm -fno-delete-null-pointer-checks t.cpp

In D115462#3613570, @akhuang wrote:

I saw another crash on chromium builds - not sure if this is related to the previous mentioned crash but here's a creduced repro:

struct f {
  float g;
  float h;
};
struct j {
  j() = default;
  j(float k, float l) : c(k), d(l) {}
  j operator*(j k) const { return j(k.a + c, b + k.d); }
  float a = 1.0f;
  float b = 0.0f;
  float c = 0.0f;
  float d = 1.0f;
  float e = 0.0f;
};
struct m {
  j n() const;
  f o;
  j p;
};
j m::n() const {
  if (o.g || o.h)
    return j();
  j a;
  return p * a;
}

build with

clang -cc1 -O2 -vectorize-slp -emit-llvm -fno-delete-null-pointer-checks t.cpp

Yes, the cause is the same but need to adjust the patch D128680 a bit.

In D115462#3613570, @akhuang wrote:

I saw another crash on chromium builds - not sure if this is related to the previous mentioned crash but here's a creduced repro:

struct f {
  float g;
  float h;
};
struct j {
  j() = default;
  j(float k, float l) : c(k), d(l) {}
  j operator*(j k) const { return j(k.a + c, b + k.d); }
  float a = 1.0f;
  float b = 0.0f;
  float c = 0.0f;
  float d = 1.0f;
  float e = 0.0f;
};
struct m {
  j n() const;
  f o;
  j p;
};
j m::n() const {
  if (o.g || o.h)
    return j();
  j a;
  return p * a;
}

build with

clang -cc1 -O2 -vectorize-slp -emit-llvm -fno-delete-null-pointer-checks t.cpp

Fixed in D128680

Revision Contents

Path

Size

llvm/

lib/

Transforms/

Vectorize/

SLPVectorizer.cpp

81 lines

test/

Transforms/

SLPVectorizer/

AArch64/

accelerate-vector-functions-inseltpoison.ll

96 lines

accelerate-vector-functions.ll

96 lines

tsc-s116.ll

29 lines

X86/

PR39774.ll

110 lines

alternate-int-inseltpoison.ll

24 lines

alternate-int.ll

24 lines

load-merge-inseltpoison.ll

13 lines

load-merge.ll

13 lines

malformed_phis.ll

34 lines

resched.ll

29 lines

Diff 433462

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 5,551 Lines • ▼ Show 20 Lines	computeExtractCost(ArrayRef<Value > VL, FixedVectorType VecTy,

// Process extracts in blocks of EltsPerVector to check if the source vector		// Process extracts in blocks of EltsPerVector to check if the source vector
// operand can be re-used directly. If not, add the cost of creating a shuffle		// operand can be re-used directly. If not, add the cost of creating a shuffle
// to extract the values into a vector register.		// to extract the values into a vector register.
SmallVector<int> RegMask(EltsPerVector, UndefMaskElem);		SmallVector<int> RegMask(EltsPerVector, UndefMaskElem);
for (auto *V : VL) {		for (auto *V : VL) {
++Idx;		++Idx;

// Need to exclude undefs from analysis.
if (isa<UndefValue>(V) \|\| Mask[Idx] == UndefMaskElem)
continue;

// Reached the start of a new vector registers.		// Reached the start of a new vector registers.
if (Idx % EltsPerVector == 0) {		if (Idx % EltsPerVector == 0) {
RegMask.assign(EltsPerVector, UndefMaskElem);		RegMask.assign(EltsPerVector, UndefMaskElem);
AllConsecutive = true;		AllConsecutive = true;
continue;		continue;
}		}

		// Need to exclude undefs from analysis.
		if (isa<UndefValue>(V) \|\| Mask[Idx] == UndefMaskElem)
		continue;

// Check all extracts for a vector register on the target directly		// Check all extracts for a vector register on the target directly
// extract values in order.		// extract values in order.
unsigned CurrentIdx = *getExtractIndex(cast<Instruction>(V));		unsigned CurrentIdx = *getExtractIndex(cast<Instruction>(V));
if (!isa<UndefValue>(VL[Idx - 1]) && Mask[Idx - 1] != UndefMaskElem) {		if (!isa<UndefValue>(VL[Idx - 1]) && Mask[Idx - 1] != UndefMaskElem) {
unsigned PrevIdx = *getExtractIndex(cast<Instruction>(VL[Idx - 1]));		unsigned PrevIdx = *getExtractIndex(cast<Instruction>(VL[Idx - 1]));
AllConsecutive &= PrevIdx + 1 == CurrentIdx &&		AllConsecutive &= PrevIdx + 1 == CurrentIdx &&
CurrentIdx % EltsPerVector == Idx % EltsPerVector;		CurrentIdx % EltsPerVector == Idx % EltsPerVector;
RegMask[Idx % EltsPerVector] = CurrentIdx % EltsPerVector;		RegMask[Idx % EltsPerVector] = CurrentIdx % EltsPerVector;
▲ Show 20 Lines • Show All 428 Lines • ▼ Show 20 Lines	case Instruction::ExtractElement: {
AdjustExtractsCost(CommonCost);		AdjustExtractsCost(CommonCost);
}		}
return CommonCost;		return CommonCost;
}		}
case Instruction::InsertElement: {		case Instruction::InsertElement: {
assert(E->ReuseShuffleIndices.empty() &&		assert(E->ReuseShuffleIndices.empty() &&
"Unique insertelements only are expected.");		"Unique insertelements only are expected.");
auto *SrcVecTy = cast<FixedVectorType>(VL0->getType());		auto *SrcVecTy = cast<FixedVectorType>(VL0->getType());

unsigned const NumElts = SrcVecTy->getNumElements();		unsigned const NumElts = SrcVecTy->getNumElements();
unsigned const NumScalars = VL.size();		unsigned const NumScalars = VL.size();

		unsigned NumOfParts = TTI->getNumberOfParts(SrcVecTy);

		unsigned OffsetBeg = *getInsertIndex(VL.front());
		unsigned OffsetEnd = *getInsertIndex(VL.back());
		unsigned VecSz = NumElts;
		unsigned VecScalarsSz = NumScalars;
		if (NumOfParts > 0) {
		VecScalarsSz = NumElts / NumOfParts;
		VecSz = PowerOf2Ceil(
		(1 + OffsetEnd / VecScalarsSz - OffsetBeg / VecScalarsSz) *
		VecScalarsSz);
		}

APInt DemandedElts = APInt::getZero(NumElts);		APInt DemandedElts = APInt::getZero(NumElts);
// TODO: Add support for Instruction::InsertValue.		// TODO: Add support for Instruction::InsertValue.
SmallVector<int> Mask;		SmallVector<int> Mask;
if (!E->ReorderIndices.empty()) {		if (!E->ReorderIndices.empty()) {
inversePermutation(E->ReorderIndices, Mask);		inversePermutation(E->ReorderIndices, Mask);
Mask.append(NumElts - NumScalars, UndefMaskElem);
} else {		} else {
Mask.assign(NumElts, UndefMaskElem);		Mask.assign(NumScalars, UndefMaskElem);
std::iota(Mask.begin(), std::next(Mask.begin(), NumScalars), 0);		std::iota(Mask.begin(), std::next(Mask.begin(), NumScalars), 0);
}		}
unsigned Offset = *getInsertIndex(VL0);
bool IsIdentity = true;		bool IsIdentity = true;
SmallVector<int> PrevMask(NumElts, UndefMaskElem);		SmallVector<int> PrevMask(VecSz, UndefMaskElem);
Mask.swap(PrevMask);		Mask.swap(PrevMask);
		unsigned Offset = VecScalarsSz * (OffsetBeg / VecScalarsSz);
for (unsigned I = 0; I < NumScalars; ++I) {		for (unsigned I = 0; I < NumScalars; ++I) {
unsigned InsertIdx = *getInsertIndex(VL[PrevMask[I]]);		unsigned InsertIdx = *getInsertIndex(VL[PrevMask[I]]);
DemandedElts.setBit(InsertIdx);		DemandedElts.setBit(InsertIdx);
IsIdentity &= InsertIdx - Offset == I;		IsIdentity &= InsertIdx - Offset == I;
Mask[InsertIdx - Offset] = I;		Mask[InsertIdx - Offset] = I;
}		}
assert(Offset < NumElts && "Failed to find vector index offset");		assert(Offset < NumElts && "Failed to find vector index offset");

InstructionCost Cost = 0;		InstructionCost Cost = 0;
Cost -= TTI->getScalarizationOverhead(SrcVecTy, DemandedElts,		Cost -= TTI->getScalarizationOverhead(SrcVecTy, DemandedElts,
/Insert/ true, /Extract/ false);		/Insert/ true, /Extract/ false);

if (IsIdentity && NumElts != NumScalars && Offset % NumScalars != 0) {		// First cost - resize to actual vector size if not identity shuffle or
// FIXME: Replace with SK_InsertSubvector once it is properly supported.		// need to shift the vector.
unsigned Sz = PowerOf2Ceil(Offset + NumScalars);		// Do not calculate the cost if the actual size is the register size and
Cost += TTI->getShuffleCost(		// we can merge this shuffle with the following SK_Select.
TargetTransformInfo::SK_PermuteSingleSrc,		auto *ActualVecTy =
FixedVectorType::get(SrcVecTy->getElementType(), Sz));		FixedVectorType::get(SrcVecTy->getElementType(), VecSz);
} else if (!IsIdentity) {		if ((!IsIdentity \|\| Offset != OffsetBeg) && VecScalarsSz != VecSz)
auto *FirstInsert =		Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
cast<Instruction>(find_if(E->Scalars, [E](Value V) {		ActualVecTy, Mask);
return !is_contained(E->Scalars,		auto FirstInsert = cast<Instruction>(find_if(E->Scalars, [E](Value *V) {
cast<Instruction>(V)->getOperand(0));		return !is_contained(E->Scalars, cast<Instruction>(V)->getOperand(0));
}));		}));
if (isUndefVector(FirstInsert->getOperand(0))) {		// Second cost - permutation with subvector, if some elements are from the
Cost += TTI->getShuffleCost(TTI::SK_PermuteSingleSrc, SrcVecTy, Mask);		// initial vector or inserting a subvector.
} else {		// TODO: Implement the analysis of the FirstInsert->getOperand(0)
SmallVector<int> InsertMask(NumElts);		// subvector of ActualVecTy.
std::iota(InsertMask.begin(), InsertMask.end(), 0);		if (!isUndefVector(FirstInsert->getOperand(0)) && Offset != OffsetBeg) {
for (unsigned I = 0; I < NumElts; I++) {		Cost += TTI->getShuffleCost(
if (Mask[I] != UndefMaskElem)		TTI::SK_InsertSubvector,
InsertMask[Offset + I] = NumElts + I;		(OffsetEnd - OffsetBeg < VecScalarsSz && NumOfParts > 0)
}		? FixedVectorType::get(SrcVecTy->getElementType(), VecScalarsSz)
Cost +=		: ActualVecTy,
TTI->getShuffleCost(TTI::SK_PermuteTwoSrc, SrcVecTy, InsertMask);		None, OffsetBeg - Offset, VecTy);
}
}		}
		dmgreenUnsubmitted Not Done Reply Inline Actions I'm not sure I understand why this would be a SK_Select. That is a bit of a X86 special as far as I understand and doesn't always correlate well to other architectures. Why is the Mask missing too? That might be enough to help avoid the regressions if it was re-added. dmgreen: I'm not sure I understand why this would be a SK_Select. That is a bit of a X86 special as far…
		ABataevAuthorUnsubmitted Done Reply Inline Actions It is a permuatation of 2 sub-vectors: the root of the buildvector and a subvector after the vectorization. Since it was a buildvector, the compiler selects elements from the root and corresponding elements from the resulting vector. Mask is not required, if TTI::SK_Select is used, mask is used only with SK_PermuteSingleSrc and SK_PermuteTwoSrc. But I'll check it. ABataev: 1. It is a permuatation of 2 sub-vectors: the root of the buildvector and a subvector after the…
		dmgreenUnsubmitted Not Done Reply Inline Actions AArch64 (and most other architectures AFAIU) do not have SK_Select shuffles, so is not a lot better than SK_PermuteTwoSrc. A Mask can help to improve the cost though, if the backend can come up with something more accurate for it. I'm surprised this is not a SK_InsertSubvector with adjacent elements though - that seems like the most natural fit, unless I'm missing how this works. dmgreen: AArch64 (and most other architectures AFAIU) do not have SK_Select shuffles, so is not a lot…
		ABataevAuthorUnsubmitted Done Reply Inline Actions Yep, you right, it must be an InserSubvector kind, changed it to Select because some cost for InsertSubvector were not implemented. ABataev: Yep, you right, it must be an InserSubvector kind, changed it to Select because some cost for…
		RKSimonUnsubmitted Not Done Reply Inline Actions was this on x86 / aarch64 or some other target? RKSimon: was this on x86 / aarch64 or some other target?
		ABataevAuthorUnsubmitted Done Reply Inline Actions x86, IIRC. ABataev: x86, IIRC.
		RKSimonUnsubmitted Not Done Reply Inline Actions if you can email a test case I'll take a look RKSimon: if you can email a test case I'll take a look
		ABataevAuthorUnsubmitted Done Reply Inline Actions Some of the lit test, do not remember already. Most of the cases were fixed already, I believe. ABataev: Some of the lit test, do not remember already. Most of the cases were fixed already, I believe.

return Cost;		return Cost;
}		}
case Instruction::ZExt:		case Instruction::ZExt:
case Instruction::SExt:		case Instruction::SExt:
case Instruction::FPToUI:		case Instruction::FPToUI:
case Instruction::FPToSI:		case Instruction::FPToSI:
case Instruction::FPExt:		case Instruction::FPExt:
case Instruction::PtrToInt:		case Instruction::PtrToInt:
▲ Show 20 Lines • Show All 436 Lines • ▼ Show 20 Lines	bool BoUpSLP::isLoadCombineCandidate() const {
}		}
return true;		return true;
}		}

bool BoUpSLP::isTreeTinyAndNotFullyVectorizable(bool ForReduction) const {		bool BoUpSLP::isTreeTinyAndNotFullyVectorizable(bool ForReduction) const {
// No need to vectorize inserts of gathered values.		// No need to vectorize inserts of gathered values.
if (VectorizableTree.size() == 2 &&		if (VectorizableTree.size() == 2 &&
isa<InsertElementInst>(VectorizableTree[0]->Scalars[0]) &&		isa<InsertElementInst>(VectorizableTree[0]->Scalars[0]) &&
VectorizableTree[1]->State == TreeEntry::NeedToGather)		VectorizableTree[1]->State == TreeEntry::NeedToGather &&
		(VectorizableTree[1]->getVectorFactor() <= 2 \|\|
		!(isSplat(VectorizableTree[1]->Scalars) \|\|
		allConstant(VectorizableTree[1]->Scalars))))
return true;		return true;

// We can vectorize the tree if its size is greater than or equal to the		// We can vectorize the tree if its size is greater than or equal to the
// minimum size specified by the MinTreeSize command line option.		// minimum size specified by the MinTreeSize command line option.
if (VectorizableTree.size() >= MinTreeSize)		if (VectorizableTree.size() >= MinTreeSize)
return false;		return false;

// If we have a tiny tree (a tree whose size is less than MinTreeSize), we		// If we have a tiny tree (a tree whose size is less than MinTreeSize), we
▲ Show 20 Lines • Show All 5,702 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions-inseltpoison.ll

	Show All 16 Lines
	;			;
	; NOACCELERATE-LABEL: @int_sin_4x(			; NOACCELERATE-LABEL: @int_sin_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
				; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_1]])
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP4]])
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.sin.f32(float %vecext)			%1 = tail call fast float @llvm.sin.f32(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @llvm.sin.f32(float %vecext.1)			%2 = tail call fast float @llvm.sin.f32(float %vecext.1)
	▲ Show 20 Lines • Show All 170 Lines • ▼ Show 20 Lines
	;			;
	; NOACCELERATE-LABEL: @exp_4x(			; NOACCELERATE-LABEL: @exp_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
				; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @expf(float [[VECEXT_1]])
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @expf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP4]])
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @expf(float %vecext)			%1 = tail call fast float @expf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @expf(float %vecext.1)			%2 = tail call fast float @expf(float %vecext.1)
	▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines
	;			;
	; NOACCELERATE-LABEL: @log_4x(			; NOACCELERATE-LABEL: @log_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
				; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @logf(float [[VECEXT_1]])
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @logf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP4]])
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @logf(float %vecext)			%1 = tail call fast float @logf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @logf(float %vecext.1)			%2 = tail call fast float @logf(float %vecext.1)
	▲ Show 20 Lines • Show All 150 Lines • ▼ Show 20 Lines
	;			;
	; NOACCELERATE-LABEL: @sin_4x(			; NOACCELERATE-LABEL: @sin_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
				; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @sinf(float [[VECEXT_1]])
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @sinf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP4]])
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @sinf(float %vecext)			%1 = tail call fast float @sinf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @sinf(float %vecext.1)			%2 = tail call fast float @sinf(float %vecext.1)
	Show All 16 Lines
	;			;
	; NOACCELERATE-LABEL: @cos_4x(			; NOACCELERATE-LABEL: @cos_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
				; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @cosf(float [[VECEXT_1]])
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @cosf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP4]])
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @cosf(float %vecext)			%1 = tail call fast float @cosf(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @cosf(float %vecext.1)			%2 = tail call fast float @cosf(float %vecext.1)
	▲ Show 20 Lines • Show All 465 Lines • ▼ Show 20 Lines
	;			;
	; NOACCELERATE-LABEL: @int_cos_4x(			; NOACCELERATE-LABEL: @int_cos_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
				; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_1]])
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP4]])
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.cos.f32(float %vecext)			%1 = tail call fast float @llvm.cos.f32(float %vecext)
	%vecins = insertelement <4 x float> poison, float %1, i32 0			%vecins = insertelement <4 x float> poison, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @llvm.cos.f32(float %vecext.1)			%2 = tail call fast float @llvm.cos.f32(float %vecext.1)
	▲ Show 20 Lines • Show All 44 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll

	Show All 16 Lines
	;			;
	; NOACCELERATE-LABEL: @int_sin_4x(			; NOACCELERATE-LABEL: @int_sin_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
				; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_1]])
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP4]])
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.sin.f32(float %vecext)			%1 = tail call fast float @llvm.sin.f32(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @llvm.sin.f32(float %vecext.1)			%2 = tail call fast float @llvm.sin.f32(float %vecext.1)
	▲ Show 20 Lines • Show All 170 Lines • ▼ Show 20 Lines
	;			;
	; NOACCELERATE-LABEL: @exp_4x(			; NOACCELERATE-LABEL: @exp_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
				; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @expf(float [[VECEXT_1]])
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @expf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.exp.v2f32(<2 x float> [[TMP4]])
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @expf(float %vecext)			%1 = tail call fast float @expf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @expf(float %vecext.1)			%2 = tail call fast float @expf(float %vecext.1)
	▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines
	;			;
	; NOACCELERATE-LABEL: @log_4x(			; NOACCELERATE-LABEL: @log_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @logf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
				; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @logf(float [[VECEXT_1]])
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @logf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.log.v2f32(<2 x float> [[TMP4]])
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @logf(float %vecext)			%1 = tail call fast float @logf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @logf(float %vecext.1)			%2 = tail call fast float @logf(float %vecext.1)
	▲ Show 20 Lines • Show All 150 Lines • ▼ Show 20 Lines
	;			;
	; NOACCELERATE-LABEL: @sin_4x(			; NOACCELERATE-LABEL: @sin_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @sinf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
				; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @sinf(float [[VECEXT_1]])
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @sinf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.sin.v2f32(<2 x float> [[TMP4]])
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @sinf(float %vecext)			%1 = tail call fast float @sinf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @sinf(float %vecext.1)			%2 = tail call fast float @sinf(float %vecext.1)
	Show All 16 Lines
	;			;
	; NOACCELERATE-LABEL: @cos_4x(			; NOACCELERATE-LABEL: @cos_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @cosf(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
				; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @cosf(float [[VECEXT_1]])
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @cosf(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP4]])
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @cosf(float %vecext)			%1 = tail call fast float @cosf(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @cosf(float %vecext.1)			%2 = tail call fast float @cosf(float %vecext.1)
	▲ Show 20 Lines • Show All 465 Lines • ▼ Show 20 Lines
	;			;
	; NOACCELERATE-LABEL: @int_cos_4x(			; NOACCELERATE-LABEL: @int_cos_4x(
	; NOACCELERATE-NEXT: entry:			; NOACCELERATE-NEXT: entry:
	; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16			; NOACCELERATE-NEXT: [[TMP0:%.]] = load <4 x float>, <4 x float> [[A:%.*]], align 16
	; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; NOACCELERATE-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])			; NOACCELERATE-NEXT: [[TMP1:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT]])
	; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0			; NOACCELERATE-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
	; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; NOACCELERATE-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
				; NOACCELERATE-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_1]])
				; NOACCELERATE-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
	; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; NOACCELERATE-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; NOACCELERATE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[VECEXT_1]], i32 0
	; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[VECEXT_2]], i32 1
	; NOACCELERATE-NEXT: [[TMP4:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP3]])
	; NOACCELERATE-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; NOACCELERATE-NEXT: [[VECINS_21:%.*]] = shufflevector <4 x float> [[VECINS]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; NOACCELERATE-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; NOACCELERATE-NEXT: [[TMP6:%.*]] = tail call fast float @llvm.cos.f32(float [[VECEXT_3]])			; NOACCELERATE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[VECEXT_2]], i32 0
	; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3			; NOACCELERATE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_3]], i32 1
	; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]			; NOACCELERATE-NEXT: [[TMP5:%.*]] = call fast <2 x float> @llvm.cos.v2f32(<2 x float> [[TMP4]])
				; NOACCELERATE-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
				; NOACCELERATE-NEXT: [[VECINS_31:%.*]] = shufflevector <4 x float> [[VECINS_1]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
				; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_31]]
	;			;
	entry:			entry:
	%0 = load <4 x float>, <4 x float>* %a, align 16			%0 = load <4 x float>, <4 x float>* %a, align 16
	%vecext = extractelement <4 x float> %0, i32 0			%vecext = extractelement <4 x float> %0, i32 0
	%1 = tail call fast float @llvm.cos.f32(float %vecext)			%1 = tail call fast float @llvm.cos.f32(float %vecext)
	%vecins = insertelement <4 x float> undef, float %1, i32 0			%vecins = insertelement <4 x float> undef, float %1, i32 0
	%vecext.1 = extractelement <4 x float> %0, i32 1			%vecext.1 = extractelement <4 x float> %0, i32 1
	%2 = tail call fast float @llvm.cos.f32(float %vecext.1)			%2 = tail call fast float @llvm.cos.f32(float %vecext.1)
	▲ Show 20 Lines • Show All 44 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll

	Show All 13 Lines
	; contiguous. The score estimation needs to be corrected, so that these 4 loads			; contiguous. The score estimation needs to be corrected, so that these 4 loads
	; are not selected for vectorization. Instead we should vectorize with			; are not selected for vectorization. Instead we should vectorize with
	; contiguous loads, from %a plus offsets 0 to 3, or offsets 1 to 4.			; contiguous loads, from %a plus offsets 0 to 3, or offsets 1 to 4.

	define void @s116_modified(float* %a) {			define void @s116_modified(float* %a) {
	; CHECK-LABEL: @s116_modified(			; CHECK-LABEL: @s116_modified(
	; CHECK-NEXT: [[GEP0:%.]] = getelementptr inbounds float, float [[A:%.*]], i64 0			; CHECK-NEXT: [[GEP0:%.]] = getelementptr inbounds float, float [[A:%.*]], i64 0
	; CHECK-NEXT: [[GEP1:%.]] = getelementptr inbounds float, float [[A]], i64 1			; CHECK-NEXT: [[GEP1:%.]] = getelementptr inbounds float, float [[A]], i64 1
	; CHECK-NEXT: [[GEP3:%.]] = getelementptr inbounds float, float [[A]], i64 3			; CHECK-NEXT: [[GEP2:%.]] = getelementptr inbounds float, float [[A]], i64 2
				; CHECK-NEXT: [[GEP4:%.]] = getelementptr inbounds float, float [[A]], i64 4
				; CHECK-NEXT: [[LD1:%.]] = load float, float [[GEP1]], align 4
	; CHECK-NEXT: [[LD0:%.]] = load float, float [[GEP0]], align 4			; CHECK-NEXT: [[LD0:%.]] = load float, float [[GEP0]], align 4
	; CHECK-NEXT: [[TMP1:%.]] = bitcast float [[GEP1]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.]] = bitcast float [[GEP2]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.]] = load <2 x float>, <2 x float> [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.]] = load <2 x float>, <2 x float> [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.]] = bitcast float [[GEP3]] to <2 x float>*			; CHECK-NEXT: [[LD4:%.]] = load float, float [[GEP4]], align 4
	; CHECK-NEXT: [[TMP4:%.]] = load <2 x float>, <2 x float> [[TMP3]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> poison, float [[LD0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> poison, float [[LD0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 5, i32 2, i32 3>			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP5]], float [[LD4]], i32 3
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> poison, float [[LD1]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP7]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP7]], float [[LD1]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP8]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x float> [[TMP10]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 2, i32 4>			; CHECK-NEXT: [[TMP10:%.*]] = fmul fast <4 x float> [[TMP6]], [[TMP9]]
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <4 x float> [[TMP9]], [[TMP11]]			; CHECK-NEXT: [[TMP11:%.]] = bitcast float [[GEP0]] to <4 x float>*
	; CHECK-NEXT: [[TMP13:%.]] = bitcast float [[GEP0]] to <4 x float>*			; CHECK-NEXT: store <4 x float> [[TMP10]], <4 x float>* [[TMP11]], align 4
	; CHECK-NEXT: store <4 x float> [[TMP12]], <4 x float>* [[TMP13]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%gep0 = getelementptr inbounds float, float* %a, i64 0			%gep0 = getelementptr inbounds float, float* %a, i64 0
	%gep1 = getelementptr inbounds float, float* %a, i64 1			%gep1 = getelementptr inbounds float, float* %a, i64 1
	%gep2 = getelementptr inbounds float, float* %a, i64 2			%gep2 = getelementptr inbounds float, float* %a, i64 2
	%gep3 = getelementptr inbounds float, float* %a, i64 3			%gep3 = getelementptr inbounds float, float* %a, i64 3
	%gep4 = getelementptr inbounds float, float* %a, i64 4			%gep4 = getelementptr inbounds float, float* %a, i64 4
	%ld0 = load float, float* %gep0			%ld0 = load float, float* %gep0
	Show All 16 Lines

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-4 \| FileCheck %s --check-prefix=CHECK			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-4 \| FileCheck %s --check-prefix=CHECK
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-6 -slp-min-tree-size=5 \| FileCheck %s --check-prefix=FORCE_REDUCTION			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-6 -slp-min-tree-size=5 \| FileCheck %s --check-prefix=FORCE_REDUCTION

	define void @Test(i32) {			define void @Test(i32) {
	; CHECK-LABEL: @Test(			; CHECK-LABEL: @Test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP1:%.]] = insertelement <8 x i32> poison, i32 [[TMP0:%.]], i32 0			; CHECK-NEXT: [[TMP1:%.]] = insertelement <8 x i32> poison, i32 [[TMP0:%.]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[TMP0]], i32 1			; CHECK-NEXT: [[SHUFFLE7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[TMP0]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> poison, i32 [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP0]], i32 3			; CHECK-NEXT: [[SHUFFLE6:%.*]] = shufflevector <16 x i32> [[TMP2]], <16 x i32> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[TMP0]], i32 4
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[TMP0]], i32 5
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[TMP0]], i32 6
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[TMP0]], i32 7
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> poison, i32 [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <16 x i32> [[TMP9]], i32 [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <16 x i32> [[TMP10]], i32 [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <16 x i32> [[TMP11]], i32 [[TMP0]], i32 3
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x i32> [[TMP12]], i32 [[TMP0]], i32 4
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <16 x i32> [[TMP13]], i32 [[TMP0]], i32 5
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <16 x i32> [[TMP14]], i32 [[TMP0]], i32 6
	; CHECK-NEXT: [[TMP16:%.*]] = insertelement <16 x i32> [[TMP15]], i32 [[TMP0]], i32 7
	; CHECK-NEXT: [[TMP17:%.*]] = insertelement <16 x i32> [[TMP16]], i32 [[TMP0]], i32 8
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <16 x i32> [[TMP17]], i32 [[TMP0]], i32 9
	; CHECK-NEXT: [[TMP19:%.*]] = insertelement <16 x i32> [[TMP18]], i32 [[TMP0]], i32 10
	; CHECK-NEXT: [[TMP20:%.*]] = insertelement <16 x i32> [[TMP19]], i32 [[TMP0]], i32 11
	; CHECK-NEXT: [[TMP21:%.*]] = insertelement <16 x i32> [[TMP20]], i32 [[TMP0]], i32 12
	; CHECK-NEXT: [[TMP22:%.*]] = insertelement <16 x i32> [[TMP21]], i32 [[TMP0]], i32 13
	; CHECK-NEXT: [[TMP23:%.*]] = insertelement <16 x i32> [[TMP22]], i32 [[TMP0]], i32 14
	; CHECK-NEXT: [[TMP24:%.*]] = insertelement <16 x i32> [[TMP23]], i32 [[TMP0]], i32 15
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP25:%.]] = phi <2 x i32> [ [[TMP36:%.]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP3:%.]] = phi <2 x i32> [ [[TMP14:%.]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP25]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP26:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1
	; CHECK-NEXT: [[TMP27:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>			; CHECK-NEXT: [[TMP5:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>
	; CHECK-NEXT: [[TMP28:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[TMP24]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[SHUFFLE6]])
	; CHECK-NEXT: [[TMP29:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP8]])			; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[SHUFFLE7]])
	; CHECK-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP28]], [[TMP29]]			; CHECK-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP6]], [[TMP7]]
	; CHECK-NEXT: [[TMP30:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP27]])			; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP5]])
	; CHECK-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP30]]			; CHECK-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP8]]
	; CHECK-NEXT: [[OP_RDX2:%.*]] = and i32 [[TMP0]], [[TMP0]]			; CHECK-NEXT: [[OP_RDX2:%.*]] = and i32 [[TMP0]], [[TMP0]]
	; CHECK-NEXT: [[OP_RDX3:%.*]] = and i32 [[TMP0]], [[TMP26]]			; CHECK-NEXT: [[OP_RDX3:%.*]] = and i32 [[TMP0]], [[TMP4]]
	; CHECK-NEXT: [[OP_RDX4:%.*]] = and i32 [[OP_RDX2]], [[OP_RDX3]]			; CHECK-NEXT: [[OP_RDX4:%.*]] = and i32 [[OP_RDX2]], [[OP_RDX3]]
	; CHECK-NEXT: [[TMP31:%.*]] = insertelement <2 x i32> poison, i32 [[OP_RDX1]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> poison, i32 [[OP_RDX1]], i32 0
	; CHECK-NEXT: [[TMP32:%.*]] = insertelement <2 x i32> [[TMP31]], i32 [[TMP26]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> [[TMP9]], i32 [[TMP4]], i32 1
	; CHECK-NEXT: [[TMP33:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[OP_RDX4]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[OP_RDX4]], i32 0
	; CHECK-NEXT: [[TMP34:%.*]] = and <2 x i32> [[TMP32]], [[TMP33]]			; CHECK-NEXT: [[TMP12:%.*]] = and <2 x i32> [[TMP10]], [[TMP11]]
	; CHECK-NEXT: [[TMP35:%.*]] = add <2 x i32> [[TMP32]], [[TMP33]]			; CHECK-NEXT: [[TMP13:%.*]] = add <2 x i32> [[TMP10]], [[TMP11]]
	; CHECK-NEXT: [[TMP36]] = shufflevector <2 x i32> [[TMP34]], <2 x i32> [[TMP35]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP14]] = shufflevector <2 x i32> [[TMP12]], <2 x i32> [[TMP13]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	; FORCE_REDUCTION-LABEL: @Test(			; FORCE_REDUCTION-LABEL: @Test(
	; FORCE_REDUCTION-NEXT: entry:			; FORCE_REDUCTION-NEXT: entry:
	; FORCE_REDUCTION-NEXT: [[TMP1:%.]] = insertelement <8 x i32> poison, i32 [[TMP0:%.]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP1:%.]] = insertelement <8 x i32> poison, i32 [[TMP0:%.]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[TMP0]], i32 1			; FORCE_REDUCTION-NEXT: [[SHUFFLE7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer
	; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[TMP0]], i32 2			; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> poison, i32 [[TMP0]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP0]], i32 3			; FORCE_REDUCTION-NEXT: [[SHUFFLE6:%.*]] = shufflevector <16 x i32> [[TMP2]], <16 x i32> poison, <16 x i32> zeroinitializer
	; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[TMP0]], i32 4
	; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[TMP0]], i32 5
	; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[TMP0]], i32 6
	; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[TMP0]], i32 7
	; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> poison, i32 [[TMP0]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP10:%.*]] = insertelement <16 x i32> [[TMP9]], i32 [[TMP0]], i32 1
	; FORCE_REDUCTION-NEXT: [[TMP11:%.*]] = insertelement <16 x i32> [[TMP10]], i32 [[TMP0]], i32 2
	; FORCE_REDUCTION-NEXT: [[TMP12:%.*]] = insertelement <16 x i32> [[TMP11]], i32 [[TMP0]], i32 3
	; FORCE_REDUCTION-NEXT: [[TMP13:%.*]] = insertelement <16 x i32> [[TMP12]], i32 [[TMP0]], i32 4
	; FORCE_REDUCTION-NEXT: [[TMP14:%.*]] = insertelement <16 x i32> [[TMP13]], i32 [[TMP0]], i32 5
	; FORCE_REDUCTION-NEXT: [[TMP15:%.*]] = insertelement <16 x i32> [[TMP14]], i32 [[TMP0]], i32 6
	; FORCE_REDUCTION-NEXT: [[TMP16:%.*]] = insertelement <16 x i32> [[TMP15]], i32 [[TMP0]], i32 7
	; FORCE_REDUCTION-NEXT: [[TMP17:%.*]] = insertelement <16 x i32> [[TMP16]], i32 [[TMP0]], i32 8
	; FORCE_REDUCTION-NEXT: [[TMP18:%.*]] = insertelement <16 x i32> [[TMP17]], i32 [[TMP0]], i32 9
	; FORCE_REDUCTION-NEXT: [[TMP19:%.*]] = insertelement <16 x i32> [[TMP18]], i32 [[TMP0]], i32 10
	; FORCE_REDUCTION-NEXT: [[TMP20:%.*]] = insertelement <16 x i32> [[TMP19]], i32 [[TMP0]], i32 11
	; FORCE_REDUCTION-NEXT: [[TMP21:%.*]] = insertelement <16 x i32> [[TMP20]], i32 [[TMP0]], i32 12
	; FORCE_REDUCTION-NEXT: [[TMP22:%.*]] = insertelement <16 x i32> [[TMP21]], i32 [[TMP0]], i32 13
	; FORCE_REDUCTION-NEXT: [[TMP23:%.*]] = insertelement <16 x i32> [[TMP22]], i32 [[TMP0]], i32 14
	; FORCE_REDUCTION-NEXT: [[TMP24:%.*]] = insertelement <16 x i32> [[TMP23]], i32 [[TMP0]], i32 15
	; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]			; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]
	; FORCE_REDUCTION: loop:			; FORCE_REDUCTION: loop:
	; FORCE_REDUCTION-NEXT: [[TMP25:%.]] = phi <2 x i32> [ [[TMP32:%.]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; FORCE_REDUCTION-NEXT: [[TMP3:%.]] = phi <2 x i32> [ [[TMP10:%.]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP25]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; FORCE_REDUCTION-NEXT: [[TMP26:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1
	; FORCE_REDUCTION-NEXT: [[TMP27:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>			; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>
	; FORCE_REDUCTION-NEXT: [[TMP28:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[TMP24]])			; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[SHUFFLE6]])
	; FORCE_REDUCTION-NEXT: [[TMP29:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP8]])			; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[SHUFFLE7]])
	; FORCE_REDUCTION-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP28]], [[TMP29]]			; FORCE_REDUCTION-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP6]], [[TMP7]]
	; FORCE_REDUCTION-NEXT: [[TMP30:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP27]])			; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP5]])
	; FORCE_REDUCTION-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP30]]			; FORCE_REDUCTION-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP8]]
	; FORCE_REDUCTION-NEXT: [[OP_RDX2:%.*]] = and i32 [[TMP0]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_RDX2:%.*]] = and i32 [[TMP0]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_RDX3:%.*]] = and i32 [[TMP0]], [[TMP26]]			; FORCE_REDUCTION-NEXT: [[OP_RDX3:%.*]] = and i32 [[TMP0]], [[TMP4]]
	; FORCE_REDUCTION-NEXT: [[OP_RDX4:%.*]] = and i32 [[OP_RDX2]], [[OP_RDX3]]			; FORCE_REDUCTION-NEXT: [[OP_RDX4:%.*]] = and i32 [[OP_RDX2]], [[OP_RDX3]]
	; FORCE_REDUCTION-NEXT: [[OP_RDX5:%.*]] = and i32 [[OP_RDX1]], [[OP_RDX4]]			; FORCE_REDUCTION-NEXT: [[OP_RDX5:%.*]] = and i32 [[OP_RDX1]], [[OP_RDX4]]
	; FORCE_REDUCTION-NEXT: [[VAL_43:%.*]] = add i32 [[TMP26]], 14910			; FORCE_REDUCTION-NEXT: [[VAL_43:%.*]] = add i32 [[TMP4]], 14910
	; FORCE_REDUCTION-NEXT: [[TMP31:%.*]] = insertelement <2 x i32> poison, i32 [[OP_RDX5]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> poison, i32 [[OP_RDX5]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP32]] = insertelement <2 x i32> [[TMP31]], i32 [[VAL_43]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP10]] = insertelement <2 x i32> [[TMP9]], i32 [[VAL_43]], i32 1
	; FORCE_REDUCTION-NEXT: br label [[LOOP]]			; FORCE_REDUCTION-NEXT: br label [[LOOP]]
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%local_4_39.us = phi i32 [ %val_42, %loop ], [ 0, %entry ]			%local_4_39.us = phi i32 [ %val_42, %loop ], [ 0, %entry ]
	%local_8_43.us = phi i32 [ %val_43, %loop ], [ 0, %entry ]			%local_8_43.us = phi i32 [ %val_43, %loop ], [ 0, %entry ]
	▲ Show 20 Lines • Show All 46 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll

Show First 20 Lines • Show All 224 Lines • ▼ Show 20 Lines	;
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_lshr_shl_v8i32(		; SSE-LABEL: @ashr_lshr_shl_v8i32(
; SSE-NEXT: [[TMP1:%.]] = shufflevector <8 x i32> [[A:%.]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[A6:%.]] = extractelement <8 x i32> [[A:%.]], i64 6
; SSE-NEXT: [[TMP2:%.]] = shufflevector <8 x i32> [[B:%.]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i64 7
		; SSE-NEXT: [[B6:%.]] = extractelement <8 x i32> [[B:%.]], i64 6
		; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i64 7
		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[A]], [[B]]
; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
; SSE-NEXT: [[TMP8:%.*]] = shl <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; SSE-NEXT: [[TMP9:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> poison, <2 x i32> <i32 6, i32 7>		; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; SSE-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>		; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
; SSE-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i64 6
; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R52]], <8 x i32> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i64 7
; SSE-NEXT: ret <8 x i32> [[R71]]		; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; SLM-LABEL: @ashr_lshr_shl_v8i32(		; SLM-LABEL: @ashr_lshr_shl_v8i32(
; SLM-NEXT: [[TMP1:%.]] = shufflevector <8 x i32> [[A:%.]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP1:%.]] = shufflevector <8 x i32> [[A:%.]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP2:%.]] = shufflevector <8 x i32> [[B:%.]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP2:%.]] = shufflevector <8 x i32> [[B:%.]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
▲ Show 20 Lines • Show All 276 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll

Show First 20 Lines • Show All 224 Lines • ▼ Show 20 Lines	;
%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5		%r5 = insertelement <8 x i32> %r4, i32 %ab5, i32 5
%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6		%r6 = insertelement <8 x i32> %r5, i32 %ab6, i32 6
%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7		%r7 = insertelement <8 x i32> %r6, i32 %ab7, i32 7
ret <8 x i32> %r7		ret <8 x i32> %r7
}		}

define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @ashr_lshr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @ashr_lshr_shl_v8i32(		; SSE-LABEL: @ashr_lshr_shl_v8i32(
; SSE-NEXT: [[TMP1:%.]] = shufflevector <8 x i32> [[A:%.]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[A6:%.]] = extractelement <8 x i32> [[A:%.]], i64 6
; SSE-NEXT: [[TMP2:%.]] = shufflevector <8 x i32> [[B:%.]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i64 7
		; SSE-NEXT: [[B6:%.]] = extractelement <8 x i32> [[B:%.]], i64 6
		; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i64 7
		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[B]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[TMP6:%.*]] = lshr <8 x i32> [[A]], [[B]]
; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP6]], <8 x i32> poison, <2 x i32> <i32 4, i32 5>
; SSE-NEXT: [[TMP8:%.*]] = shl <8 x i32> [[A]], [[B]]		; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
; SSE-NEXT: [[TMP9:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> poison, <2 x i32> <i32 6, i32 7>		; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
; SSE-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[R52:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>		; SSE-NEXT: [[R51:%.*]] = shufflevector <8 x i32> [[TMP8]], <8 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 undef, i32 undef>
; SSE-NEXT: [[TMP12:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R51]], i32 [[AB6]], i64 6
; SSE-NEXT: [[R71:%.*]] = shufflevector <8 x i32> [[R52]], <8 x i32> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9>		; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i64 7
; SSE-NEXT: ret <8 x i32> [[R71]]		; SSE-NEXT: ret <8 x i32> [[R7]]
;		;
; SLM-LABEL: @ashr_lshr_shl_v8i32(		; SLM-LABEL: @ashr_lshr_shl_v8i32(
; SLM-NEXT: [[TMP1:%.]] = shufflevector <8 x i32> [[A:%.]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP1:%.]] = shufflevector <8 x i32> [[A:%.]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP2:%.]] = shufflevector <8 x i32> [[B:%.]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[TMP2:%.]] = shufflevector <8 x i32> [[B:%.]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = ashr <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>		; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
▲ Show 20 Lines • Show All 276 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll

Show First 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	entry:
%shl10 = shl nuw i32 %conv9, 24		%shl10 = shl nuw i32 %conv9, 24
%or11 = or i32 %or7, %shl10		%or11 = or i32 %or7, %shl10
ret i32 %or11		ret i32 %or11
}		}

define <4 x float> @PR16739_byref(<4 x float>* nocapture readonly dereferenceable(16) %x) {		define <4 x float> @PR16739_byref(<4 x float>* nocapture readonly dereferenceable(16) %x) {
; CHECK-LABEL: @PR16739_byref(		; CHECK-LABEL: @PR16739_byref(
; CHECK-NEXT: [[GEP0:%.]] = getelementptr inbounds <4 x float>, <4 x float> [[X:%.*]], i64 0, i64 0		; CHECK-NEXT: [[GEP0:%.]] = getelementptr inbounds <4 x float>, <4 x float> [[X:%.*]], i64 0, i64 0
; CHECK-NEXT: [[GEP1:%.]] = getelementptr inbounds <4 x float>, <4 x float> [[X]], i64 0, i64 1		; CHECK-NEXT: [[GEP2:%.]] = getelementptr inbounds <4 x float>, <4 x float> [[X]], i64 0, i64 2
; CHECK-NEXT: [[X0:%.]] = load float, float [[GEP0]], align 4		; CHECK-NEXT: [[TMP1:%.]] = bitcast float [[GEP0]] to <2 x float>*
; CHECK-NEXT: [[TMP1:%.]] = bitcast float [[GEP1]] to <2 x float>*
; CHECK-NEXT: [[TMP2:%.]] = load <2 x float>, <2 x float> [[TMP1]], align 4		; CHECK-NEXT: [[TMP2:%.]] = load <2 x float>, <2 x float> [[TMP1]], align 4
; CHECK-NEXT: [[I0:%.*]] = insertelement <4 x float> poison, float [[X0]], i32 0		; CHECK-NEXT: [[X2:%.]] = load float, float [[GEP2]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[I21:%.*]] = shufflevector <4 x float> [[I0]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>		; CHECK-NEXT: [[I2:%.*]] = insertelement <4 x float> [[TMP3]], float [[X2]], i32 2
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[I21]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 1, i32 2, i32 5>		; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x float> [[I2]], float [[X2]], i32 3
; CHECK-NEXT: ret <4 x float> [[TMP4]]		; CHECK-NEXT: ret <4 x float> [[I3]]
;		;
%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0		%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0
%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1		%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1
%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2		%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2
%x0 = load float, float* %gep0		%x0 = load float, float* %gep0
%x1 = load float, float* %gep1		%x1 = load float, float* %gep1
%x2 = load float, float* %gep2		%x2 = load float, float* %gep2
%i0 = insertelement <4 x float> poison, float %x0, i32 0		%i0 = insertelement <4 x float> poison, float %x0, i32 0
▲ Show 20 Lines • Show All 123 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll

Show First 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	entry:
%shl10 = shl nuw i32 %conv9, 24		%shl10 = shl nuw i32 %conv9, 24
%or11 = or i32 %or7, %shl10		%or11 = or i32 %or7, %shl10
ret i32 %or11		ret i32 %or11
}		}

define <4 x float> @PR16739_byref(<4 x float>* nocapture readonly dereferenceable(16) %x) {		define <4 x float> @PR16739_byref(<4 x float>* nocapture readonly dereferenceable(16) %x) {
; CHECK-LABEL: @PR16739_byref(		; CHECK-LABEL: @PR16739_byref(
; CHECK-NEXT: [[GEP0:%.]] = getelementptr inbounds <4 x float>, <4 x float> [[X:%.*]], i64 0, i64 0		; CHECK-NEXT: [[GEP0:%.]] = getelementptr inbounds <4 x float>, <4 x float> [[X:%.*]], i64 0, i64 0
; CHECK-NEXT: [[GEP1:%.]] = getelementptr inbounds <4 x float>, <4 x float> [[X]], i64 0, i64 1		; CHECK-NEXT: [[GEP2:%.]] = getelementptr inbounds <4 x float>, <4 x float> [[X]], i64 0, i64 2
; CHECK-NEXT: [[X0:%.]] = load float, float [[GEP0]], align 4		; CHECK-NEXT: [[TMP1:%.]] = bitcast float [[GEP0]] to <2 x float>*
; CHECK-NEXT: [[TMP1:%.]] = bitcast float [[GEP1]] to <2 x float>*
; CHECK-NEXT: [[TMP2:%.]] = load <2 x float>, <2 x float> [[TMP1]], align 4		; CHECK-NEXT: [[TMP2:%.]] = load <2 x float>, <2 x float> [[TMP1]], align 4
; CHECK-NEXT: [[I0:%.*]] = insertelement <4 x float> undef, float [[X0]], i32 0		; CHECK-NEXT: [[X2:%.]] = load float, float [[GEP2]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[I21:%.*]] = shufflevector <4 x float> [[I0]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>		; CHECK-NEXT: [[I2:%.*]] = insertelement <4 x float> [[TMP3]], float [[X2]], i32 2
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[I21]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 1, i32 2, i32 5>		; CHECK-NEXT: [[I3:%.*]] = insertelement <4 x float> [[I2]], float [[X2]], i32 3
; CHECK-NEXT: ret <4 x float> [[TMP4]]		; CHECK-NEXT: ret <4 x float> [[I3]]
;		;
%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0		%gep0 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 0
%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1		%gep1 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 1
%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2		%gep2 = getelementptr inbounds <4 x float>, <4 x float>* %x, i64 0, i64 2
%x0 = load float, float* %gep0		%x0 = load float, float* %gep0
%x1 = load float, float* %gep1		%x1 = load float, float* %gep1
%x2 = load float, float* %gep2		%x2 = load float, float* %gep2
%i0 = insertelement <4 x float> undef, float %x0, i32 0		%i0 = insertelement <4 x float> undef, float %x0, i32 0
▲ Show 20 Lines • Show All 123 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/malformed_phis.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -slp-vectorizer < %s \| FileCheck %s			; RUN: opt -S -slp-vectorizer < %s \| FileCheck %s
	; RUN: opt -S -passes=slp-vectorizer < %s \| FileCheck %s			; RUN: opt -S -passes=slp-vectorizer < %s \| FileCheck %s

	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"			target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; Make sure we do not generate malformed phis not in the beginning of block.			; Make sure we do not generate malformed phis not in the beginning of block.
	define void @test() #0 {			define void @test() #0 {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[TMP:%.]] = phi i32 [ undef, [[BB1]] ], [ undef, [[BB:%.]] ]			; CHECK-NEXT: [[TMP:%.]] = phi i32 [ undef, [[BB1]] ], [ undef, [[BB:%.]] ]
	; CHECK-NEXT: [[TMP2:%.]] = phi i32 [ [[OP_RDX:%.]], [[BB1]] ], [ undef, [[BB]] ]			; CHECK-NEXT: [[TMP2:%.]] = phi i32 [ [[OP_RDX:%.]], [[BB1]] ], [ undef, [[BB]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[TMP]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[TMP]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> [[TMP0]], i32 [[TMP]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP0]], <16 x i32> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> [[TMP1]], i32 [[TMP]], i32 2			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.vector.reduce.mul.v16i32(<16 x i32> [[SHUFFLE]])
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 [[TMP]], i32 3			; CHECK-NEXT: [[OP_RDX]] = mul i32 [[TMP1]], undef
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 [[TMP]], i32 4
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> [[TMP4]], i32 [[TMP]], i32 5
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x i32> [[TMP5]], i32 [[TMP]], i32 6
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[TMP]], i32 7
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP7]], i32 [[TMP]], i32 8
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> [[TMP8]], i32 [[TMP]], i32 9
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <16 x i32> [[TMP9]], i32 [[TMP]], i32 10
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <16 x i32> [[TMP10]], i32 [[TMP]], i32 11
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <16 x i32> [[TMP11]], i32 [[TMP]], i32 12
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x i32> [[TMP12]], i32 [[TMP]], i32 13
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <16 x i32> [[TMP13]], i32 [[TMP]], i32 14
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <16 x i32> [[TMP14]], i32 [[TMP]], i32 15
	; CHECK-NEXT: [[TMP16:%.*]] = call i32 @llvm.vector.reduce.mul.v16i32(<16 x i32> [[TMP15]])
	; CHECK-NEXT: [[OP_RDX]] = mul i32 [[TMP16]], undef
	; CHECK-NEXT: br label [[BB1]]			; CHECK-NEXT: br label [[BB1]]
	;			;
	bb:			bb:
	br label %bb1			br label %bb1

	bb1: ; preds = %bb1, %bb			bb1: ; preds = %bb1, %bb
	%tmp = phi i32 [ undef, %bb1 ], [ undef, %bb ]			%tmp = phi i32 [ undef, %bb1 ], [ undef, %bb ]
	%tmp2 = phi i32 [ %tmp18, %bb1 ], [ undef, %bb ]			%tmp2 = phi i32 [ %tmp18, %bb1 ], [ undef, %bb ]
	Show All 19 Lines
	define void @test_2(i8 addrspace(1)* %arg, i32 %arg1) #0 {			define void @test_2(i8 addrspace(1)* %arg, i32 %arg1) #0 {
	; CHECK-LABEL: @test_2(			; CHECK-LABEL: @test_2(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP:%.]] = phi i32 [ undef, [[BB:%.]] ], [ undef, [[BB2]] ]			; CHECK-NEXT: [[TMP:%.]] = phi i32 [ undef, [[BB:%.]] ], [ undef, [[BB2]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = phi i32 [ 0, [[BB]] ], [ undef, [[BB2]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi i32 [ 0, [[BB]] ], [ undef, [[BB2]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[TMP]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[TMP]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> [[TMP0]], i32 [[TMP]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[TMP]], i32 2			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> undef)
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[TMP]], i32 3			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[SHUFFLE]])
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP]], i32 4			; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP1]], [[TMP2]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[TMP]], i32 5
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[TMP]], i32 6
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[TMP]], i32 7
	; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> undef)
	; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP7]])
	; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP8]], [[TMP9]]
	; CHECK-NEXT: [[OP_RDX1:%.*]] = add i32 [[OP_RDX]], undef			; CHECK-NEXT: [[OP_RDX1:%.*]] = add i32 [[OP_RDX]], undef
	; CHECK-NEXT: call void @use(i32 [[OP_RDX1]])			; CHECK-NEXT: call void @use(i32 [[OP_RDX1]])
	; CHECK-NEXT: br label [[BB2]]			; CHECK-NEXT: br label [[BB2]]
	;			;
	bb:			bb:
	br label %bb2			br label %bb2

	bb2: ; preds = %bb2, %bb			bb2: ; preds = %bb2, %bb
	▲ Show 20 Lines • Show All 175 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/resched.ll

	Show All 13 Lines
	; CHECK-NEXT: [[CONV31_I:%.*]] = and i32 undef, [[SUB_I]]			; CHECK-NEXT: [[CONV31_I:%.*]] = and i32 undef, [[SUB_I]]
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[CONV31_I]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[CONV31_I]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = lshr <8 x i32> [[SHUFFLE]], <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>			; CHECK-NEXT: [[TMP2:%.*]] = lshr <8 x i32> [[SHUFFLE]], <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[CONV31_I]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[CONV31_I]], i32 0
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[SHUFFLE1]], <i32 9, i32 10, i32 11, i32 12>			; CHECK-NEXT: [[TMP4:%.*]] = lshr <4 x i32> [[SHUFFLE1]], <i32 9, i32 10, i32 11, i32 12>
	; CHECK-NEXT: [[SHR_12_I_I:%.*]] = lshr i32 [[CONV31_I]], 13			; CHECK-NEXT: [[SHR_12_I_I:%.*]] = lshr i32 [[CONV31_I]], 13
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[CONV31_I]], i32 0			; CHECK-NEXT: [[SHR_13_I_I:%.*]] = lshr i32 [[CONV31_I]], 14
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[CONV31_I]], i32 1			; CHECK-NEXT: [[SHR_14_I_I:%.*]] = lshr i32 [[CONV31_I]], 15
	; CHECK-NEXT: [[TMP7:%.*]] = lshr <2 x i32> [[TMP6]], <i32 14, i32 15>			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> poison, i32 [[SUB_I]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> poison, i32 [[SUB_I]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 0, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <16 x i32> [[TMP8]], <16 x i32> [[TMP9]], <16 x i32> <i32 0, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <16 x i32> [[TMP7]], <16 x i32> [[TMP8]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 16, i32 17, i32 18, i32 19, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <16 x i32> [[TMP10]], <16 x i32> [[TMP11]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 16, i32 17, i32 18, i32 19, i32 13, i32 14, i32 15>			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <16 x i32> [[TMP9]], i32 [[SHR_12_I_I]], i32 13
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x i32> [[TMP12]], i32 [[SHR_12_I_I]], i32 13			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <16 x i32> [[TMP10]], i32 [[SHR_13_I_I]], i32 14
	; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <16 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <16 x i32> [[TMP11]], i32 [[SHR_14_I_I]], i32 15
	; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <16 x i32> [[TMP13]], <16 x i32> [[TMP14]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 16, i32 17>			; CHECK-NEXT: [[TMP13:%.*]] = trunc <16 x i32> [[TMP12]] to <16 x i8>
	; CHECK-NEXT: [[TMP16:%.*]] = trunc <16 x i32> [[TMP15]] to <16 x i8>			; CHECK-NEXT: [[TMP14:%.*]] = and <16 x i8> [[TMP13]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	; CHECK-NEXT: [[TMP17:%.*]] = and <16 x i8> [[TMP16]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			; CHECK-NEXT: [[TMP15:%.]] = bitcast i8 [[TMP0]] to <16 x i8>*
	; CHECK-NEXT: [[TMP18:%.]] = bitcast i8 [[TMP0]] to <16 x i8>*			; CHECK-NEXT: store <16 x i8> [[TMP14]], <16 x i8>* [[TMP15]], align 1
	; CHECK-NEXT: store <16 x i8> [[TMP17]], <16 x i8>* [[TMP18]], align 1
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: if.end50.i:			; CHECK: if.end50.i:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br i1 undef, label %if.end50.i, label %if.then22.i			br i1 undef, label %if.end50.i, label %if.then22.i

	if.then22.i: ; preds = %entry			if.then22.i: ; preds = %entry
	▲ Show 20 Lines • Show All 86 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Improve shuffles cost estimation where possible.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 433462

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll

llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-int.ll

llvm/test/Transforms/SLPVectorizer/X86/load-merge-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/load-merge.ll

llvm/test/Transforms/SLPVectorizer/X86/malformed_phis.ll

llvm/test/Transforms/SLPVectorizer/X86/resched.ll

[SLP]Improve shuffles cost estimation where possible.
ClosedPublic