This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
llvm/test/Transforms/SLPVectorizer/AArch64/extractelements-to-shuffle.ll
llvm/test/Transforms/SLPVectorizer/X86/crash_clear_undefs.ll
llvm/test/Transforms/SLPVectorizer/X86/hadd-inseltpoison.ll
llvm/test/Transforms/SLPVectorizer/X86/hadd.ll
llvm/test/Transforms/SLPVectorizer/X86/hsub-inseltpoison.ll
llvm/test/Transforms/SLPVectorizer/X86/hsub.ll
llvm/test/Transforms/SLPVectorizer/X86/reused-extractelements.ll

Differential D148855

[SLP]Improve tryToGatherExtractElements by using per-register analysis.
ClosedPublic

Authored by ABataev on Apr 20 2023, 4:04 PM.


Details

Reviewers

RKSimon
vdmitrie
dmgreen

Commits

rGac254fc05598: [SLP]Improve tryToGatherExtractElements by using per-register analysis.
rG9dfdbd788707: [SLP]Improve tryToGatherExtractElements by using per-register analysis.
rG3e6d7c6d983d: [SLP]Improve tryToGatherExtractElements by using per-register analysis.
rG0a34aaedd8ec: [SLP]Improve tryToGatherExtractElements by using per-register analysis.

Summary

Currently, the tryToGatherExtractElements function analyzes the whole vector,
regardless of the number of actual registers used by this vector. This may
prevent some optimizations, because per-register analysis may simplify the
final code by reusing more already emitted vectors and producing better
shuffles.
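
The difference between whole-vector and per-register analysis can be illustrated with a simplified standalone model. This is only a sketch, not the code from this revision: `Lane`, `Kind`, `classifySlice`, and `classifyPerRegister` are hypothetical names, each gathered scalar is modeled as a (source vector, lane) pair, and a slice is assumed to be representable as a shuffle only if it reads from at most two source vectors. The real analysis in SLPVectorizer.cpp is more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

// Hypothetical model of one gathered scalar: the source vector it is
// extracted from (negative means "not an extractelement") and the lane.
struct Lane {
  int Source;
  int Index;
};

enum class Kind { SingleSrc, TwoSrc };

// Classify one register-sized slice. Returns no value if the slice has no
// extracts or reads from more than two source vectors.
std::optional<Kind> classifySlice(const std::vector<Lane> &VL,
                                  std::size_t Begin, std::size_t Size) {
  std::set<int> Sources;
  for (std::size_t I = Begin; I < Begin + Size; ++I)
    if (VL[I].Source >= 0)
      Sources.insert(VL[I].Source);
  if (Sources.empty() || Sources.size() > 2)
    return std::nullopt;
  return Sources.size() == 1 ? Kind::SingleSrc : Kind::TwoSrc;
}

// Split VL into NumParts equal slices and classify each slice on its own,
// mirroring the SliceSize = VL.size() / NumParts split used in the patch.
std::vector<std::optional<Kind>>
classifyPerRegister(const std::vector<Lane> &VL, unsigned NumParts) {
  assert(NumParts > 0 && "NumParts must be at least 1.");
  std::vector<std::optional<Kind>> Res(NumParts);
  std::size_t SliceSize = VL.size() / NumParts;
  for (unsigned Part = 0; Part < NumParts; ++Part)
    Res[Part] = classifySlice(VL, Part * SliceSize, SliceSize);
  return Res;
}
```

In this model, eight scalars taken pairwise from four different source vectors cannot be described by a single one- or two-source shuffle when analyzed as a whole (NumParts = 1), but each half can when the list is split into two registers (NumParts = 2).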

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. · Apr 20 2023, 4:04 PM

Herald added a project: Restricted Project. · Apr 20 2023, 4:04 PM

Herald added subscribers: vporpo, hiraditya.

ABataev requested review of this revision. · Apr 20 2023, 4:04 PM

Herald added a project: Restricted Project. · Apr 20 2023, 4:04 PM

Herald added a subscriber: pcwang-thead.

Harbormaster completed remote builds in B227009: Diff 515520. · Apr 20 2023, 4:38 PM

Rebase

Harbormaster completed remote builds in B227257: Diff 515835. · Apr 21 2023, 12:26 PM

ABataev updated this revision to Diff 516820. · Apr 25 2023, 8:38 AM

Rebase

Harbormaster completed remote builds in B228043: Diff 516820. · Apr 25 2023, 9:10 AM

Rebase

Harbormaster completed remote builds in B229702: Diff 519092. · May 3 2023, 9:27 AM

Rebase

Harbormaster completed remote builds in B229786: Diff 519213. · May 3 2023, 1:40 PM

Rebase

Harbormaster completed remote builds in B230289: Diff 519923. · May 5 2023, 11:00 AM

Rebase

Harbormaster completed remote builds in B231102: Diff 520999. · May 10 2023, 8:44 AM

Rebase

Harbormaster completed remote builds in B232084: Diff 522298. · May 15 2023, 1:36 PM

Rebase

Harbormaster completed remote builds in B233853: Diff 524684. · May 23 2023, 7:21 AM

Rebase

Harbormaster completed remote builds in B235284: Diff 526604. · May 30 2023, 8:17 AM

Ping!

Rebase

Harbormaster completed remote builds in B242240: Diff 535996. · Jun 29 2023, 5:17 PM

Rebase, ping!

Herald added a subscriber: wangpc. · Jul 7 2023, 1:52 PM

Harbormaster completed remote builds in B243854: Diff 538254. · Jul 7 2023, 4:01 PM

Rebase, ping!

Harbormaster completed remote builds in B244216: Diff 538738. · Jul 10 2023, 12:51 PM

Rebase, ping!

Harbormaster completed remote builds in B245839: Diff 541011. · Jul 17 2023, 11:11 AM

Rebase, ping!

Harbormaster completed remote builds in B248260: Diff 544358. · Jul 26 2023, 11:41 AM

Rebase, ping!!!
Required to unify the cost model + non-power-2.

Harbormaster completed remote builds in B249490: Diff 546057. · Aug 1 2023, 8:47 AM

Rebase, ping!

Harbormaster completed remote builds in B250877: Diff 547898. · Aug 7 2023, 4:38 PM

Rebase, ping!

Harbormaster completed remote builds in B251661: Diff 548999. · Aug 10 2023, 11:14 AM

Rebase, ping!

Harbormaster completed remote builds in B254089: Diff 552349. · Aug 22 2023, 8:27 AM

Ping!

Rebase, ping!

rebase?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
557–558	Pull out NFC-ish refactors like this into pre-commits.

vdmitrie added inline comments. · Oct 18 2023, 2:53 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
10013	Could you extend the description too, please? How are the input values interpreted? What is the output of the routine? Mask seems to serve as both input and output; that needs to be described too.
10042	Please add a comment describing the intent of the code below.
10460	Use ScalarTy?

Rebase, address comments

RKSimon mentioned this in rG585da2651ff5: [SLP][X86] Regenerate hadd/hsub tests with full set of check-prefixes. · Oct 26 2023, 6:40 AM

Rebase

Harbormaster completed remote builds in B257944: Diff 557901. · Oct 26 2023, 11:21 AM

RKSimon added inline comments. · Oct 27 2023, 4:26 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7225	Does enumerate make Idx + I references?

ABataev added inline comments. · Oct 27 2023, 4:43 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7225	Will remove enumerate here, it is not needed. Yes, they both are refs
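
The loop discussed in this inline thread rewrites the mask in place, which is why it matters whether the loop variable is a reference. The following is a standalone sketch of that per-register check, assuming `std::vector` and `std::set` in place of LLVM's `MutableArrayRef` and `DenseSet`, and plain `int` arithmetic throughout. The index formulas are copied from the `CheckPerRegistersShuffle` lambda in this revision; the function name `checkPerRegisterShuffle`, the local `ShuffleKind` enum, and the `PoisonMaskElem` constant are stand-ins defined here so the example compiles on its own.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <vector>

constexpr int PoisonMaskElem = -1; // stand-in for LLVM's constant

enum class ShuffleKind { PermuteSingleSrc, PermuteTwoSrc };

// Maps each mask element to the source register it reads from and rewrites
// the element to a register-local index. Returns no value as soon as a third
// distinct source register is seen; the mask may be partially rewritten then.
std::optional<ShuffleKind> checkPerRegisterShuffle(std::vector<int> &Mask,
                                                   int NumElts, int NumParts,
                                                   int EltsPerVector) {
  std::set<int> RegIndices;
  ShuffleKind Kind = ShuffleKind::PermuteSingleSrc;
  int FirstRegId = -1;
  for (int &I : Mask) { // by reference: elements are updated in place
    if (I == PoisonMaskElem)
      continue;
    int RegId = (I / NumElts) * NumParts + (I % NumElts) / EltsPerVector;
    if (FirstRegId < 0)
      FirstRegId = RegId;
    RegIndices.insert(RegId);
    if (RegIndices.size() > 2)
      return std::nullopt;
    if (RegIndices.size() == 2)
      Kind = ShuffleKind::PermuteTwoSrc;
    // Elements of the first register map to [0, EltsPerVector), elements of
    // the second register to [EltsPerVector, 2 * EltsPerVector).
    I = (I % NumElts) % EltsPerVector +
        (RegId == FirstRegId ? 0 : EltsPerVector);
  }
  return Kind;
}
```

With NumElts = 8, NumParts = 2, and EltsPerVector = 4, the mask {4, 5, 6, 7} reads only from the upper register of the first source and is rewritten to {0, 1, 2, 3}, an identity mask for which the new cost code adds no shuffle cost.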

Rebase, remove enumerate where not required

Harbormaster completed remote builds in B257950: Diff 557911. · Oct 27 2023, 8:32 AM

LGTM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
550	Add back this newline

This revision is now accepted and ready to land. · Oct 30 2023, 9:50 AM

Closed by commit rG0a34aaedd8ec: [SLP]Improve tryToGatherExtractElements by using per-register analysis. (authored by ABataev). · Nov 1 2023, 7:45 AM

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG0a34aaedd8ec: [SLP]Improve tryToGatherExtractElements by using per-register analysis..

ABataev added a reverting change: rG6e8d957a228b: Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis.". · Nov 1 2023, 8:52 AM

ABataev added a commit: rG3e6d7c6d983d: [SLP]Improve tryToGatherExtractElements by using per-register analysis.. · Nov 1 2023, 10:54 AM

This seems to have caused a misoptimization in ffmpeg for aarch64.

To reproduce, you can follow these steps, on aarch64 Linux:

$ git clone https://github.com/ffmpeg/ffmpeg
$ mkdir ffmpeg-build
$ cd ffmpeg-build
$ ../ffmpeg/configure --cc=clang --samples=$(pwd)/../fate-samples
$ make fate-rsync
$ make -j$(nproc) fate-vp9-00-quantizer-18

The misoptimized object file is libavcodec/vp9dsp_8bpp.o.

The standalone preprocessed input for that object file is available at https://martin.st/temp/vp9dsp_8bpp-preproc.c. You can reproduce the misoptimization with:

$ clang -target aarch64-linux-gnu -c -O3 vp9dsp_8bpp-preproc.c -o vp9dsp_8bpp.o

Can you look into this, and possibly revert if fixing takes some time?

In D148855#4655968, @mstorsjo wrote:
This seems to have caused a misoptimization in ffmpeg for aarch64.

Hi, thanks for the report. Generally speaking, this change does the same thing that InstCombiner does with extractelement/insertelement sequences. I'll check what the cause is. I do not know the reason yet, but most probably it is either a TTI cost model problem or a codegen (lowering) problem.

In D148855#4655968, @mstorsjo wrote:
This seems to have caused a misoptimization in ffmpeg for aarch64.

I compared the output; the LLVM IR actually becomes smaller. I cannot do a perf run. It looks like the lowering does not do a good job, so we need to create a ticket against the AArch64 codegen to improve it.

In D148855#4656008, @ABataev wrote:
I compared the output; the LLVM IR actually becomes smaller. I cannot do a perf run. It looks like the lowering does not do a good job, so we need to create a ticket against the AArch64 codegen to improve it.

I'm not saying the code became slower - I'm saying the code no longer produces the right result.

I'll push a revert for now, to unbreak things.

In D148855#4656048, @mstorsjo wrote:
I'm not saying the code became slower - I'm saying the code no longer produces the right result.

I'll push a revert for now, to unbreak things.

Ah, ok, now I see. Ok, go ahead and revert it, I'll investigate it tomorrow.

mstorsjo added a reverting change: rG66152f4eed4d: Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis.". · Nov 2 2023, 3:08 PM

ABataev added a commit: rG9dfdbd788707: [SLP]Improve tryToGatherExtractElements by using per-register analysis.. · Nov 3 2023, 10:44 AM

This causes asserts:

clang: /work/llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10082: Value *llvm::slpvectorizer::BoUpSLP::ShuffleInstructionBuilder::adjustExtracts(const TreeEntry *, MutableArrayRef<int>, unsigned int, bool &): Assertion `Part == 0 && "Expected firs part."' failed.

See https://bugs.chromium.org/p/chromium/issues/detail?id=1499846#c1 for a stand-alone reproducer.

I'll revert it for now.

hans added a reverting change: rG046c57e705e0: Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis.". · Nov 6 2023, 4:57 AM

ABataev added a commit: rGac254fc05598: [SLP]Improve tryToGatherExtractElements by using per-register analysis.. · Nov 6 2023, 7:32 AM

we've bisected multiple test failures to this commit, trying to come up with a reduced repro

In D148855#4656934, @aeubanks wrote:

we've bisected multiple test failures to this commit, trying to come up with a reduced repro

Thanks for the report. Please try updating the compiler; I have already fixed one bug in this patch. If it still fails, please send the reproducer and I will fix it ASAP.

yeah, it repros with ToT after the reland

In D148855#4656940, @aeubanks wrote:

yeah, it repros with ToT after the reland

I mean, earlier today I landed another fix; try updating the compiler and check whether it still fails.

In D148855#4656941, @ABataev wrote:

In D148855#4656940, @aeubanks wrote:

yeah, it repros with ToT after the reland

I mean, earlier today I landed another fix; try updating the compiler and check whether it still fails.

ah ok, 95703642e3ef617275fd80b5316b05c5a09c6219 does fix one of the tests I was looking at, thanks

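Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	535 lines
llvm/test/Transforms/SLPVectorizer/AArch64/extractelements-to-shuffle.ll	135 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_clear_undefs.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/hadd-inseltpoison.ll	152 lines
llvm/test/Transforms/SLPVectorizer/X86/hadd.ll	152 lines
llvm/test/Transforms/SLPVectorizer/X86/hsub-inseltpoison.ll	153 lines
llvm/test/Transforms/SLPVectorizer/X86/hsub.ll	153 lines
llvm/test/Transforms/SLPVectorizer/X86/reused-extractelements.ll	23 lines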

Diff 557957

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

@@ 541 lines not shown @@ if (!CI)
       return std::nullopt;
     return CI->getZExtValue();
   }
   auto *EI = cast<ExtractValueInst>(E);
   if (EI->getNumIndices() != 1)
     return std::nullopt;
   return *EI->idx_begin();
 }
-
[Inline comment] RKSimon: Add back this newline
 /// Tries to find extractelement instructions with constant indices from fixed
 /// vector type and gather such instructions into a bunch, which highly likely
 /// might be detected as a shuffle of 1 or 2 input vectors. If this attempt was
 /// successful, the matched scalars are replaced by poison values in \p VL for
 /// future analysis.
 static std::optional<TTI::ShuffleKind>
 tryToGatherSingleRegisterExtractElements(MutableArrayRef<Value *> VL,
                                          SmallVectorImpl<int> &Mask) {
[Inline comment] RKSimon: Pull out NFC-ish refactors like this into pre-commits.
   // Scan list of gathered scalars for extractelements that can be represented
   // as shuffles.
   MapVector<Value *, SmallVector<int>> VectorOpToIdx;
   SmallVector<int> UndefVectorExtracts;
   for (int I = 0, E = VL.size(); I < E; ++I) {
     auto *EI = dyn_cast<ExtractElementInst>(VL[I]);
     if (!EI) {
       if (isa<UndefValue>(VL[I]))
@@ 90 lines not shown @@ for (int I = 0, E = GatheredExtracts.size(); I < E; ++I) {
     if (!EI || !isa<FixedVectorType>(EI->getVectorOperandType()) ||
         !isa<ConstantInt, UndefValue>(EI->getIndexOperand()) ||
         is_contained(UndefVectorExtracts, I))
       continue;
   }
   return Res;
 }

+/// Tries to find extractelement instructions with constant indices from fixed
+/// vector type and gather such instructions into a bunch, which highly likely
+/// might be detected as a shuffle of 1 or 2 input vectors. If this attempt was
+/// successful, the matched scalars are replaced by poison values in \p VL for
+/// future analysis.
+static SmallVector<std::optional<TTI::ShuffleKind>>
+tryToGatherExtractElements(SmallVectorImpl<Value *> &VL,
+                           SmallVectorImpl<int> &Mask, unsigned NumParts) {
+  assert(NumParts > 0 && "NumParts expected be greater than or equal to 1.");
+  SmallVector<std::optional<TTI::ShuffleKind>> ShufflesRes(NumParts);
+  Mask.assign(VL.size(), PoisonMaskElem);
+  unsigned SliceSize = VL.size() / NumParts;
+  for (unsigned Part = 0; Part < NumParts; ++Part) {
+    // Scan list of gathered scalars for extractelements that can be represented
+    // as shuffles.
+    MutableArrayRef<Value *> SubVL =
+        MutableArrayRef(VL).slice(Part * SliceSize, SliceSize);
+    SmallVector<int> SubMask;
+    std::optional<TTI::ShuffleKind> Res =
+        tryToGatherSingleRegisterExtractElements(SubVL, SubMask);
+    ShufflesRes[Part] = Res;
+    copy(SubMask, std::next(Mask.begin(), Part * SliceSize));
+  }
+  if (none_of(ShufflesRes, [](const std::optional<TTI::ShuffleKind> &Res) {
+        return Res.has_value();
+      }))
+    ShufflesRes.clear();
+  return ShufflesRes;
+}

 namespace {

 /// Main data required for vectorization of instructions.
 struct InstructionsState {
   /// The very first instruction in the list with the main opcode.
   Value *OpValue = nullptr;

   /// The main/alternate instruction.
@@ 6,474 lines not shown @@ if (VL.size() > 2 && S.getOpcode() == Instruction::Load &&
                 : TTI::TCC_Free);
     }
     return GatherCost +
            (all_of(Gathers, UndefValue::classof)
                 ? TTI::TCC_Free
                 : R.getGatherCost(Gathers, !Root && VL.equals(Gathers)));
   };

-  /// Compute the cost of creating a vector of type \p VecTy containing the
-  /// extracted values from \p VL.
-  InstructionCost computeExtractCost(ArrayRef<Value *> VL, ArrayRef<int> Mask,
-                                     TTI::ShuffleKind ShuffleKind) {
-    unsigned NumElts = 0;
-    for (Value *V : VL) {
-      auto *EE = dyn_cast<ExtractElementInst>(V);
-      if (!EE)
-        continue;
-      auto *VecTy = cast<FixedVectorType>(EE->getVectorOperandType());
-      NumElts = std::max(NumElts, VecTy->getNumElements());
-    }
-    assert(NumElts > 0 &&
-           "Expected at least 1-element fixed length vector(s).");
-    auto *VecTy = FixedVectorType::get(VL.front()->getType(), NumElts);
-    unsigned NumOfParts = TTI.getNumberOfParts(VecTy);
-    if (!NumOfParts || NumElts < NumOfParts)
-      return TTI.getShuffleCost(ShuffleKind, VecTy, Mask);
-    unsigned EltsPerVector = PowerOf2Ceil(divideCeil(NumElts, NumOfParts));
-    int ValNum = -1;
-    int ValIdx = -1;
-    // Check that if trying to permute 2 input vectors (which may result in
-    // several vector registers), each per-register subvector is the result of
-    // the permutation of 2 single registers.
-    if (ShuffleKind != TargetTransformInfo::SK_PermuteSingleSrc &&
-        !all_of(enumerate(Mask), [&](auto &&Arg) {
-          if (Arg.value() == PoisonMaskElem)
-            return true;
-          int CurValNum = (Arg.value() % NumElts) / EltsPerVector;
-          int CurValIdx = Arg.index() / EltsPerVector;
-          if (ValIdx != CurValIdx) {
-            ValIdx = CurValIdx;
-            ValNum = CurValNum;
-            return true;
-          }
-          return CurValNum == ValNum;
-        }))
-      return TTI.getShuffleCost(ShuffleKind, VecTy, Mask);
+  /// Compute the cost of creating a vector containing the extracted values from
+  /// \p VL.
+  InstructionCost
+  computeExtractCost(ArrayRef<Value *> VL, ArrayRef<int> Mask,
+                     ArrayRef<std::optional<TTI::ShuffleKind>> ShuffleKinds,
+                     unsigned NumParts) {
+    assert(VL.size() > NumParts && "Unexpected scalarized shuffle.");
+    unsigned NumElts =
+        std::accumulate(VL.begin(), VL.end(), 0, [](unsigned Sz, Value *V) {
+          auto *EE = dyn_cast<ExtractElementInst>(V);
+          if (!EE)
+            return Sz;
+          auto *VecTy = cast<FixedVectorType>(EE->getVectorOperandType());
+          return std::max(Sz, VecTy->getNumElements());
+        });
+    unsigned NumSrcRegs = TTI.getNumberOfParts(
+        FixedVectorType::get(VL.front()->getType(), NumElts));
+    if (NumSrcRegs == 0)
+      NumSrcRegs = 1;
+    // FIXME: this must be moved to TTI for better estimation.
+    unsigned EltsPerVector = PowerOf2Ceil(std::max(
+        divideCeil(VL.size(), NumParts), divideCeil(NumElts, NumSrcRegs)));
+    auto CheckPerRegistersShuffle =
+        [&](MutableArrayRef<int> Mask) -> std::optional<TTI::ShuffleKind> {
+      DenseSet<int> RegIndices;
+      // Check that if trying to permute same single/2 input vectors.
+      TTI::ShuffleKind ShuffleKind = TTI::SK_PermuteSingleSrc;
+      int FirstRegId = -1;
+      for (int &I : Mask) {
+        if (I == PoisonMaskElem)
+          continue;
+        int RegId = (I / NumElts) * NumParts + (I % NumElts) / EltsPerVector;
+        if (FirstRegId < 0)
+          FirstRegId = RegId;
+        RegIndices.insert(RegId);
+        if (RegIndices.size() > 2)
+          return std::nullopt;
+        if (RegIndices.size() == 2)
+          ShuffleKind = TTI::SK_PermuteTwoSrc;
+        I = (I % NumElts) % EltsPerVector +
+            (RegId == FirstRegId ? 0 : EltsPerVector);
[Inline comment] RKSimon: Does enumerate make Idx + I references?
[Inline comment] ABataev: Will remove enumerate here, it is not needed. Yes, they both are refs
+      }
+      return ShuffleKind;
+    };

     InstructionCost Cost = 0;

     // Process extracts in blocks of EltsPerVector to check if the source vector
     // operand can be re-used directly. If not, add the cost of creating a
     // shuffle to extract the values into a vector register.
-    auto *RegisterVecTy =
-        FixedVectorType::get(VL.front()->getType(), EltsPerVector);
-    SmallVector<int> RegMask(EltsPerVector, PoisonMaskElem);
-    TTI::ShuffleKind RegisterSK = TargetTransformInfo::SK_PermuteSingleSrc;
-    Value *VecBase = nullptr;
-    bool IsIdentity = true;
-    for (auto [Idx, V] : enumerate(VL)) {
-      // Reached the start of a new vector registers.
-      if (Idx % EltsPerVector == 0) {
-        RegMask.assign(EltsPerVector, PoisonMaskElem);
-        RegisterSK = TargetTransformInfo::SK_PermuteSingleSrc;
-        VecBase = nullptr;
-      }
-
-      // Need to exclude undefs from analysis.
-      if (isa<UndefValue>(V) || Mask[Idx] == PoisonMaskElem)
-        continue;
-
-      // Check all extracts for a vector register on the target directly
-      // extract values in order.
-      unsigned CurrentIdx = *getExtractIndex(cast<Instruction>(V));
-      unsigned PrevIdx = CurrentIdx;
-      if (Idx % EltsPerVector != 0 && !isa<UndefValue>(VL[Idx - 1]) &&
-          Mask[Idx - 1] != PoisonMaskElem)
-        PrevIdx = *getExtractIndex(cast<Instruction>(VL[Idx - 1])) + 1;
-      if (!VecBase) {
-        VecBase = cast<ExtractElementInst>(V)->getVectorOperand();
-        RegMask[Idx % EltsPerVector] = CurrentIdx % EltsPerVector;
-        IsIdentity = CurrentIdx % EltsPerVector == Idx % EltsPerVector;
-      } else if (VecBase != cast<ExtractElementInst>(V)->getVectorOperand()) {
-        IsIdentity = false;
-        RegisterSK = TargetTransformInfo::SK_PermuteTwoSrc;
-        RegMask[Idx % EltsPerVector] =
-            CurrentIdx % EltsPerVector + EltsPerVector;
-      } else {
-        IsIdentity &= PrevIdx == CurrentIdx &&
-                      CurrentIdx % EltsPerVector == Idx % EltsPerVector;
-        RegMask[Idx % EltsPerVector] = CurrentIdx % EltsPerVector;
-      }
-
-      if (IsIdentity)
-        continue;
-
-      // Skip all indices, except for the last index per vector block.
-      if ((Idx + 1) % EltsPerVector != 0 && Idx + 1 != VL.size())
-        continue;
-
-      // If we have a series of extracts which are not consecutive and hence
-      // cannot re-use the source vector register directly, compute the shuffle
-      // cost to extract the vector with EltsPerVector elements.
-      Cost += TTI.getShuffleCost(RegisterSK, RegisterVecTy, RegMask);
+    for (unsigned Part = 0; Part < NumParts; ++Part) {
+      if (!ShuffleKinds[Part])
+        continue;
+      ArrayRef<int> MaskSlice =
+          Mask.slice(Part * EltsPerVector,
+                     (Part == NumParts - 1 && Mask.size() % EltsPerVector != 0)
+                         ? Mask.size() % EltsPerVector
+                         : EltsPerVector);
+      SmallVector<int> SubMask(EltsPerVector, PoisonMaskElem);
+      copy(MaskSlice, SubMask.begin());
+      std::optional<TTI::ShuffleKind> RegShuffleKind =
+          CheckPerRegistersShuffle(SubMask);
+      if (!RegShuffleKind) {
+        Cost += TTI.getShuffleCost(
+            *ShuffleKinds[Part],
+            FixedVectorType::get(VL.front()->getType(), NumElts), MaskSlice);
+        continue;
+      }
+      if (*RegShuffleKind != TTI::SK_PermuteSingleSrc ||
+          !ShuffleVectorInst::isIdentityMask(SubMask, EltsPerVector)) {
+        Cost += TTI.getShuffleCost(
+            *RegShuffleKind,
+            FixedVectorType::get(VL.front()->getType(), EltsPerVector),
+            SubMask);
+      }
     }
     return Cost;
   }
   /// Transforms mask \p CommonMask per given \p Mask to make proper set after
   /// shuffle emission.
   static void transformMaskAfterShuffle(MutableArrayRef<int> CommonMask,
                                         ArrayRef<int> Mask) {
     for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
@@ 201 lines not shown @@ class BoUpSLP::ShuffleCostEstimator : public BaseShuffleAnalysis {
   }

 public:
   ShuffleCostEstimator(TargetTransformInfo &TTI,
                        ArrayRef<Value *> VectorizedVals, BoUpSLP &R,
                        SmallPtrSetImpl<Value *> &CheckedExtracts)
       : TTI(TTI), VectorizedVals(VectorizedVals.begin(), VectorizedVals.end()),
         R(R), CheckedExtracts(CheckedExtracts) {}
-  Value *adjustExtracts(const TreeEntry *E, ArrayRef<int> Mask,
-                        TTI::ShuffleKind ShuffleKind) {
+  Value *adjustExtracts(const TreeEntry *E, MutableArrayRef<int> Mask,
+                        ArrayRef<std::optional<TTI::ShuffleKind>> ShuffleKinds,
+                        unsigned NumParts) {
     if (Mask.empty())
       return nullptr;
     Value *VecBase = nullptr;
     ArrayRef<Value *> VL = E->Scalars;
-    auto *VecTy = FixedVectorType::get(VL.front()->getType(), VL.size());
     // If the resulting type is scalarized, do not adjust the cost.
-    unsigned VecNumParts = TTI.getNumberOfParts(VecTy);
-    if (VecNumParts == VecTy->getNumElements())
+    if (NumParts == VL.size())
       return nullptr;
-    DenseMap<Value *, int> ExtractVectorsTys;
-    for (auto [I, V] : enumerate(VL)) {
-      // Ignore non-extractelement scalars.
-      if (isa<UndefValue>(V) || (!Mask.empty() && Mask[I] == PoisonMaskElem))
-        continue;
-      // If all users of instruction are going to be vectorized and this
-      // instruction itself is not going to be vectorized, consider this
-      // instruction as dead and remove its cost from the final cost of the
-      // vectorized tree.
-      // Also, avoid adjusting the cost for extractelements with multiple uses
-      // in different graph entries.
-      const TreeEntry *VE = R.getTreeEntry(V);
-      if (!CheckedExtracts.insert(V).second ||
-          !R.areAllUsersVectorized(cast<Instruction>(V), &VectorizedVals) ||
-          (VE && VE != E))
-        continue;
-      auto *EE = cast<ExtractElementInst>(V);
-      VecBase = EE->getVectorOperand();
-      std::optional<unsigned> EEIdx = getExtractIndex(EE);
-      if (!EEIdx)
-        continue;
-      unsigned Idx = *EEIdx;
-      if (VecNumParts != TTI.getNumberOfParts(EE->getVectorOperandType())) {
-        auto It =
-            ExtractVectorsTys.try_emplace(EE->getVectorOperand(), Idx).first;
-        It->getSecond() = std::min<int>(It->second, Idx);
-      }
-      // Take credit for instruction that will become dead.
-      if (EE->hasOneUse()) {
-        Instruction *Ext = EE->user_back();
-        if (isa<SExtInst, ZExtInst>(Ext) && all_of(Ext->users(), [](User *U) {
-              return isa<GetElementPtrInst>(U);
-            })) {
-          // Use getExtractWithExtendCost() to calculate the cost of
-          // extractelement/ext pair.
-          Cost -= TTI.getExtractWithExtendCost(Ext->getOpcode(), Ext->getType(),
-                                               EE->getVectorOperandType(), Idx);
-          // Add back the cost of s|zext which is subtracted separately.
-          Cost += TTI.getCastInstrCost(
-              Ext->getOpcode(), Ext->getType(), EE->getType(),
-              TTI::getCastContextHint(Ext), CostKind, Ext);
-          continue;
-        }
-      }
-      Cost -= TTI.getVectorInstrCost(*EE, EE->getVectorOperandType(), CostKind,
-                                     Idx);
-    }
-    // Add a cost for subvector extracts/inserts if required.
-    for (const auto &Data : ExtractVectorsTys) {
-      auto *EEVTy = cast<FixedVectorType>(Data.first->getType());
-      unsigned NumElts = VecTy->getNumElements();
-      if (Data.second % NumElts == 0)
-        continue;
-      if (TTI.getNumberOfParts(EEVTy) > VecNumParts) {
-        unsigned Idx = (Data.second / NumElts) * NumElts;
-        unsigned EENumElts = EEVTy->getNumElements();
-        if (Idx % NumElts == 0)
-          continue;
-        if (Idx + NumElts <= EENumElts) {
-          Cost += TTI.getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,
-                                     EEVTy, std::nullopt, CostKind, Idx, VecTy);
-        } else {
-          // Need to round up the subvector type vectorization factor to avoid a
-          // crash in cost model functions. Make SubVT so that Idx + VF of SubVT
-          // <= EENumElts.
-          auto *SubVT =
-              FixedVectorType::get(VecTy->getElementType(), EENumElts - Idx);
-          Cost += TTI.getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,
-                                     EEVTy, std::nullopt, CostKind, Idx, SubVT);
-        }
-      } else {
-        Cost += TTI.getShuffleCost(TargetTransformInfo::SK_InsertSubvector,
-                                   VecTy, std::nullopt, CostKind, 0, EEVTy);
-      }
-    }
+    // Check if it can be considered reused if same extractelements were
+    // vectorized already.
+    bool PrevNodeFound = any_of(
+        ArrayRef(R.VectorizableTree).take_front(E->Idx),
+        [&](const std::unique_ptr<TreeEntry> &TE) {
+          return ((!TE->isAltShuffle() &&
+                   TE->getOpcode() == Instruction::ExtractElement) ||
+                  TE->State == TreeEntry::NeedToGather) &&
+                 all_of(enumerate(TE->Scalars), [&](auto &&Data) {
+                   return VL.size() > Data.index() &&
+                          (Mask[Data.index()] == PoisonMaskElem ||
+                           isa<UndefValue>(VL[Data.index()]) ||
+                           Data.value() == VL[Data.index()]);
+                 });
+        });
+    unsigned SliceSize = VL.size() / NumParts;
+    for (unsigned Part = 0; Part < NumParts; ++Part) {
+      ArrayRef<int> SubMask = Mask.slice(Part * SliceSize, SliceSize);
+      for (auto [I, V] : enumerate(VL.slice(Part * SliceSize, SliceSize))) {
+        // Ignore non-extractelement scalars.
+        if (isa<UndefValue>(V) ||
+            (!SubMask.empty() && SubMask[I] == PoisonMaskElem))
+          continue;
+        // If all users of instruction are going to be vectorized and this
+        // instruction itself is not going to be vectorized, consider this
+        // instruction as dead and remove its cost from the final cost of the
+        // vectorized tree.
+        // Also, avoid adjusting the cost for extractelements with multiple uses
+        // in different graph entries.
+        const TreeEntry *VE = R.getTreeEntry(V);
+        if (!CheckedExtracts.insert(V).second ||
+            !R.areAllUsersVectorized(cast<Instruction>(V), &VectorizedVals) ||
+            (VE && VE != E))
+          continue;
+        auto *EE = cast<ExtractElementInst>(V);
+        VecBase = EE->getVectorOperand();
+        std::optional<unsigned> EEIdx = getExtractIndex(EE);
+        if (!EEIdx)
+          continue;
+        unsigned Idx = *EEIdx;
+        // Take credit for instruction that will become dead.
+        if (EE->hasOneUse() || !PrevNodeFound) {
+          Instruction *Ext = EE->user_back();
+          if (isa<SExtInst, ZExtInst>(Ext) && all_of(Ext->users(), [](User *U) {
+                return isa<GetElementPtrInst>(U);
+              })) {
+            // Use getExtractWithExtendCost() to calculate the cost of
+            // extractelement/ext pair.
+            Cost -=
+                TTI.getExtractWithExtendCost(Ext->getOpcode(), Ext->getType(),
+                                             EE->getVectorOperandType(), Idx);
+            // Add back the cost of s|zext which is subtracted separately.
+            Cost += TTI.getCastInstrCost(
+                Ext->getOpcode(), Ext->getType(), EE->getType(),
+                TTI::getCastContextHint(Ext), CostKind, Ext);
+            continue;
+          }
+        }
+        Cost -= TTI.getVectorInstrCost(*EE, EE->getVectorOperandType(),
+                                       CostKind, Idx);
+      }
+    }
     // Check that gather of extractelements can be represented as just a
     // shuffle of a single/two vectors the scalars are extracted from.
     // Found the bunch of extractelement instructions that must be gathered
     // into a vector and can be represented as a permutation elements in a
     // single input vector or of 2 input vectors.
-    Cost += computeExtractCost(VL, Mask, ShuffleKind);
+    // Done for reused if same extractelements were vectorized already.
+    if (!PrevNodeFound)
+      Cost += computeExtractCost(VL, Mask, ShuffleKinds, NumParts);
     InVectors.assign(1, E);
     CommonMask.assign(Mask.begin(), Mask.end());
     transformMaskAfterShuffle(CommonMask, CommonMask);
     SameNodesEstimated = false;
     return VecBase;
   }
void add(const TreeEntry &E1, const TreeEntry &E2, ArrayRef<int> Mask) {		void add(const TreeEntry &E1, const TreeEntry &E2, ArrayRef<int> Mask) {
if (&E1 == &E2) {		if (&E1 == &E2) {
[105 lines not shown]
		return Cost +
InVectors.size() == 2 ? InVectors.back() : nullptr,		InVectors.size() == 2 ? InVectors.back() : nullptr,
CommonMask);		CommonMask);
}		}

~ShuffleCostEstimator() {		~ShuffleCostEstimator() {
assert((IsFinalized || CommonMask.empty()) &&		assert((IsFinalized || CommonMask.empty()) &&
"Shuffle construction must be finalized.");		"Shuffle construction must be finalized.");
}		}
};		};

InstructionCost		InstructionCost
BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,		BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
SmallPtrSetImpl<Value *> &CheckedExtracts) {		SmallPtrSetImpl<Value *> &CheckedExtracts) {
ArrayRef<Value *> VL = E->Scalars;		ArrayRef<Value *> VL = E->Scalars;

Type *ScalarTy = VL[0]->getType();		Type *ScalarTy = VL[0]->getType();
if (E->State != TreeEntry::NeedToGather) {		if (E->State != TreeEntry::NeedToGather) {
[44 lines not shown]
		if (E->State == TreeEntry::NeedToGather) {
// Build a mask out of the reorder indices and reorder scalars per this		// Build a mask out of the reorder indices and reorder scalars per this
// mask.		// mask.
SmallVector<int> ReorderMask;		SmallVector<int> ReorderMask;
inversePermutation(E->ReorderIndices, ReorderMask);		inversePermutation(E->ReorderIndices, ReorderMask);
if (!ReorderMask.empty())		if (!ReorderMask.empty())
reorderScalars(GatheredScalars, ReorderMask);		reorderScalars(GatheredScalars, ReorderMask);
SmallVector<int> Mask;		SmallVector<int> Mask;
SmallVector<int> ExtractMask;		SmallVector<int> ExtractMask;
std::optional<TargetTransformInfo::ShuffleKind> ExtractShuffle;
SmallVector<std::optional<TargetTransformInfo::ShuffleKind>> GatherShuffles;		SmallVector<std::optional<TargetTransformInfo::ShuffleKind>> GatherShuffles;
SmallVector<SmallVector<const TreeEntry *>> Entries;		SmallVector<SmallVector<const TreeEntry *>> Entries;
		SmallVector<std::optional<TTI::ShuffleKind>> ExtractShuffles;
// Check for gathered extracts.		// Check for gathered extracts.
ExtractShuffle =
tryToGatherSingleRegisterExtractElements(GatheredScalars, ExtractMask);

bool Resized = false;		bool Resized = false;
unsigned NumParts = TTI->getNumberOfParts(VecTy);		unsigned NumParts = TTI->getNumberOfParts(VecTy);
if (NumParts == 0 || NumParts >= GatheredScalars.size())		if (NumParts == 0 || NumParts >= GatheredScalars.size())
NumParts = 1;		NumParts = 1;
		if (!all_of(GatheredScalars, UndefValue::classof)) {
		ExtractShuffles =
		tryToGatherExtractElements(GatheredScalars, ExtractMask, NumParts);
		if (!ExtractShuffles.empty()) {
if (Value *VecBase = Estimator.adjustExtracts(		if (Value *VecBase = Estimator.adjustExtracts(
E, ExtractMask, ExtractShuffle.value_or(TTI::SK_PermuteTwoSrc))) {		E, ExtractMask, ExtractShuffles, NumParts)) {
if (auto *VecBaseTy = dyn_cast<FixedVectorType>(VecBase->getType()))		if (auto *VecBaseTy = dyn_cast<FixedVectorType>(VecBase->getType()))
if (VF == VecBaseTy->getNumElements() && GatheredScalars.size() != VF) {		if (VF == VecBaseTy->getNumElements() &&
		GatheredScalars.size() != VF) {
Resized = true;		Resized = true;
GatheredScalars.append(VF - GatheredScalars.size(),		GatheredScalars.append(VF - GatheredScalars.size(),
PoisonValue::get(ScalarTy));		PoisonValue::get(ScalarTy));
}		}
} else if (ExtractShuffle &&		}
TTI->getNumberOfParts(VecTy) == VecTy->getNumElements()) {
copy(VL, GatheredScalars.begin());
}		}

// Do not try to look for reshuffled loads for gathered loads (they will be		// Do not try to look for reshuffled loads for gathered loads (they will
// handled later), for vectorized scalars, and cases, which are definitely		// be handled later), for vectorized scalars, and cases, which are
// not profitable (splats and small gather nodes.)		// definitely not profitable (splats and small gather nodes.)
if (ExtractShuffle || E->getOpcode() != Instruction::Load ||		if (!ExtractShuffles.empty() || E->getOpcode() != Instruction::Load ||
E->isAltShuffle() ||		E->isAltShuffle() ||
all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) ||		all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) ||
isSplat(E->Scalars) ||		isSplat(E->Scalars) ||
(E->Scalars != GatheredScalars && GatheredScalars.size() <= 2))		(E->Scalars != GatheredScalars && GatheredScalars.size() <= 2))
GatherShuffles =		GatherShuffles =
isGatherShuffledEntry(E, GatheredScalars, Mask, Entries, NumParts);		isGatherShuffledEntry(E, GatheredScalars, Mask, Entries, NumParts);
		}
if (!GatherShuffles.empty()) {		if (!GatherShuffles.empty()) {
if (GatherShuffles.size() == 1 &&		if (GatherShuffles.size() == 1 &&
*GatherShuffles.front() == TTI::SK_PermuteSingleSrc &&		*GatherShuffles.front() == TTI::SK_PermuteSingleSrc &&
Entries.front().front()->isSame(E->Scalars)) {		Entries.front().front()->isSame(E->Scalars)) {
// Perfect match in the graph, will reuse the previously vectorized		// Perfect match in the graph, will reuse the previously vectorized
// node. Cost is 0.		// node. Cost is 0.
LLVM_DEBUG(		LLVM_DEBUG(
dbgs()		dbgs()
[2,224 lines not shown]
		for (unsigned Idx = 0, Sz = CommonMask.size(); Idx < Sz; ++Idx)
if (Mask[Idx] != PoisonMaskElem)		if (Mask[Idx] != PoisonMaskElem)
CommonMask[Idx] = Idx;		CommonMask[Idx] = Idx;
}		}

public:		public:
ShuffleInstructionBuilder(IRBuilderBase &Builder, BoUpSLP &R)		ShuffleInstructionBuilder(IRBuilderBase &Builder, BoUpSLP &R)
: Builder(Builder), R(R) {}		: Builder(Builder), R(R) {}

/// Adjusts extractelements after reusing them.		/// Adjusts extractelements after reusing them.
		vdmitrie: Could you extend the description too, please? How are the input values interpreted? What is the output of the routine? Mask seems to serve as both input and output; that needs to be described too.
Value *adjustExtracts(const TreeEntry *E, ArrayRef<int> Mask) {		Value *adjustExtracts(const TreeEntry *E, MutableArrayRef<int> Mask,
		unsigned NumParts, bool &UseVecBaseAsInput) {
		UseVecBaseAsInput = false;
		SmallPtrSet<Value *, 4> UniqueBases;
Value *VecBase = nullptr;		Value *VecBase = nullptr;
for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {		for (int I = 0, Sz = Mask.size(); I < Sz; ++I) {
int Idx = Mask[I];		int Idx = Mask[I];
if (Idx == PoisonMaskElem)		if (Idx == PoisonMaskElem)
continue;		continue;
auto *EI = cast<ExtractElementInst>(E->Scalars[I]);		auto *EI = cast<ExtractElementInst>(E->Scalars[I]);
VecBase = EI->getVectorOperand();		VecBase = EI->getVectorOperand();
		UniqueBases.insert(VecBase);
// If the only one use is vectorized - can delete the extractelement		// If the only one use is vectorized - can delete the extractelement
// itself.		// itself.
if (!EI->hasOneUse() || any_of(EI->users(), [&](User *U) {		if (!EI->hasOneUse() || any_of(EI->users(), [&](User *U) {
return !R.ScalarToTreeEntry.count(U);		return !R.ScalarToTreeEntry.count(U);
}))		}))
continue;		continue;
R.eraseInstruction(EI);		R.eraseInstruction(EI);
}		}
		if (NumParts == 1 || UniqueBases.size() == 1)
return VecBase;		return VecBase;
		UseVecBaseAsInput = true;
		auto TransformToIdentity = [](MutableArrayRef<int> Mask) {
		for (auto [I, Idx] : enumerate(Mask))
		if (Idx != PoisonMaskElem)
		Idx = I;
		};
		// Perform multi-register vector shuffle, joining them into a single virtual
		vdmitrie: Please add a comment describing the intent of the code below.
		// long vector.
		// Need to shuffle each part independently and then insert all this parts
		// into a long virtual vector register, forming the original vector.
		Value *Vec = nullptr;
		SmallVector<int> VecMask(Mask.size(), PoisonMaskElem);
		unsigned SliceSize = E->Scalars.size() / NumParts;
		for (unsigned Part = 0; Part < NumParts; ++Part) {
		ArrayRef<Value *> VL =
		ArrayRef(E->Scalars).slice(Part * SliceSize, SliceSize);
		MutableArrayRef<int> SubMask = Mask.slice(Part * SliceSize, SliceSize);
		constexpr int MaxBases = 2;
		SmallVector<Value *, MaxBases> Bases(MaxBases);
		#ifndef NDEBUG
		int PrevSize = 0;
		#endif // NDEBUG
		for (const auto [I, V]: enumerate(VL)) {
		if (SubMask[I] == PoisonMaskElem)
		continue;
		Value *VecOp = cast<ExtractElementInst>(V)->getVectorOperand();
		const int Size =
		cast<FixedVectorType>(VecOp->getType())->getNumElements();
		#ifndef NDEBUG
		assert((PrevSize == Size || PrevSize == 0) &&
		"Expected vectors of the same size.");
		PrevSize = Size;
		#endif // NDEBUG
		Bases[SubMask[I] < Size ? 0 : 1] = VecOp;
		}
		if (!Bases.front())
		continue;
		Value *SubVec;
		if (Bases.back()) {
		SubVec = createShuffle(Bases.front(), Bases.back(), SubMask);
		TransformToIdentity(SubMask);
		} else {
		SubVec = Bases.front();
		}
		if (!Vec) {
		Vec = SubVec;
		copy(SubMask, VecMask.begin());
		} else {
		unsigned VF = cast<FixedVectorType>(Vec->getType())->getNumElements();
		if (Vec->getType() != SubVec->getType()) {
		unsigned SubVecVF =
		cast<FixedVectorType>(SubVec->getType())->getNumElements();
		if (VF < SubVecVF)
		TransformToIdentity(VecMask);
		VF = std::max(VF, SubVecVF);
		}
		// Adjust SubMask.
		for (auto [I, Idx] : enumerate(SubMask))
		if (Idx != PoisonMaskElem)
		Idx += VF;
		copy(SubMask, std::next(VecMask.begin(), Part * SliceSize));
		Vec = createShuffle(Vec, SubVec, VecMask);
		TransformToIdentity(VecMask);
		}
		}
		copy(VecMask, Mask.begin());
		return Vec;
}		}
/// Checks if the specified entry \p E needs to be delayed because of its		/// Checks if the specified entry \p E needs to be delayed because of its
/// dependency nodes.		/// dependency nodes.
Value *needToDelay(const TreeEntry *E,		Value *needToDelay(const TreeEntry *E,
ArrayRef<SmallVector<const TreeEntry *>> Deps) {		ArrayRef<SmallVector<const TreeEntry *>> Deps) {
// No need to delay emission if all deps are ready.		// No need to delay emission if all deps are ready.
if (all_of(Deps, [](ArrayRef<const TreeEntry *> TEs) {		if (all_of(Deps, [](ArrayRef<const TreeEntry *> TEs) {
return all_of(		return all_of(
[326 lines not shown]
		if (Mask.size() <= InputVF &&
*find_if_not(Mask, [](int Idx) { return Idx == PoisonMaskElem; });		*find_if_not(Mask, [](int Idx) { return Idx == PoisonMaskElem; });
std::fill(Mask.begin(), Mask.end(), I);		std::fill(Mask.begin(), Mask.end(), I);
}		}
return true;		return true;
};		};
BVTy ShuffleBuilder(Params...);		BVTy ShuffleBuilder(Params...);
ResTy Res = ResTy();		ResTy Res = ResTy();
SmallVector<int> Mask;		SmallVector<int> Mask;
SmallVector<int> ExtractMask;		SmallVector<int> ExtractMask(GatheredScalars.size(), PoisonMaskElem);
std::optional<TargetTransformInfo::ShuffleKind> ExtractShuffle;		SmallVector<std::optional<TTI::ShuffleKind>> ExtractShuffles;
		Value *ExtractVecBase = nullptr;
		bool UseVecBaseAsInput;
SmallVector<std::optional<TargetTransformInfo::ShuffleKind>> GatherShuffles;		SmallVector<std::optional<TargetTransformInfo::ShuffleKind>> GatherShuffles;
SmallVector<SmallVector<const TreeEntry *>> Entries;		SmallVector<SmallVector<const TreeEntry *>> Entries;
Type *ScalarTy = GatheredScalars.front()->getType();		Type *ScalarTy = GatheredScalars.front()->getType();
unsigned NumParts = TTI->getNumberOfParts(		auto *VecTy = FixedVectorType::get(ScalarTy, GatheredScalars.size());
FixedVectorType::get(ScalarTy, GatheredScalars.size()));		unsigned NumParts = TTI->getNumberOfParts(VecTy);
if (NumParts == 0 || NumParts >= GatheredScalars.size())		if (NumParts == 0 || NumParts >= GatheredScalars.size())
NumParts = 1;		NumParts = 1;
if (!all_of(GatheredScalars, UndefValue::classof)) {		if (!all_of(GatheredScalars, UndefValue::classof)) {
// Check for gathered extracts.		// Check for gathered extracts.
ExtractShuffle =
tryToGatherSingleRegisterExtractElements(GatheredScalars, ExtractMask);
bool Resized = false;		bool Resized = false;
if (Value *VecBase = ShuffleBuilder.adjustExtracts(E, ExtractMask))		ExtractShuffles =
		tryToGatherExtractElements(GatheredScalars, ExtractMask, NumParts);
		vdmitrie: Use ScalarTy?
		if (!ExtractShuffles.empty()) {
		if (Value *VecBase = ShuffleBuilder.adjustExtracts(
		E, ExtractMask, NumParts, UseVecBaseAsInput)) {
		ExtractVecBase = VecBase;
if (auto *VecBaseTy = dyn_cast<FixedVectorType>(VecBase->getType()))		if (auto *VecBaseTy = dyn_cast<FixedVectorType>(VecBase->getType()))
if (VF == VecBaseTy->getNumElements() && GatheredScalars.size() != VF) {		if (VF == VecBaseTy->getNumElements() &&
		GatheredScalars.size() != VF) {
Resized = true;		Resized = true;
GatheredScalars.append(VF - GatheredScalars.size(),		GatheredScalars.append(VF - GatheredScalars.size(),
PoisonValue::get(ScalarTy));		PoisonValue::get(ScalarTy));
}		}
		}
		}
// Gather extracts after we check for full matched gathers only.		// Gather extracts after we check for full matched gathers only.
if (ExtractShuffle || E->getOpcode() != Instruction::Load ||		if (!ExtractShuffles.empty() || E->getOpcode() != Instruction::Load ||
E->isAltShuffle() ||		E->isAltShuffle() ||
all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) ||		all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) ||
isSplat(E->Scalars) ||		isSplat(E->Scalars) ||
(E->Scalars != GatheredScalars && GatheredScalars.size() <= 2)) {		(E->Scalars != GatheredScalars && GatheredScalars.size() <= 2)) {
GatherShuffles =		GatherShuffles =
isGatherShuffledEntry(E, GatheredScalars, Mask, Entries, NumParts);		isGatherShuffledEntry(E, GatheredScalars, Mask, Entries, NumParts);
}		}
if (!GatherShuffles.empty()) {		if (!GatherShuffles.empty()) {
[134 lines not shown]
		if (NumNonConsts == 1) {
ReuseMask[I] = PoisonMaskElem;		ReuseMask[I] = PoisonMaskElem;
if (isa<UndefValue>(Scalars[I]))		if (isa<UndefValue>(Scalars[I]))
Scalars[I] = PoisonValue::get(ScalarTy);		Scalars[I] = PoisonValue::get(ScalarTy);
}		}
NeedFreeze = true;		NeedFreeze = true;
}		}
}		}
};		};
if (ExtractShuffle || !GatherShuffles.empty()) {		if (!ExtractShuffles.empty() || !GatherShuffles.empty()) {
bool IsNonPoisoned = true;		bool IsNonPoisoned = true;
bool IsUsedInExpr = true;		bool IsUsedInExpr = true;
Value *Vec1 = nullptr;		Value *Vec1 = nullptr;
if (ExtractShuffle) {		if (!ExtractShuffles.empty()) {
// Gather of extractelements can be represented as just a shuffle of		// Gather of extractelements can be represented as just a shuffle of
// a single/two vectors the scalars are extracted from.		// a single/two vectors the scalars are extracted from.
// Find input vectors.		// Find input vectors.
Value *Vec2 = nullptr;		Value *Vec2 = nullptr;
for (unsigned I = 0, Sz = ExtractMask.size(); I < Sz; ++I) {		for (unsigned I = 0, Sz = ExtractMask.size(); I < Sz; ++I) {
if (ExtractMask[I] == PoisonMaskElem ||		if (!Mask.empty() && Mask[I] != PoisonMaskElem)
(!Mask.empty() && Mask[I] != PoisonMaskElem)) {
ExtractMask[I] = PoisonMaskElem;		ExtractMask[I] = PoisonMaskElem;
continue;
}		}
		if (UseVecBaseAsInput) {
		Vec1 = ExtractVecBase;
		} else {
		for (unsigned I = 0, Sz = ExtractMask.size(); I < Sz; ++I) {
		if (ExtractMask[I] == PoisonMaskElem)
		continue;
if (isa<UndefValue>(E->Scalars[I]))		if (isa<UndefValue>(E->Scalars[I]))
continue;		continue;
auto *EI = cast<ExtractElementInst>(E->Scalars[I]);		auto *EI = cast<ExtractElementInst>(E->Scalars[I]);
if (!Vec1) {		if (!Vec1) {
Vec1 = EI->getVectorOperand();		Vec1 = EI->getVectorOperand();
} else if (Vec1 != EI->getVectorOperand()) {		} else if (Vec1 != EI->getVectorOperand()) {
assert((!Vec2 || Vec2 == EI->getVectorOperand()) &&		assert((!Vec2 || Vec2 == EI->getVectorOperand()) &&
"Expected only 1 or 2 vectors shuffle.");		"Expected only 1 or 2 vectors shuffle.");
Vec2 = EI->getVectorOperand();		Vec2 = EI->getVectorOperand();
}		}
}		}
		}
if (Vec2) {		if (Vec2) {
IsUsedInExpr = false;		IsUsedInExpr = false;
IsNonPoisoned &=		IsNonPoisoned &=
isGuaranteedNotToBePoison(Vec1) && isGuaranteedNotToBePoison(Vec2);		isGuaranteedNotToBePoison(Vec1) && isGuaranteedNotToBePoison(Vec2);
ShuffleBuilder.add(Vec1, Vec2, ExtractMask);		ShuffleBuilder.add(Vec1, Vec2, ExtractMask);
} else if (Vec1) {		} else if (Vec1) {
IsUsedInExpr &= FindReusedSplat(		IsUsedInExpr &= FindReusedSplat(
ExtractMask,		ExtractMask,
[42 lines not shown]
		if (!ExtractShuffles.empty() || !GatherShuffles.empty()) {
// Try to figure out best way to combine values: build a shuffle and insert		// Try to figure out best way to combine values: build a shuffle and insert
// elements or just build several shuffles.		// elements or just build several shuffles.
// Insert non-constant scalars.		// Insert non-constant scalars.
SmallVector<Value *> NonConstants(GatheredScalars);		SmallVector<Value *> NonConstants(GatheredScalars);
int EMSz = ExtractMask.size();		int EMSz = ExtractMask.size();
int MSz = Mask.size();		int MSz = Mask.size();
// Try to build constant vector and shuffle with it only if currently we		// Try to build constant vector and shuffle with it only if currently we
// have a single permutation and more than 1 scalar constants.		// have a single permutation and more than 1 scalar constants.
bool IsSingleShuffle = !ExtractShuffle || GatherShuffles.empty();		bool IsSingleShuffle = ExtractShuffles.empty() || GatherShuffles.empty();
bool IsIdentityShuffle =		bool IsIdentityShuffle =
(ExtractShuffle.value_or(TTI::SK_PermuteTwoSrc) ==		((UseVecBaseAsInput ||
TTI::SK_PermuteSingleSrc &&		all_of(ExtractShuffles,
		[](const std::optional<TTI::ShuffleKind> &SK) {
		return SK.value_or(TTI::SK_PermuteTwoSrc) ==
		TTI::SK_PermuteSingleSrc;
		})) &&
none_of(ExtractMask, [&](int I) { return I >= EMSz; }) &&		none_of(ExtractMask, [&](int I) { return I >= EMSz; }) &&
ShuffleVectorInst::isIdentityMask(ExtractMask, EMSz)) ||		ShuffleVectorInst::isIdentityMask(ExtractMask, EMSz)) ||
(!GatherShuffles.empty() &&		(!GatherShuffles.empty() &&
all_of(GatherShuffles,		all_of(GatherShuffles,
[](const std::optional<TTI::ShuffleKind> &SK) {		[](const std::optional<TTI::ShuffleKind> &SK) {
return SK.value_or(TTI::SK_PermuteTwoSrc) ==		return SK.value_or(TTI::SK_PermuteTwoSrc) ==
TTI::SK_PermuteSingleSrc;		TTI::SK_PermuteSingleSrc;
}) &&		}) &&
[5,320 lines not shown]
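
The SLPVectorizer.cpp changes above split the gathered scalars into `NumParts` register-sized slices (`SliceSize = E->Scalars.size() / NumParts`) and decide, per slice, whether the extractelements come from one or two source vectors. The following is a minimal standalone sketch of that idea, not the patch's implementation: `classifyParts`, `PartKind`, and `kPoison` are hypothetical names introduced here for illustration, `kPoison` stands in for LLVM's `PoisonMaskElem`, and the sketch assumes every slice draws from the same pair of sources of width `SrcVF`, which the real code does not require.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for LLVM's PoisonMaskElem (a mask lane that selects nothing).
constexpr int kPoison = -1;

// Hypothetical classification of one register-sized slice of a shuffle mask.
enum class PartKind { Empty, SingleSource, TwoSource };

// Splits Mask into NumParts equal slices and classifies each slice by which
// of two source vectors (each SrcVF lanes wide) its indices refer to.
// Indices in [0, SrcVF) select the first source; larger ones the second.
// Assumes NumParts > 0 and Mask.size() is divisible by NumParts.
std::vector<PartKind> classifyParts(const std::vector<int> &Mask,
                                    unsigned NumParts, int SrcVF) {
  std::vector<PartKind> Kinds;
  const std::size_t SliceSize = Mask.size() / NumParts;
  for (unsigned Part = 0; Part < NumParts; ++Part) {
    bool UsesFirst = false;
    bool UsesSecond = false;
    for (std::size_t I = 0; I < SliceSize; ++I) {
      const int Idx = Mask[Part * SliceSize + I];
      if (Idx == kPoison)
        continue; // Unused lane: contributes no source.
      if (Idx < SrcVF)
        UsesFirst = true;
      else
        UsesSecond = true;
    }
    if (!UsesFirst && !UsesSecond)
      Kinds.push_back(PartKind::Empty);
    else if (UsesFirst && UsesSecond)
      Kinds.push_back(PartKind::TwoSource);
    else
      Kinds.push_back(PartKind::SingleSource);
  }
  return Kinds;
}
```

Analysing each slice separately means a slice that touches only one source can be treated as a single-source permute even when the whole-vector mask would need a two-source shuffle, which is the kind of saving the revision summary describes.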

llvm/test/Transforms/SLPVectorizer/AArch64/extractelements-to-shuffle.ll

[69 lines not shown]
	; CHECK-NEXT: br label [[WHILE_END]]			; CHECK-NEXT: br label [[WHILE_END]]
	; CHECK: while.end:			; CHECK: while.end:
	; CHECK-NEXT: [[TMP4FT_0_LCSSA:%.*]] = phi <2 x i64> [ zeroinitializer, [[ENTRY:%.*]] ], [ [[ADD_I263]], [[WHILE_END_LOOPEXIT]] ]			; CHECK-NEXT: [[TMP4FT_0_LCSSA:%.*]] = phi <2 x i64> [ zeroinitializer, [[ENTRY:%.*]] ], [ [[ADD_I263]], [[WHILE_END_LOOPEXIT]] ]
	; CHECK-NEXT: [[TMP4TF_0_LCSSA:%.*]] = phi <2 x i64> [ zeroinitializer, [[ENTRY]] ], [ [[ADD_I258]], [[WHILE_END_LOOPEXIT]] ]			; CHECK-NEXT: [[TMP4TF_0_LCSSA:%.*]] = phi <2 x i64> [ zeroinitializer, [[ENTRY]] ], [ [[ADD_I258]], [[WHILE_END_LOOPEXIT]] ]
	; CHECK-NEXT: [[TMP4FF_0_LCSSA:%.*]] = phi <2 x i64> [ zeroinitializer, [[ENTRY]] ], [ [[ADD_I253]], [[WHILE_END_LOOPEXIT]] ]			; CHECK-NEXT: [[TMP4FF_0_LCSSA:%.*]] = phi <2 x i64> [ zeroinitializer, [[ENTRY]] ], [ [[ADD_I253]], [[WHILE_END_LOOPEXIT]] ]
	; CHECK-NEXT: [[TMP4TT_0_LCSSA:%.*]] = phi <2 x i64> [ zeroinitializer, [[ENTRY]] ], [ [[ADD_I]], [[WHILE_END_LOOPEXIT]] ]			; CHECK-NEXT: [[TMP4TT_0_LCSSA:%.*]] = phi <2 x i64> [ zeroinitializer, [[ENTRY]] ], [ [[ADD_I]], [[WHILE_END_LOOPEXIT]] ]
	; CHECK-NEXT: [[PB_ADDR_0_LCSSA:%.*]] = phi ptr [ [[PB]], [[ENTRY]] ], [ [[SCEVGEP311]], [[WHILE_END_LOOPEXIT]] ]			; CHECK-NEXT: [[PB_ADDR_0_LCSSA:%.*]] = phi ptr [ [[PB]], [[ENTRY]] ], [ [[SCEVGEP311]], [[WHILE_END_LOOPEXIT]] ]
	; CHECK-NEXT: [[PA_ADDR_0_LCSSA:%.*]] = phi ptr [ [[PA]], [[ENTRY]] ], [ [[SCEVGEP]], [[WHILE_END_LOOPEXIT]] ]			; CHECK-NEXT: [[PA_ADDR_0_LCSSA:%.*]] = phi ptr [ [[PA]], [[ENTRY]] ], [ [[SCEVGEP]], [[WHILE_END_LOOPEXIT]] ]
	; CHECK-NEXT: [[VGETQ_LANE:%.*]] = extractelement <2 x i64> [[TMP4TT_0_LCSSA]], i64 0			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x i64> [[TMP4FT_0_LCSSA]], <2 x i64> [[TMP4TF_0_LCSSA]], <2 x i32> <i32 0, i32 2>
	; CHECK-NEXT: [[VGETQ_LANE45:%.*]] = extractelement <2 x i64> [[TMP4TT_0_LCSSA]], i64 1			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i64> [[TMP4TT_0_LCSSA]], <2 x i64> [[TMP4FF_0_LCSSA]], <2 x i32> <i32 0, i32 2>
	; CHECK-NEXT: [[ADD:%.*]] = add i64 [[VGETQ_LANE]], [[VGETQ_LANE45]]			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x i64> [[TMP10]], <2 x i64> [[TMP11]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[CONV48:%.*]] = trunc i64 [[ADD]] to i32			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x i64> [[TMP4FT_0_LCSSA]], <2 x i64> [[TMP4TF_0_LCSSA]], <2 x i32> <i32 1, i32 3>
	; CHECK-NEXT: [[VGETQ_LANE51:%.*]] = extractelement <2 x i64> [[TMP4FF_0_LCSSA]], i64 0			; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x i64> [[TMP4TT_0_LCSSA]], <2 x i64> [[TMP4FF_0_LCSSA]], <2 x i32> <i32 1, i32 3>
	; CHECK-NEXT: [[VGETQ_LANE55:%.*]] = extractelement <2 x i64> [[TMP4FF_0_LCSSA]], i64 1			; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <2 x i64> [[TMP13]], <2 x i64> [[TMP14]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[ADD57:%.*]] = add i64 [[VGETQ_LANE51]], [[VGETQ_LANE55]]			; CHECK-NEXT: [[TMP16:%.*]] = add <4 x i64> [[TMP12]], [[TMP15]]
	; CHECK-NEXT: [[CONV60:%.*]] = trunc i64 [[ADD57]] to i32			; CHECK-NEXT: [[TMP17:%.*]] = trunc <4 x i64> [[TMP16]] to <4 x i32>
	; CHECK-NEXT: [[VGETQ_LANE63:%.*]] = extractelement <2 x i64> [[TMP4TF_0_LCSSA]], i64 0
	; CHECK-NEXT: [[VGETQ_LANE67:%.*]] = extractelement <2 x i64> [[TMP4TF_0_LCSSA]], i64 1
	; CHECK-NEXT: [[ADD69:%.*]] = add i64 [[VGETQ_LANE63]], [[VGETQ_LANE67]]
	; CHECK-NEXT: [[CONV72:%.*]] = trunc i64 [[ADD69]] to i32
	; CHECK-NEXT: [[VGETQ_LANE75:%.*]] = extractelement <2 x i64> [[TMP4FT_0_LCSSA]], i64 0
	; CHECK-NEXT: [[VGETQ_LANE79:%.*]] = extractelement <2 x i64> [[TMP4FT_0_LCSSA]], i64 1
	; CHECK-NEXT: [[ADD81:%.*]] = add i64 [[VGETQ_LANE75]], [[VGETQ_LANE79]]
	; CHECK-NEXT: [[CONV84:%.*]] = trunc i64 [[ADD81]] to i32
	; CHECK-NEXT: [[AND:%.*]] = and i32 [[NUMBEROFBOOLS]], 127			; CHECK-NEXT: [[AND:%.*]] = and i32 [[NUMBEROFBOOLS]], 127
	; CHECK-NEXT: [[CMP86284:%.*]] = icmp ugt i32 [[AND]], 31			; CHECK-NEXT: [[CMP86284:%.*]] = icmp ugt i32 [[AND]], 31
	; CHECK-NEXT: br i1 [[CMP86284]], label [[WHILE_BODY88:%.*]], label [[WHILE_END122:%.*]]			; CHECK-NEXT: br i1 [[CMP86284]], label [[WHILE_BODY88:%.*]], label [[WHILE_END122:%.*]]
	; CHECK: while.body88:			; CHECK: while.body88:
	; CHECK-NEXT: [[PA_ADDR_1291:%.*]] = phi ptr [ [[INCDEC_PTR:%.*]], [[WHILE_END121:%.*]] ], [ [[PA_ADDR_0_LCSSA]], [[WHILE_END]] ]			; CHECK-NEXT: [[PA_ADDR_1291:%.*]] = phi ptr [ [[INCDEC_PTR:%.*]], [[WHILE_END121:%.*]] ], [ [[PA_ADDR_0_LCSSA]], [[WHILE_END]] ]
	; CHECK-NEXT: [[PB_ADDR_1290:%.*]] = phi ptr [ [[INCDEC_PTR89:%.*]], [[WHILE_END121]] ], [ [[PB_ADDR_0_LCSSA]], [[WHILE_END]] ]			; CHECK-NEXT: [[PB_ADDR_1290:%.*]] = phi ptr [ [[INCDEC_PTR89:%.*]], [[WHILE_END121]] ], [ [[PB_ADDR_0_LCSSA]], [[WHILE_END]] ]
	; CHECK-NEXT: [[_CTT_0289:%.*]] = phi i32 [ [[ADD99:%.*]], [[WHILE_END121]] ], [ [[CONV48]], [[WHILE_END]] ]
	; CHECK-NEXT: [[_CFF_0288:%.*]] = phi i32 [ [[ADD106:%.*]], [[WHILE_END121]] ], [ [[CONV60]], [[WHILE_END]] ]
	; CHECK-NEXT: [[_CTF_0287:%.*]] = phi i32 [ [[ADD113:%.*]], [[WHILE_END121]] ], [ [[CONV72]], [[WHILE_END]] ]
	; CHECK-NEXT: [[_CFT_0286:%.*]] = phi i32 [ [[ADD120:%.*]], [[WHILE_END121]] ], [ [[CONV84]], [[WHILE_END]] ]
	; CHECK-NEXT: [[NBBOOLBLOCK_1285:%.*]] = phi i32 [ [[SUB:%.*]], [[WHILE_END121]] ], [ [[AND]], [[WHILE_END]] ]			; CHECK-NEXT: [[NBBOOLBLOCK_1285:%.*]] = phi i32 [ [[SUB:%.*]], [[WHILE_END121]] ], [ [[AND]], [[WHILE_END]] ]
	; CHECK-NEXT: [[TMP10:%.*]] = load i32, ptr [[PA_ADDR_1291]], align 4			; CHECK-NEXT: [[TMP18:%.*]] = phi <4 x i32> [ [[TMP34:%.*]], [[WHILE_END121]] ], [ [[TMP17]], [[WHILE_END]] ]
	; CHECK-NEXT: [[TMP11:%.*]] = load i32, ptr [[PB_ADDR_1290]], align 4			; CHECK-NEXT: [[TMP19:%.*]] = load i32, ptr [[PA_ADDR_1291]], align 4
				; CHECK-NEXT: [[TMP20:%.*]] = load i32, ptr [[PB_ADDR_1290]], align 4
	; CHECK-NEXT: br label [[WHILE_BODY93:%.*]]			; CHECK-NEXT: br label [[WHILE_BODY93:%.*]]
	; CHECK: while.body93:			; CHECK: while.body93:
	; CHECK-NEXT: [[_CTT_1283:%.*]] = phi i32 [ [[_CTT_0289]], [[WHILE_BODY88]] ], [ [[ADD99]], [[WHILE_BODY93]] ]			; CHECK-NEXT: [[A_0279:%.*]] = phi i32 [ [[TMP19]], [[WHILE_BODY88]] ], [ [[SHR96:%.*]], [[WHILE_BODY93]] ]
	; CHECK-NEXT: [[_CFF_1282:%.*]] = phi i32 [ [[_CFF_0288]], [[WHILE_BODY88]] ], [ [[ADD106]], [[WHILE_BODY93]] ]			; CHECK-NEXT: [[B_0278:%.*]] = phi i32 [ [[TMP20]], [[WHILE_BODY88]] ], [ [[SHR97:%.*]], [[WHILE_BODY93]] ]
	; CHECK-NEXT: [[_CTF_1281:%.*]] = phi i32 [ [[_CTF_0287]], [[WHILE_BODY88]] ], [ [[ADD113]], [[WHILE_BODY93]] ]
	; CHECK-NEXT: [[_CFT_1280:%.*]] = phi i32 [ [[_CFT_0286]], [[WHILE_BODY88]] ], [ [[ADD120]], [[WHILE_BODY93]] ]
	; CHECK-NEXT: [[A_0279:%.*]] = phi i32 [ [[TMP10]], [[WHILE_BODY88]] ], [ [[SHR96:%.*]], [[WHILE_BODY93]] ]
	; CHECK-NEXT: [[B_0278:%.*]] = phi i32 [ [[TMP11]], [[WHILE_BODY88]] ], [ [[SHR97:%.*]], [[WHILE_BODY93]] ]
	; CHECK-NEXT: [[SHIFT_0277:%.*]] = phi i32 [ 0, [[WHILE_BODY88]] ], [ [[INC:%.*]], [[WHILE_BODY93]] ]			; CHECK-NEXT: [[SHIFT_0277:%.*]] = phi i32 [ 0, [[WHILE_BODY88]] ], [ [[INC:%.*]], [[WHILE_BODY93]] ]
				; CHECK-NEXT: [[TMP21:%.*]] = phi <4 x i32> [ [[TMP18]], [[WHILE_BODY88]] ], [ [[TMP34]], [[WHILE_BODY93]] ]
	; CHECK-NEXT: [[AND94:%.*]] = and i32 [[A_0279]], 1			; CHECK-NEXT: [[AND94:%.*]] = and i32 [[A_0279]], 1
	; CHECK-NEXT: [[AND95:%.*]] = and i32 [[B_0278]], 1			; CHECK-NEXT: [[AND95:%.*]] = and i32 [[B_0278]], 1
	; CHECK-NEXT: [[SHR96]] = lshr i32 [[A_0279]], 1			; CHECK-NEXT: [[SHR96]] = lshr i32 [[A_0279]], 1
	; CHECK-NEXT: [[SHR97]] = lshr i32 [[B_0278]], 1			; CHECK-NEXT: [[SHR97]] = lshr i32 [[B_0278]], 1
	; CHECK-NEXT: [[TOBOOL:%.*]] = icmp ne i32 [[AND94]], 0			; CHECK-NEXT: [[TMP22:%.*]] = insertelement <2 x i32> poison, i32 [[AND94]], i32 0
	; CHECK-NEXT: [[TOBOOL98:%.*]] = icmp ne i32 [[AND95]], 0			; CHECK-NEXT: [[TMP23:%.*]] = shufflevector <2 x i32> [[TMP22]], <2 x i32> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP12:%.*]] = select i1 [[TOBOOL]], i1 [[TOBOOL98]], i1 false			; CHECK-NEXT: [[TMP24:%.*]] = icmp eq <2 x i32> [[TMP23]], zeroinitializer
	; CHECK-NEXT: [[LAND_EXT:%.*]] = zext i1 [[TMP12]] to i32			; CHECK-NEXT: [[TMP25:%.*]] = icmp ne <2 x i32> [[TMP23]], zeroinitializer
	; CHECK-NEXT: [[ADD99]] = add i32 [[_CTT_1283]], [[LAND_EXT]]			; CHECK-NEXT: [[TMP26:%.*]] = shufflevector <2 x i1> [[TMP24]], <2 x i1> [[TMP25]], <4 x i32> <i32 0, i32 3, i32 3, i32 0>
	; CHECK-NEXT: [[TOBOOL100:%.*]] = icmp eq i32 [[AND94]], 0			; CHECK-NEXT: [[TMP27:%.*]] = insertelement <2 x i32> poison, i32 [[AND95]], i32 0
	; CHECK-NEXT: [[TOBOOL103:%.*]] = icmp eq i32 [[AND95]], 0			; CHECK-NEXT: [[TMP28:%.*]] = shufflevector <2 x i32> [[TMP27]], <2 x i32> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP13:%.*]] = select i1 [[TOBOOL100]], i1 [[TOBOOL103]], i1 false			; CHECK-NEXT: [[TMP29:%.*]] = icmp ne <2 x i32> [[TMP28]], zeroinitializer
	; CHECK-NEXT: [[LAND_EXT105:%.*]] = zext i1 [[TMP13]] to i32			; CHECK-NEXT: [[TMP30:%.*]] = icmp eq <2 x i32> [[TMP28]], zeroinitializer
	; CHECK-NEXT: [[ADD106]] = add i32 [[_CFF_1282]], [[LAND_EXT105]]			; CHECK-NEXT: [[TMP31:%.*]] = shufflevector <2 x i1> [[TMP29]], <2 x i1> [[TMP30]], <4 x i32> <i32 0, i32 3, i32 0, i32 3>
	; CHECK-NEXT: [[TMP14:%.*]] = select i1 [[TOBOOL]], i1 [[TOBOOL103]], i1 false			; CHECK-NEXT: [[TMP32:%.*]] = select <4 x i1> [[TMP26]], <4 x i1> [[TMP31]], <4 x i1> zeroinitializer
	; CHECK-NEXT: [[LAND_EXT112:%.*]] = zext i1 [[TMP14]] to i32			; CHECK-NEXT: [[TMP33:%.*]] = zext <4 x i1> [[TMP32]] to <4 x i32>
	; CHECK-NEXT: [[ADD113]] = add i32 [[_CTF_1281]], [[LAND_EXT112]]			; CHECK-NEXT: [[TMP34]] = add <4 x i32> [[TMP21]], [[TMP33]]
	; CHECK-NEXT: [[TMP15:%.*]] = select i1 [[TOBOOL100]], i1 [[TOBOOL98]], i1 false
	; CHECK-NEXT: [[LAND_EXT119:%.*]] = zext i1 [[TMP15]] to i32
	; CHECK-NEXT: [[ADD120]] = add i32 [[_CFT_1280]], [[LAND_EXT119]]
	; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[SHIFT_0277]], 1			; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[SHIFT_0277]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC]], 32			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC]], 32
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[WHILE_END121]], label [[WHILE_BODY93]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[WHILE_END121]], label [[WHILE_BODY93]]
	; CHECK: while.end121:			; CHECK: while.end121:
	; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i32, ptr [[PA_ADDR_1291]], i64 1			; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i32, ptr [[PA_ADDR_1291]], i64 1
	; CHECK-NEXT: [[INCDEC_PTR89]] = getelementptr inbounds i32, ptr [[PB_ADDR_1290]], i64 1			; CHECK-NEXT: [[INCDEC_PTR89]] = getelementptr inbounds i32, ptr [[PB_ADDR_1290]], i64 1
	; CHECK-NEXT: [[SUB]] = add nsw i32 [[NBBOOLBLOCK_1285]], -32			; CHECK-NEXT: [[SUB]] = add nsw i32 [[NBBOOLBLOCK_1285]], -32
	; CHECK-NEXT: [[CMP86:%.*]] = icmp ugt i32 [[SUB]], 31			; CHECK-NEXT: [[CMP86:%.*]] = icmp ugt i32 [[SUB]], 31
	; CHECK-NEXT: br i1 [[CMP86]], label [[WHILE_BODY88]], label [[WHILE_END122]]			; CHECK-NEXT: br i1 [[CMP86]], label [[WHILE_BODY88]], label [[WHILE_END122]]
	; CHECK: while.end122:			; CHECK: while.end122:
	; CHECK-NEXT: [[NBBOOLBLOCK_1_LCSSA:%.*]] = phi i32 [ [[AND]], [[WHILE_END]] ], [ [[SUB]], [[WHILE_END121]] ]			; CHECK-NEXT: [[NBBOOLBLOCK_1_LCSSA:%.*]] = phi i32 [ [[AND]], [[WHILE_END]] ], [ [[SUB]], [[WHILE_END121]] ]
	; CHECK-NEXT: [[_CFT_0_LCSSA:%.*]] = phi i32 [ [[CONV84]], [[WHILE_END]] ], [ [[ADD120]], [[WHILE_END121]] ]
	; CHECK-NEXT: [[_CTF_0_LCSSA:%.*]] = phi i32 [ [[CONV72]], [[WHILE_END]] ], [ [[ADD113]], [[WHILE_END121]] ]
	; CHECK-NEXT: [[_CFF_0_LCSSA:%.*]] = phi i32 [ [[CONV60]], [[WHILE_END]] ], [ [[ADD106]], [[WHILE_END121]] ]
	; CHECK-NEXT: [[_CTT_0_LCSSA:%.*]] = phi i32 [ [[CONV48]], [[WHILE_END]] ], [ [[ADD99]], [[WHILE_END121]] ]
	; CHECK-NEXT: [[PB_ADDR_1_LCSSA:%.*]] = phi ptr [ [[PB_ADDR_0_LCSSA]], [[WHILE_END]] ], [ [[INCDEC_PTR89]], [[WHILE_END121]] ]			; CHECK-NEXT: [[PB_ADDR_1_LCSSA:%.*]] = phi ptr [ [[PB_ADDR_0_LCSSA]], [[WHILE_END]] ], [ [[INCDEC_PTR89]], [[WHILE_END121]] ]
	; CHECK-NEXT: [[PA_ADDR_1_LCSSA:%.*]] = phi ptr [ [[PA_ADDR_0_LCSSA]], [[WHILE_END]] ], [ [[INCDEC_PTR]], [[WHILE_END121]] ]			; CHECK-NEXT: [[PA_ADDR_1_LCSSA:%.*]] = phi ptr [ [[PA_ADDR_0_LCSSA]], [[WHILE_END]] ], [ [[INCDEC_PTR]], [[WHILE_END121]] ]
				; CHECK-NEXT: [[TMP35:%.*]] = phi <4 x i32> [ [[TMP17]], [[WHILE_END]] ], [ [[TMP34]], [[WHILE_END121]] ]
	; CHECK-NEXT: [[CMP130_NOT299:%.*]] = icmp eq i32 [[NBBOOLBLOCK_1_LCSSA]], 0			; CHECK-NEXT: [[CMP130_NOT299:%.*]] = icmp eq i32 [[NBBOOLBLOCK_1_LCSSA]], 0
	; CHECK-NEXT: br i1 [[CMP130_NOT299]], label [[WHILE_END166:%.*]], label [[WHILE_BODY132_PREHEADER:%.*]]			; CHECK-NEXT: br i1 [[CMP130_NOT299]], label [[WHILE_END166:%.*]], label [[WHILE_BODY132_PREHEADER:%.*]]
	; CHECK: while.body132.preheader:			; CHECK: while.body132.preheader:
	; CHECK-NEXT: [[TMP16:%.*]] = load i32, ptr [[PB_ADDR_1_LCSSA]], align 4			; CHECK-NEXT: [[TMP36:%.*]] = load i32, ptr [[PB_ADDR_1_LCSSA]], align 4
	; CHECK-NEXT: [[SUB125:%.*]] = sub nuw nsw i32 32, [[NBBOOLBLOCK_1_LCSSA]]			; CHECK-NEXT: [[SUB125:%.*]] = sub nuw nsw i32 32, [[NBBOOLBLOCK_1_LCSSA]]
	; CHECK-NEXT: [[SHR128:%.*]] = lshr i32 [[TMP16]], [[SUB125]]			; CHECK-NEXT: [[SHR128:%.*]] = lshr i32 [[TMP36]], [[SUB125]]
	; CHECK-NEXT: [[TMP17:%.*]] = load i32, ptr [[PA_ADDR_1_LCSSA]], align 4			; CHECK-NEXT: [[TMP37:%.*]] = load i32, ptr [[PA_ADDR_1_LCSSA]], align 4
	; CHECK-NEXT: [[SHR126:%.*]] = lshr i32 [[TMP17]], [[SUB125]]			; CHECK-NEXT: [[SHR126:%.*]] = lshr i32 [[TMP37]], [[SUB125]]
	; CHECK-NEXT: br label [[WHILE_BODY132:%.*]]			; CHECK-NEXT: br label [[WHILE_BODY132:%.*]]
	; CHECK: while.body132:			; CHECK: while.body132:
	; CHECK-NEXT: [[_CTT_2306:%.*]] = phi i32 [ [[ADD142:%.*]], [[WHILE_BODY132]] ], [ [[_CTT_0_LCSSA]], [[WHILE_BODY132_PREHEADER]] ]
	; CHECK-NEXT: [[_CFF_2305:%.*]] = phi i32 [ [[ADD150:%.*]], [[WHILE_BODY132]] ], [ [[_CFF_0_LCSSA]], [[WHILE_BODY132_PREHEADER]] ]
	; CHECK-NEXT: [[_CTF_2304:%.*]] = phi i32 [ [[ADD157:%.*]], [[WHILE_BODY132]] ], [ [[_CTF_0_LCSSA]], [[WHILE_BODY132_PREHEADER]] ]
	; CHECK-NEXT: [[_CFT_2303:%.*]] = phi i32 [ [[ADD164:%.*]], [[WHILE_BODY132]] ], [ [[_CFT_0_LCSSA]], [[WHILE_BODY132_PREHEADER]] ]
	; CHECK-NEXT: [[NBBOOLBLOCK_2302:%.*]] = phi i32 [ [[DEC165:%.*]], [[WHILE_BODY132]] ], [ [[NBBOOLBLOCK_1_LCSSA]], [[WHILE_BODY132_PREHEADER]] ]			; CHECK-NEXT: [[NBBOOLBLOCK_2302:%.*]] = phi i32 [ [[DEC165:%.*]], [[WHILE_BODY132]] ], [ [[NBBOOLBLOCK_1_LCSSA]], [[WHILE_BODY132_PREHEADER]] ]
	; CHECK-NEXT: [[A_1301:%.*]] = phi i32 [ [[SHR135:%.*]], [[WHILE_BODY132]] ], [ [[SHR126]], [[WHILE_BODY132_PREHEADER]] ]			; CHECK-NEXT: [[A_1301:%.*]] = phi i32 [ [[SHR135:%.*]], [[WHILE_BODY132]] ], [ [[SHR126]], [[WHILE_BODY132_PREHEADER]] ]
	; CHECK-NEXT: [[B_1300:%.*]] = phi i32 [ [[SHR136:%.*]], [[WHILE_BODY132]] ], [ [[SHR128]], [[WHILE_BODY132_PREHEADER]] ]			; CHECK-NEXT: [[B_1300:%.*]] = phi i32 [ [[SHR136:%.*]], [[WHILE_BODY132]] ], [ [[SHR128]], [[WHILE_BODY132_PREHEADER]] ]
				; CHECK-NEXT: [[TMP38:%.*]] = phi <4 x i32> [ [[TMP51:%.*]], [[WHILE_BODY132]] ], [ [[TMP35]], [[WHILE_BODY132_PREHEADER]] ]
	; CHECK-NEXT: [[AND133:%.*]] = and i32 [[A_1301]], 1			; CHECK-NEXT: [[AND133:%.*]] = and i32 [[A_1301]], 1
	; CHECK-NEXT: [[AND134:%.*]] = and i32 [[B_1300]], 1			; CHECK-NEXT: [[AND134:%.*]] = and i32 [[B_1300]], 1
	; CHECK-NEXT: [[SHR135]] = lshr i32 [[A_1301]], 1			; CHECK-NEXT: [[SHR135]] = lshr i32 [[A_1301]], 1
	; CHECK-NEXT: [[SHR136]] = lshr i32 [[B_1300]], 1			; CHECK-NEXT: [[SHR136]] = lshr i32 [[B_1300]], 1
	; CHECK-NEXT: [[TOBOOL137:%.*]] = icmp ne i32 [[AND133]], 0			; CHECK-NEXT: [[TMP39:%.*]] = insertelement <2 x i32> poison, i32 [[AND133]], i32 0
	; CHECK-NEXT: [[TOBOOL139:%.*]] = icmp ne i32 [[AND134]], 0			; CHECK-NEXT: [[TMP40:%.*]] = shufflevector <2 x i32> [[TMP39]], <2 x i32> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP18:%.*]] = select i1 [[TOBOOL137]], i1 [[TOBOOL139]], i1 false			; CHECK-NEXT: [[TMP41:%.*]] = icmp eq <2 x i32> [[TMP40]], zeroinitializer
	; CHECK-NEXT: [[LAND_EXT141:%.*]] = zext i1 [[TMP18]] to i32			; CHECK-NEXT: [[TMP42:%.*]] = icmp ne <2 x i32> [[TMP40]], zeroinitializer
	; CHECK-NEXT: [[ADD142]] = add i32 [[_CTT_2306]], [[LAND_EXT141]]			; CHECK-NEXT: [[TMP43:%.*]] = shufflevector <2 x i1> [[TMP41]], <2 x i1> [[TMP42]], <4 x i32> <i32 0, i32 3, i32 3, i32 0>
	; CHECK-NEXT: [[TOBOOL144:%.*]] = icmp eq i32 [[AND133]], 0			; CHECK-NEXT: [[TMP44:%.*]] = insertelement <2 x i32> poison, i32 [[AND134]], i32 0
	; CHECK-NEXT: [[TOBOOL147:%.*]] = icmp eq i32 [[AND134]], 0			; CHECK-NEXT: [[TMP45:%.*]] = shufflevector <2 x i32> [[TMP44]], <2 x i32> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP19:%.*]] = select i1 [[TOBOOL144]], i1 [[TOBOOL147]], i1 false			; CHECK-NEXT: [[TMP46:%.*]] = icmp ne <2 x i32> [[TMP45]], zeroinitializer
	; CHECK-NEXT: [[LAND_EXT149:%.*]] = zext i1 [[TMP19]] to i32			; CHECK-NEXT: [[TMP47:%.*]] = icmp eq <2 x i32> [[TMP45]], zeroinitializer
	; CHECK-NEXT: [[ADD150]] = add i32 [[_CFF_2305]], [[LAND_EXT149]]			; CHECK-NEXT: [[TMP48:%.*]] = shufflevector <2 x i1> [[TMP46]], <2 x i1> [[TMP47]], <4 x i32> <i32 0, i32 3, i32 0, i32 3>
	; CHECK-NEXT: [[TMP20:%.*]] = select i1 [[TOBOOL137]], i1 [[TOBOOL147]], i1 false			; CHECK-NEXT: [[TMP49:%.*]] = select <4 x i1> [[TMP43]], <4 x i1> [[TMP48]], <4 x i1> zeroinitializer
	; CHECK-NEXT: [[LAND_EXT156:%.*]] = zext i1 [[TMP20]] to i32			; CHECK-NEXT: [[TMP50:%.*]] = zext <4 x i1> [[TMP49]] to <4 x i32>
	; CHECK-NEXT: [[ADD157]] = add i32 [[_CTF_2304]], [[LAND_EXT156]]			; CHECK-NEXT: [[TMP51]] = add <4 x i32> [[TMP38]], [[TMP50]]
	; CHECK-NEXT: [[TMP21:%.*]] = select i1 [[TOBOOL144]], i1 [[TOBOOL139]], i1 false
	; CHECK-NEXT: [[LAND_EXT163:%.*]] = zext i1 [[TMP21]] to i32
	; CHECK-NEXT: [[ADD164]] = add i32 [[_CFT_2303]], [[LAND_EXT163]]
	; CHECK-NEXT: [[DEC165]] = add nsw i32 [[NBBOOLBLOCK_2302]], -1			; CHECK-NEXT: [[DEC165]] = add nsw i32 [[NBBOOLBLOCK_2302]], -1
	; CHECK-NEXT: [[CMP130_NOT:%.*]] = icmp eq i32 [[DEC165]], 0			; CHECK-NEXT: [[CMP130_NOT:%.*]] = icmp eq i32 [[DEC165]], 0
	; CHECK-NEXT: br i1 [[CMP130_NOT]], label [[WHILE_END166]], label [[WHILE_BODY132]]			; CHECK-NEXT: br i1 [[CMP130_NOT]], label [[WHILE_END166]], label [[WHILE_BODY132]]
	; CHECK: while.end166:			; CHECK: while.end166:
	; CHECK-NEXT: [[_CFT_2_LCSSA:%.*]] = phi i32 [ [[_CFT_0_LCSSA]], [[WHILE_END122]] ], [ [[ADD164]], [[WHILE_BODY132]] ]			; CHECK-NEXT: [[TMP52:%.*]] = phi <4 x i32> [ [[TMP35]], [[WHILE_END122]] ], [ [[TMP51]], [[WHILE_BODY132]] ]
	; CHECK-NEXT: [[_CTF_2_LCSSA:%.*]] = phi i32 [ [[_CTF_0_LCSSA]], [[WHILE_END122]] ], [ [[ADD157]], [[WHILE_BODY132]] ]			; CHECK-NEXT: [[TMP53:%.*]] = extractelement <4 x i32> [[TMP52]], i32 2
	; CHECK-NEXT: [[_CFF_2_LCSSA:%.*]] = phi i32 [ [[_CFF_0_LCSSA]], [[WHILE_END122]] ], [ [[ADD150]], [[WHILE_BODY132]] ]			; CHECK-NEXT: store i32 [[TMP53]], ptr [[CTT:%.*]], align 4
	; CHECK-NEXT: [[_CTT_2_LCSSA:%.*]] = phi i32 [ [[_CTT_0_LCSSA]], [[WHILE_END122]] ], [ [[ADD142]], [[WHILE_BODY132]] ]			; CHECK-NEXT: [[TMP54:%.*]] = extractelement <4 x i32> [[TMP52]], i32 3
	; CHECK-NEXT: store i32 [[_CTT_2_LCSSA]], ptr [[CTT:%.*]], align 4			; CHECK-NEXT: store i32 [[TMP54]], ptr [[CFF:%.*]], align 4
	; CHECK-NEXT: store i32 [[_CFF_2_LCSSA]], ptr [[CFF:%.*]], align 4			; CHECK-NEXT: [[TMP55:%.*]] = extractelement <4 x i32> [[TMP52]], i32 1
	; CHECK-NEXT: store i32 [[_CTF_2_LCSSA]], ptr [[CTF:%.*]], align 4			; CHECK-NEXT: store i32 [[TMP55]], ptr [[CTF:%.*]], align 4
	; CHECK-NEXT: store i32 [[_CFT_2_LCSSA]], ptr [[CFT:%.*]], align 4			; CHECK-NEXT: [[TMP56:%.*]] = extractelement <4 x i32> [[TMP52]], i32 0
				; CHECK-NEXT: store i32 [[TMP56]], ptr [[CFT:%.*]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%cmp.not264 = icmp ult i32 %numberOfBools, 128			%cmp.not264 = icmp ult i32 %numberOfBools, 128
	br i1 %cmp.not264, label %while.end, label %while.body.preheader			br i1 %cmp.not264, label %while.end, label %while.body.preheader

	while.body.preheader: ; preds = %entry			while.body.preheader: ; preds = %entry
	%shr = lshr i32 %numberOfBools, 7			%shr = lshr i32 %numberOfBools, 7
	[... lines collapsed in the diff view ...]

llvm/test/Transforms/SLPVectorizer/X86/crash_clear_undefs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-- -mcpu=corei7 -pass-remarks-output=%t | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-- -mcpu=corei7 -pass-remarks-output=%t | FileCheck %s
	; RUN: FileCheck %s --input-file=%t --check-prefix=YAML			; RUN: FileCheck %s --input-file=%t --check-prefix=YAML
	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

	; YAML-LABEL: --- !Passed			; YAML-LABEL: --- !Passed
	; YAML-NEXT: Pass: slp-vectorizer			; YAML-NEXT: Pass: slp-vectorizer
	; YAML-NEXT: Name: VectorizedList			; YAML-NEXT: Name: VectorizedList
	; YAML-NEXT: Function: foo			; YAML-NEXT: Function: foo
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: 'SLP vectorized with cost '			; YAML-NEXT: - String: 'SLP vectorized with cost '
	; YAML-NEXT: - Cost: '-3'			; YAML-NEXT: - Cost: '-4'
	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' and with tree size '
	; YAML-NEXT: - TreeSize: '10'			; YAML-NEXT: - TreeSize: '10'
	; YAML-NEXT: ...			; YAML-NEXT: ...
	define i1 @foo() {			define i1 @foo() {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: [[TMP1:%.*]] = load float, ptr null, align 4			; CHECK-NEXT: [[TMP1:%.*]] = load float, ptr null, align 4
	; CHECK-NEXT: br i1 false, label [[TMP11:%.*]], label [[TMP2:%.*]]			; CHECK-NEXT: br i1 false, label [[TMP11:%.*]], label [[TMP2:%.*]]
	; CHECK: 2:			; CHECK: 2:
	[... lines collapsed in the diff view ...]

llvm/test/Transforms/SLPVectorizer/X86/hadd-inseltpoison.ll

[... lines collapsed in the diff view ...]
ret void		ret void
}		}

;		;
; 256-bit vectors		; 256-bit vectors
;		;

define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {
; CHECK-LABEL: @test_v4f64(		; SSE-LABEL: @test_v4f64(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>
; CHECK-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>
; CHECK-NEXT: ret <4 x double> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>
		; SSE-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SSE-NEXT: ret <4 x double> [[TMP7]]
		;
		; SLM-LABEL: @test_v4f64(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>
		; SLM-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: ret <4 x double> [[TMP7]]
		;
		; AVX-LABEL: @test_v4f64(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
		; AVX-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <4 x double> [[TMP3]]
;		;
%a0 = extractelement <4 x double> %a, i32 0		%a0 = extractelement <4 x double> %a, i32 0
%a1 = extractelement <4 x double> %a, i32 1		%a1 = extractelement <4 x double> %a, i32 1
%a2 = extractelement <4 x double> %a, i32 2		%a2 = extractelement <4 x double> %a, i32 2
%a3 = extractelement <4 x double> %a, i32 3		%a3 = extractelement <4 x double> %a, i32 3
%b0 = extractelement <4 x double> %b, i32 0		%b0 = extractelement <4 x double> %b, i32 0
%b1 = extractelement <4 x double> %b, i32 1		%b1 = extractelement <4 x double> %b, i32 1
%b2 = extractelement <4 x double> %b, i32 2		%b2 = extractelement <4 x double> %b, i32 2
[... lines collapsed in the diff view ...]
%r3 = fadd double %b2, %b3		%r3 = fadd double %b2, %b3
%r00 = insertelement <4 x double> poison, double %r0, i32 0		%r00 = insertelement <4 x double> poison, double %r0, i32 0
%r02 = insertelement <4 x double> %r00, double %r2, i32 2		%r02 = insertelement <4 x double> %r00, double %r2, i32 2
%r03 = insertelement <4 x double> %r02, double %r3, i32 3		%r03 = insertelement <4 x double> %r02, double %r3, i32 3
ret <4 x double> %r03		ret <4 x double> %r03
}		}

define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {		define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {
; CHECK-LABEL: @test_v8f32(		; SSE-LABEL: @test_v8f32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
; CHECK-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
; CHECK-NEXT: ret <8 x float> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SSE-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SSE-NEXT: ret <8 x float> [[TMP7]]
		;
		; SLM-LABEL: @test_v8f32(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SLM-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: ret <8 x float> [[TMP7]]
		;
		; AVX-LABEL: @test_v8f32(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
		; AVX-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <8 x float> [[TMP3]]
;		;
%a0 = extractelement <8 x float> %a, i32 0		%a0 = extractelement <8 x float> %a, i32 0
%a1 = extractelement <8 x float> %a, i32 1		%a1 = extractelement <8 x float> %a, i32 1
%a2 = extractelement <8 x float> %a, i32 2		%a2 = extractelement <8 x float> %a, i32 2
%a3 = extractelement <8 x float> %a, i32 3		%a3 = extractelement <8 x float> %a, i32 3
%a4 = extractelement <8 x float> %a, i32 4		%a4 = extractelement <8 x float> %a, i32 4
%a5 = extractelement <8 x float> %a, i32 5		%a5 = extractelement <8 x float> %a, i32 5
%a6 = extractelement <8 x float> %a, i32 6		%a6 = extractelement <8 x float> %a, i32 6
[... lines collapsed in the diff view ...]
%r04 = insertelement <8 x float> %r03, float %r4, i32 4		%r04 = insertelement <8 x float> %r03, float %r4, i32 4
%r05 = insertelement <8 x float> %r04, float %r5, i32 5		%r05 = insertelement <8 x float> %r04, float %r5, i32 5
%r06 = insertelement <8 x float> %r05, float %r6, i32 6		%r06 = insertelement <8 x float> %r05, float %r6, i32 6
%r07 = insertelement <8 x float> %r06, float %r7, i32 7		%r07 = insertelement <8 x float> %r06, float %r7, i32 7
ret <8 x float> %r07		ret <8 x float> %r07
}		}

define <4 x i64> @test_v4i64(<4 x i64> %a, <4 x i64> %b) {		define <4 x i64> @test_v4i64(<4 x i64> %a, <4 x i64> %b) {
; CHECK-LABEL: @test_v4i64(		; SSE-LABEL: @test_v4i64(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>
; CHECK-NEXT: [[TMP3:%.*]] = add <4 x i64> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>
; CHECK-NEXT: ret <4 x i64> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
		; SSE-NEXT: [[TMP5:%.*]] = add <2 x i64> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = add <2 x i64> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x i64> [[TMP5]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SSE-NEXT: ret <4 x i64> [[TMP7]]
		;
		; SLM-LABEL: @test_v4i64(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
		; SLM-NEXT: [[TMP5:%.*]] = add <2 x i64> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = add <2 x i64> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x i64> [[TMP5]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: ret <4 x i64> [[TMP7]]
		;
		; AVX-LABEL: @test_v4i64(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
		; AVX-NEXT: [[TMP3:%.*]] = add <4 x i64> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <4 x i64> [[TMP3]]
;		;
%a0 = extractelement <4 x i64> %a, i32 0		%a0 = extractelement <4 x i64> %a, i32 0
%a1 = extractelement <4 x i64> %a, i32 1		%a1 = extractelement <4 x i64> %a, i32 1
%a2 = extractelement <4 x i64> %a, i32 2		%a2 = extractelement <4 x i64> %a, i32 2
%a3 = extractelement <4 x i64> %a, i32 3		%a3 = extractelement <4 x i64> %a, i32 3
%b0 = extractelement <4 x i64> %b, i32 0		%b0 = extractelement <4 x i64> %b, i32 0
%b1 = extractelement <4 x i64> %b, i32 1		%b1 = extractelement <4 x i64> %b, i32 1
%b2 = extractelement <4 x i64> %b, i32 2		%b2 = extractelement <4 x i64> %b, i32 2
%b3 = extractelement <4 x i64> %b, i32 3		%b3 = extractelement <4 x i64> %b, i32 3
%r0 = add i64 %a0, %a1		%r0 = add i64 %a0, %a1
%r1 = add i64 %b0, %b1		%r1 = add i64 %b0, %b1
%r2 = add i64 %a2, %a3		%r2 = add i64 %a2, %a3
%r3 = add i64 %b2, %b3		%r3 = add i64 %b2, %b3
%r00 = insertelement <4 x i64> poison, i64 %r0, i32 0		%r00 = insertelement <4 x i64> poison, i64 %r0, i32 0
%r01 = insertelement <4 x i64> %r00, i64 %r1, i32 1		%r01 = insertelement <4 x i64> %r00, i64 %r1, i32 1
%r02 = insertelement <4 x i64> %r01, i64 %r2, i32 2		%r02 = insertelement <4 x i64> %r01, i64 %r2, i32 2
%r03 = insertelement <4 x i64> %r02, i64 %r3, i32 3		%r03 = insertelement <4 x i64> %r02, i64 %r3, i32 3
ret <4 x i64> %r03		ret <4 x i64> %r03
}		}
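
Reviewer-style note, not part of the patch: the three check prefixes for `@test_v4i64` describe the same computation at different register widths. The sketch below is a minimal Python model of that, assuming `shufflevector` simply indexes into the concatenation of its two operands and ignoring i64 wrap-around. The helper names (`hadd_scalar`, `hadd_wide`, `hadd_split`) are hypothetical and exist only for this illustration.

```python
def shufflevector(a, b, mask):
    """Model of LLVM shufflevector: mask indices select from a followed by b."""
    src = list(a) + list(b)
    return [src[i] for i in mask]


def hadd_scalar(a, b):
    """The scalar input IR: extract, add adjacent pairs, reinsert."""
    return [a[0] + a[1], b[0] + b[1], a[2] + a[3], b[2] + b[3]]


def hadd_wide(a, b):
    """AVX checks: two 4-lane shuffles feeding one 4-lane add."""
    lhs = shufflevector(a, b, [0, 4, 2, 6])
    rhs = shufflevector(a, b, [1, 5, 3, 7])
    return [x + y for x, y in zip(lhs, rhs)]


def hadd_split(a, b):
    """SSE/SLM checks: four 2-lane shuffles, two 2-lane adds, one concat."""
    t1 = shufflevector(a, b, [0, 4])
    t2 = shufflevector(a, b, [2, 6])
    t3 = shufflevector(a, b, [1, 5])
    t4 = shufflevector(a, b, [3, 7])
    t5 = [x + y for x, y in zip(t1, t3)]
    t6 = [x + y for x, y in zip(t2, t4)]
    # The final shuffle with the identity mask concatenates the two halves.
    return shufflevector(t5, t6, [0, 1, 2, 3])


a, b = [1, 2, 3, 4], [10, 20, 30, 40]
print(hadd_scalar(a, b), hadd_wide(a, b), hadd_split(a, b))
```

The 2-lane masks in the SSE/SLM checks are the first and second halves of the 4-lane AVX masks, which is what one would expect if the wide vector is analyzed one 128-bit register at a time.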

define <8 x i32> @test_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @test_v8i32(<8 x i32> %a, <8 x i32> %b) {
; CHECK-LABEL: @test_v8i32(		; SSE-LABEL: @test_v8i32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
; CHECK-NEXT: ret <8 x i32> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SSE-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = add <4 x i32> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SSE-NEXT: ret <8 x i32> [[TMP7]]
		;
		; SLM-LABEL: @test_v8i32(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SLM-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = add <4 x i32> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: ret <8 x i32> [[TMP7]]
		;
		; AVX-LABEL: @test_v8i32(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
		; AVX-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <8 x i32> [[TMP3]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
%a4 = extractelement <8 x i32> %a, i32 4		%a4 = extractelement <8 x i32> %a, i32 4
%a5 = extractelement <8 x i32> %a, i32 5		%a5 = extractelement <8 x i32> %a, i32 5
%a6 = extractelement <8 x i32> %a, i32 6		%a6 = extractelement <8 x i32> %a, i32 6
[... lines collapsed in the diff view ...]
%r04 = insertelement <8 x i32> %r03, i32 %r4, i32 4		%r04 = insertelement <8 x i32> %r03, i32 %r4, i32 4
%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5		%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5
%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6		%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6
%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7		%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7
ret <8 x i32> %r07		ret <8 x i32> %r07
}		}

define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {		define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {
; CHECK-LABEL: @test_v16i16(		; SSE-LABEL: @test_v16i16(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; CHECK-NEXT: [[TMP3:%.*]] = add <16 x i16> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>
; CHECK-NEXT: ret <16 x i16> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
		; SSE-NEXT: [[TMP5:%.*]] = add <8 x i16> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = add <8 x i16> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i16> [[TMP5]], <8 x i16> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SSE-NEXT: ret <16 x i16> [[TMP7]]
		;
		; SLM-LABEL: @test_v16i16(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
		; SLM-NEXT: [[TMP5:%.*]] = add <8 x i16> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = add <8 x i16> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x i16> [[TMP5]], <8 x i16> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: ret <16 x i16> [[TMP7]]
		;
		; AVX-LABEL: @test_v16i16(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
		; AVX-NEXT: [[TMP3:%.*]] = add <16 x i16> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <16 x i16> [[TMP3]]
;		;
%a0 = extractelement <16 x i16> %a, i32 0		%a0 = extractelement <16 x i16> %a, i32 0
%a1 = extractelement <16 x i16> %a, i32 1		%a1 = extractelement <16 x i16> %a, i32 1
%a2 = extractelement <16 x i16> %a, i32 2		%a2 = extractelement <16 x i16> %a, i32 2
%a3 = extractelement <16 x i16> %a, i32 3		%a3 = extractelement <16 x i16> %a, i32 3
%a4 = extractelement <16 x i16> %a, i32 4		%a4 = extractelement <16 x i16> %a, i32 4
%a5 = extractelement <16 x i16> %a, i32 5		%a5 = extractelement <16 x i16> %a, i32 5
%a6 = extractelement <16 x i16> %a, i32 6		%a6 = extractelement <16 x i16> %a, i32 6
[... lines collapsed in the diff view ...]
%rv10 = insertelement <16 x i16> %rv9 , i16 %r10, i32 10		%rv10 = insertelement <16 x i16> %rv9 , i16 %r10, i32 10
%rv11 = insertelement <16 x i16> %rv10, i16 %r11, i32 11		%rv11 = insertelement <16 x i16> %rv10, i16 %r11, i32 11
%rv12 = insertelement <16 x i16> %rv11, i16 %r12, i32 12		%rv12 = insertelement <16 x i16> %rv11, i16 %r12, i32 12
%rv13 = insertelement <16 x i16> %rv12, i16 %r13, i32 13		%rv13 = insertelement <16 x i16> %rv12, i16 %r13, i32 13
%rv14 = insertelement <16 x i16> %rv13, i16 %r14, i32 14		%rv14 = insertelement <16 x i16> %rv13, i16 %r14, i32 14
%rv15 = insertelement <16 x i16> %rv14, i16 %r15, i32 15		%rv15 = insertelement <16 x i16> %rv14, i16 %r15, i32 15
ret <16 x i16> %rv15		ret <16 x i16> %rv15
}		}
;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
; AVX: {{.*}}

llvm/test/Transforms/SLPVectorizer/X86/hadd.ll

[... lines collapsed in the diff view ...]
ret void		ret void
}		}

;		;
; 256-bit vectors		; 256-bit vectors
;		;

define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {
; CHECK-LABEL: @test_v4f64(		; SSE-LABEL: @test_v4f64(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>
; CHECK-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>
; CHECK-NEXT: ret <4 x double> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>
		; SSE-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SSE-NEXT: ret <4 x double> [[TMP7]]
		;
		; SLM-LABEL: @test_v4f64(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>
		; SLM-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: ret <4 x double> [[TMP7]]
		;
		; AVX-LABEL: @test_v4f64(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
		; AVX-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <4 x double> [[TMP3]]
;		;
%a0 = extractelement <4 x double> %a, i32 0		%a0 = extractelement <4 x double> %a, i32 0
%a1 = extractelement <4 x double> %a, i32 1		%a1 = extractelement <4 x double> %a, i32 1
%a2 = extractelement <4 x double> %a, i32 2		%a2 = extractelement <4 x double> %a, i32 2
%a3 = extractelement <4 x double> %a, i32 3		%a3 = extractelement <4 x double> %a, i32 3
%b0 = extractelement <4 x double> %b, i32 0		%b0 = extractelement <4 x double> %b, i32 0
%b1 = extractelement <4 x double> %b, i32 1		%b1 = extractelement <4 x double> %b, i32 1
%b2 = extractelement <4 x double> %b, i32 2		%b2 = extractelement <4 x double> %b, i32 2
[... 79 lines not shown ...]
%r3 = fadd double %b2, %b3		%r3 = fadd double %b2, %b3
%r00 = insertelement <4 x double> undef, double %r0, i32 0		%r00 = insertelement <4 x double> undef, double %r0, i32 0
%r02 = insertelement <4 x double> %r00, double %r2, i32 2		%r02 = insertelement <4 x double> %r00, double %r2, i32 2
%r03 = insertelement <4 x double> %r02, double %r3, i32 3		%r03 = insertelement <4 x double> %r02, double %r3, i32 3
ret <4 x double> %r03		ret <4 x double> %r03
}		}

define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {		define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {
; CHECK-LABEL: @test_v8f32(		; SSE-LABEL: @test_v8f32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
; CHECK-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
; CHECK-NEXT: ret <8 x float> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SSE-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SSE-NEXT: ret <8 x float> [[TMP7]]
		;
		; SLM-LABEL: @test_v8f32(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SLM-NEXT: [[TMP5:%.*]] = fadd <4 x float> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: ret <8 x float> [[TMP7]]
		;
		; AVX-LABEL: @test_v8f32(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
		; AVX-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <8 x float> [[TMP3]]
;		;
%a0 = extractelement <8 x float> %a, i32 0		%a0 = extractelement <8 x float> %a, i32 0
%a1 = extractelement <8 x float> %a, i32 1		%a1 = extractelement <8 x float> %a, i32 1
%a2 = extractelement <8 x float> %a, i32 2		%a2 = extractelement <8 x float> %a, i32 2
%a3 = extractelement <8 x float> %a, i32 3		%a3 = extractelement <8 x float> %a, i32 3
%a4 = extractelement <8 x float> %a, i32 4		%a4 = extractelement <8 x float> %a, i32 4
%a5 = extractelement <8 x float> %a, i32 5		%a5 = extractelement <8 x float> %a, i32 5
%a6 = extractelement <8 x float> %a, i32 6		%a6 = extractelement <8 x float> %a, i32 6
[... 21 lines not shown ...]
%r04 = insertelement <8 x float> %r03, float %r4, i32 4		%r04 = insertelement <8 x float> %r03, float %r4, i32 4
%r05 = insertelement <8 x float> %r04, float %r5, i32 5		%r05 = insertelement <8 x float> %r04, float %r5, i32 5
%r06 = insertelement <8 x float> %r05, float %r6, i32 6		%r06 = insertelement <8 x float> %r05, float %r6, i32 6
%r07 = insertelement <8 x float> %r06, float %r7, i32 7		%r07 = insertelement <8 x float> %r06, float %r7, i32 7
ret <8 x float> %r07		ret <8 x float> %r07
}		}

define <4 x i64> @test_v4i64(<4 x i64> %a, <4 x i64> %b) {		define <4 x i64> @test_v4i64(<4 x i64> %a, <4 x i64> %b) {
; CHECK-LABEL: @test_v4i64(		; SSE-LABEL: @test_v4i64(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>
; CHECK-NEXT: [[TMP3:%.*]] = add <4 x i64> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>
; CHECK-NEXT: ret <4 x i64> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
		; SSE-NEXT: [[TMP5:%.*]] = add <2 x i64> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = add <2 x i64> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x i64> [[TMP5]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SSE-NEXT: ret <4 x i64> [[TMP7]]
		;
		; SLM-LABEL: @test_v4i64(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
		; SLM-NEXT: [[TMP5:%.*]] = add <2 x i64> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = add <2 x i64> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x i64> [[TMP5]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: ret <4 x i64> [[TMP7]]
		;
		; AVX-LABEL: @test_v4i64(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
		; AVX-NEXT: [[TMP3:%.*]] = add <4 x i64> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <4 x i64> [[TMP3]]
;		;
%a0 = extractelement <4 x i64> %a, i32 0		%a0 = extractelement <4 x i64> %a, i32 0
%a1 = extractelement <4 x i64> %a, i32 1		%a1 = extractelement <4 x i64> %a, i32 1
%a2 = extractelement <4 x i64> %a, i32 2		%a2 = extractelement <4 x i64> %a, i32 2
%a3 = extractelement <4 x i64> %a, i32 3		%a3 = extractelement <4 x i64> %a, i32 3
%b0 = extractelement <4 x i64> %b, i32 0		%b0 = extractelement <4 x i64> %b, i32 0
%b1 = extractelement <4 x i64> %b, i32 1		%b1 = extractelement <4 x i64> %b, i32 1
%b2 = extractelement <4 x i64> %b, i32 2		%b2 = extractelement <4 x i64> %b, i32 2
%b3 = extractelement <4 x i64> %b, i32 3		%b3 = extractelement <4 x i64> %b, i32 3
%r0 = add i64 %a0, %a1		%r0 = add i64 %a0, %a1
%r1 = add i64 %b0, %b1		%r1 = add i64 %b0, %b1
%r2 = add i64 %a2, %a3		%r2 = add i64 %a2, %a3
%r3 = add i64 %b2, %b3		%r3 = add i64 %b2, %b3
%r00 = insertelement <4 x i64> undef, i64 %r0, i32 0		%r00 = insertelement <4 x i64> undef, i64 %r0, i32 0
%r01 = insertelement <4 x i64> %r00, i64 %r1, i32 1		%r01 = insertelement <4 x i64> %r00, i64 %r1, i32 1
%r02 = insertelement <4 x i64> %r01, i64 %r2, i32 2		%r02 = insertelement <4 x i64> %r01, i64 %r2, i32 2
%r03 = insertelement <4 x i64> %r02, i64 %r3, i32 3		%r03 = insertelement <4 x i64> %r02, i64 %r3, i32 3
ret <4 x i64> %r03		ret <4 x i64> %r03
}		}

define <8 x i32> @test_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @test_v8i32(<8 x i32> %a, <8 x i32> %b) {
; CHECK-LABEL: @test_v8i32(		; SSE-LABEL: @test_v8i32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
; CHECK-NEXT: ret <8 x i32> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SSE-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = add <4 x i32> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SSE-NEXT: ret <8 x i32> [[TMP7]]
		;
		; SLM-LABEL: @test_v8i32(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SLM-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = add <4 x i32> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: ret <8 x i32> [[TMP7]]
		;
		; AVX-LABEL: @test_v8i32(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
		; AVX-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <8 x i32> [[TMP3]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
%a4 = extractelement <8 x i32> %a, i32 4		%a4 = extractelement <8 x i32> %a, i32 4
%a5 = extractelement <8 x i32> %a, i32 5		%a5 = extractelement <8 x i32> %a, i32 5
%a6 = extractelement <8 x i32> %a, i32 6		%a6 = extractelement <8 x i32> %a, i32 6
[... 21 lines not shown ...]
%r04 = insertelement <8 x i32> %r03, i32 %r4, i32 4		%r04 = insertelement <8 x i32> %r03, i32 %r4, i32 4
%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5		%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5
%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6		%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6
%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7		%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7
ret <8 x i32> %r07		ret <8 x i32> %r07
}		}

define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {		define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {
; CHECK-LABEL: @test_v16i16(		; SSE-LABEL: @test_v16i16(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; CHECK-NEXT: [[TMP3:%.*]] = add <16 x i16> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>
; CHECK-NEXT: ret <16 x i16> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
		; SSE-NEXT: [[TMP5:%.*]] = add <8 x i16> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = add <8 x i16> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i16> [[TMP5]], <8 x i16> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SSE-NEXT: ret <16 x i16> [[TMP7]]
		;
		; SLM-LABEL: @test_v16i16(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
		; SLM-NEXT: [[TMP5:%.*]] = add <8 x i16> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = add <8 x i16> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x i16> [[TMP5]], <8 x i16> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: ret <16 x i16> [[TMP7]]
		;
		; AVX-LABEL: @test_v16i16(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
		; AVX-NEXT: [[TMP3:%.*]] = add <16 x i16> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <16 x i16> [[TMP3]]
;		;
%a0 = extractelement <16 x i16> %a, i32 0		%a0 = extractelement <16 x i16> %a, i32 0
%a1 = extractelement <16 x i16> %a, i32 1		%a1 = extractelement <16 x i16> %a, i32 1
%a2 = extractelement <16 x i16> %a, i32 2		%a2 = extractelement <16 x i16> %a, i32 2
%a3 = extractelement <16 x i16> %a, i32 3		%a3 = extractelement <16 x i16> %a, i32 3
%a4 = extractelement <16 x i16> %a, i32 4		%a4 = extractelement <16 x i16> %a, i32 4
%a5 = extractelement <16 x i16> %a, i32 5		%a5 = extractelement <16 x i16> %a, i32 5
%a6 = extractelement <16 x i16> %a, i32 6		%a6 = extractelement <16 x i16> %a, i32 6
[... 51 lines not shown ...]
%rv10 = insertelement <16 x i16> %rv9 , i16 %r10, i32 10		%rv10 = insertelement <16 x i16> %rv9 , i16 %r10, i32 10
%rv11 = insertelement <16 x i16> %rv10, i16 %r11, i32 11		%rv11 = insertelement <16 x i16> %rv10, i16 %r11, i32 11
%rv12 = insertelement <16 x i16> %rv11, i16 %r12, i32 12		%rv12 = insertelement <16 x i16> %rv11, i16 %r12, i32 12
%rv13 = insertelement <16 x i16> %rv12, i16 %r13, i32 13		%rv13 = insertelement <16 x i16> %rv12, i16 %r13, i32 13
%rv14 = insertelement <16 x i16> %rv13, i16 %r14, i32 14		%rv14 = insertelement <16 x i16> %rv13, i16 %r14, i32 14
%rv15 = insertelement <16 x i16> %rv14, i16 %r15, i32 15		%rv15 = insertelement <16 x i16> %rv14, i16 %r15, i32 15
ret <16 x i16> %rv15		ret <16 x i16> %rv15
}		}
;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
; AVX: {{.*}}

llvm/test/Transforms/SLPVectorizer/X86/hsub-inseltpoison.ll

[... 139 lines not shown ...]
ret <8 x i16> %r07		ret <8 x i16> %r07
}		}

;		;
; 256-bit vectors		; 256-bit vectors
;		;

define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {
; CHECK-LABEL: @test_v4f64(		; SSE-LABEL: @test_v4f64(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>
; CHECK-NEXT: [[TMP3:%.*]] = fsub <4 x double> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>
; CHECK-NEXT: ret <4 x double> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>
		; SSE-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SSE-NEXT: ret <4 x double> [[TMP7]]
		;
		; SLM-LABEL: @test_v4f64(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>
		; SLM-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: ret <4 x double> [[TMP7]]
		;
		; AVX-LABEL: @test_v4f64(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
		; AVX-NEXT: [[TMP3:%.*]] = fsub <4 x double> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <4 x double> [[TMP3]]
;		;
%a0 = extractelement <4 x double> %a, i32 0		%a0 = extractelement <4 x double> %a, i32 0
%a1 = extractelement <4 x double> %a, i32 1		%a1 = extractelement <4 x double> %a, i32 1
%a2 = extractelement <4 x double> %a, i32 2		%a2 = extractelement <4 x double> %a, i32 2
%a3 = extractelement <4 x double> %a, i32 3		%a3 = extractelement <4 x double> %a, i32 3
%b0 = extractelement <4 x double> %b, i32 0		%b0 = extractelement <4 x double> %b, i32 0
%b1 = extractelement <4 x double> %b, i32 1		%b1 = extractelement <4 x double> %b, i32 1
%b2 = extractelement <4 x double> %b, i32 2		%b2 = extractelement <4 x double> %b, i32 2
%b3 = extractelement <4 x double> %b, i32 3		%b3 = extractelement <4 x double> %b, i32 3
%r0 = fsub double %a0, %a1		%r0 = fsub double %a0, %a1
%r1 = fsub double %b0, %b1		%r1 = fsub double %b0, %b1
%r2 = fsub double %a2, %a3		%r2 = fsub double %a2, %a3
%r3 = fsub double %b2, %b3		%r3 = fsub double %b2, %b3
%r00 = insertelement <4 x double> poison, double %r0, i32 0		%r00 = insertelement <4 x double> poison, double %r0, i32 0
%r01 = insertelement <4 x double> %r00, double %r1, i32 1		%r01 = insertelement <4 x double> %r00, double %r1, i32 1
%r02 = insertelement <4 x double> %r01, double %r2, i32 2		%r02 = insertelement <4 x double> %r01, double %r2, i32 2
%r03 = insertelement <4 x double> %r02, double %r3, i32 3		%r03 = insertelement <4 x double> %r02, double %r3, i32 3
ret <4 x double> %r03		ret <4 x double> %r03
}		}

define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {		define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {
; CHECK-LABEL: @test_v8f32(		; SSE-LABEL: @test_v8f32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
; CHECK-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
; CHECK-NEXT: ret <8 x float> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SSE-NEXT: [[TMP5:%.*]] = fsub <4 x float> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = fsub <4 x float> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SSE-NEXT: ret <8 x float> [[TMP7]]
		;
		; SLM-LABEL: @test_v8f32(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SLM-NEXT: [[TMP5:%.*]] = fsub <4 x float> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = fsub <4 x float> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: ret <8 x float> [[TMP7]]
		;
		; AVX-LABEL: @test_v8f32(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
		; AVX-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <8 x float> [[TMP3]]
;		;
%a0 = extractelement <8 x float> %a, i32 0		%a0 = extractelement <8 x float> %a, i32 0
%a1 = extractelement <8 x float> %a, i32 1		%a1 = extractelement <8 x float> %a, i32 1
%a2 = extractelement <8 x float> %a, i32 2		%a2 = extractelement <8 x float> %a, i32 2
%a3 = extractelement <8 x float> %a, i32 3		%a3 = extractelement <8 x float> %a, i32 3
%a4 = extractelement <8 x float> %a, i32 4		%a4 = extractelement <8 x float> %a, i32 4
%a5 = extractelement <8 x float> %a, i32 5		%a5 = extractelement <8 x float> %a, i32 5
%a6 = extractelement <8 x float> %a, i32 6		%a6 = extractelement <8 x float> %a, i32 6
[... 21 lines not shown ...]
%r04 = insertelement <8 x float> %r03, float %r4, i32 4		%r04 = insertelement <8 x float> %r03, float %r4, i32 4
%r05 = insertelement <8 x float> %r04, float %r5, i32 5		%r05 = insertelement <8 x float> %r04, float %r5, i32 5
%r06 = insertelement <8 x float> %r05, float %r6, i32 6		%r06 = insertelement <8 x float> %r05, float %r6, i32 6
%r07 = insertelement <8 x float> %r06, float %r7, i32 7		%r07 = insertelement <8 x float> %r06, float %r7, i32 7
ret <8 x float> %r07		ret <8 x float> %r07
}		}

define <4 x i64> @test_v4i64(<4 x i64> %a, <4 x i64> %b) {		define <4 x i64> @test_v4i64(<4 x i64> %a, <4 x i64> %b) {
; CHECK-LABEL: @test_v4i64(		; SSE-LABEL: @test_v4i64(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>
; CHECK-NEXT: [[TMP3:%.*]] = sub <4 x i64> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>
; CHECK-NEXT: ret <4 x i64> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
		; SSE-NEXT: [[TMP5:%.*]] = sub <2 x i64> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = sub <2 x i64> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x i64> [[TMP5]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SSE-NEXT: ret <4 x i64> [[TMP7]]
		;
		; SLM-LABEL: @test_v4i64(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
		; SLM-NEXT: [[TMP5:%.*]] = sub <2 x i64> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = sub <2 x i64> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x i64> [[TMP5]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: ret <4 x i64> [[TMP7]]
		;
		; AVX-LABEL: @test_v4i64(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
		; AVX-NEXT: [[TMP3:%.*]] = sub <4 x i64> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <4 x i64> [[TMP3]]
;		;
%a0 = extractelement <4 x i64> %a, i32 0		%a0 = extractelement <4 x i64> %a, i32 0
%a1 = extractelement <4 x i64> %a, i32 1		%a1 = extractelement <4 x i64> %a, i32 1
%a2 = extractelement <4 x i64> %a, i32 2		%a2 = extractelement <4 x i64> %a, i32 2
%a3 = extractelement <4 x i64> %a, i32 3		%a3 = extractelement <4 x i64> %a, i32 3
%b0 = extractelement <4 x i64> %b, i32 0		%b0 = extractelement <4 x i64> %b, i32 0
%b1 = extractelement <4 x i64> %b, i32 1		%b1 = extractelement <4 x i64> %b, i32 1
%b2 = extractelement <4 x i64> %b, i32 2		%b2 = extractelement <4 x i64> %b, i32 2
%b3 = extractelement <4 x i64> %b, i32 3		%b3 = extractelement <4 x i64> %b, i32 3
%r0 = sub i64 %a0, %a1		%r0 = sub i64 %a0, %a1
%r1 = sub i64 %b0, %b1		%r1 = sub i64 %b0, %b1
%r2 = sub i64 %a2, %a3		%r2 = sub i64 %a2, %a3
%r3 = sub i64 %b2, %b3		%r3 = sub i64 %b2, %b3
%r00 = insertelement <4 x i64> poison, i64 %r0, i32 0		%r00 = insertelement <4 x i64> poison, i64 %r0, i32 0
%r01 = insertelement <4 x i64> %r00, i64 %r1, i32 1		%r01 = insertelement <4 x i64> %r00, i64 %r1, i32 1
%r02 = insertelement <4 x i64> %r01, i64 %r2, i32 2		%r02 = insertelement <4 x i64> %r01, i64 %r2, i32 2
%r03 = insertelement <4 x i64> %r02, i64 %r3, i32 3		%r03 = insertelement <4 x i64> %r02, i64 %r3, i32 3
ret <4 x i64> %r03		ret <4 x i64> %r03
}		}

define <8 x i32> @test_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @test_v8i32(<8 x i32> %a, <8 x i32> %b) {
; CHECK-LABEL: @test_v8i32(		; SSE-LABEL: @test_v8i32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
; CHECK-NEXT: [[TMP3:%.*]] = sub <8 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
; CHECK-NEXT: ret <8 x i32> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SSE-NEXT: [[TMP5:%.*]] = sub <4 x i32> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SSE-NEXT: ret <8 x i32> [[TMP7]]
		;
		; SLM-LABEL: @test_v8i32(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SLM-NEXT: [[TMP5:%.*]] = sub <4 x i32> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: ret <8 x i32> [[TMP7]]
		;
		; AVX-LABEL: @test_v8i32(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
		; AVX-NEXT: [[TMP3:%.*]] = sub <8 x i32> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <8 x i32> [[TMP3]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
%a4 = extractelement <8 x i32> %a, i32 4		%a4 = extractelement <8 x i32> %a, i32 4
%a5 = extractelement <8 x i32> %a, i32 5		%a5 = extractelement <8 x i32> %a, i32 5
%a6 = extractelement <8 x i32> %a, i32 6		%a6 = extractelement <8 x i32> %a, i32 6
[... 21 unchanged lines not shown ...]
%r04 = insertelement <8 x i32> %r03, i32 %r4, i32 4		%r04 = insertelement <8 x i32> %r03, i32 %r4, i32 4
%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5		%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5
%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6		%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6
%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7		%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7
ret <8 x i32> %r07		ret <8 x i32> %r07
}		}

define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {		define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {
; CHECK-LABEL: @test_v16i16(		; SSE-LABEL: @test_v16i16(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; CHECK-NEXT: [[TMP3:%.*]] = sub <16 x i16> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>
; CHECK-NEXT: ret <16 x i16> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
		; SSE-NEXT: [[TMP5:%.*]] = sub <8 x i16> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = sub <8 x i16> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i16> [[TMP5]], <8 x i16> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SSE-NEXT: ret <16 x i16> [[TMP7]]
		;
		; SLM-LABEL: @test_v16i16(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
		; SLM-NEXT: [[TMP5:%.*]] = sub <8 x i16> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = sub <8 x i16> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x i16> [[TMP5]], <8 x i16> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: ret <16 x i16> [[TMP7]]
		;
		; AVX-LABEL: @test_v16i16(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
		; AVX-NEXT: [[TMP3:%.*]] = sub <16 x i16> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <16 x i16> [[TMP3]]
;		;
%a0 = extractelement <16 x i16> %a, i32 0		%a0 = extractelement <16 x i16> %a, i32 0
%a1 = extractelement <16 x i16> %a, i32 1		%a1 = extractelement <16 x i16> %a, i32 1
%a2 = extractelement <16 x i16> %a, i32 2		%a2 = extractelement <16 x i16> %a, i32 2
%a3 = extractelement <16 x i16> %a, i32 3		%a3 = extractelement <16 x i16> %a, i32 3
%a4 = extractelement <16 x i16> %a, i32 4		%a4 = extractelement <16 x i16> %a, i32 4
%a5 = extractelement <16 x i16> %a, i32 5		%a5 = extractelement <16 x i16> %a, i32 5
%a6 = extractelement <16 x i16> %a, i32 6		%a6 = extractelement <16 x i16> %a, i32 6
[... 52 unchanged lines not shown ...]
%rv11 = insertelement <16 x i16> %rv10, i16 %r11, i32 11		%rv11 = insertelement <16 x i16> %rv10, i16 %r11, i32 11
%rv12 = insertelement <16 x i16> %rv11, i16 %r12, i32 12		%rv12 = insertelement <16 x i16> %rv11, i16 %r12, i32 12
%rv13 = insertelement <16 x i16> %rv12, i16 %r13, i32 13		%rv13 = insertelement <16 x i16> %rv12, i16 %r13, i32 13
%rv14 = insertelement <16 x i16> %rv13, i16 %r14, i32 14		%rv14 = insertelement <16 x i16> %rv13, i16 %r14, i32 14
%rv15 = insertelement <16 x i16> %rv14, i16 %r15, i32 15		%rv15 = insertelement <16 x i16> %rv14, i16 %r15, i32 15
ret <16 x i16> %rv15		ret <16 x i16> %rv15
}		}
;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:		;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
; AVX: {{.*}}
; AVX1: {{.*}}		; AVX1: {{.*}}
; AVX2: {{.*}}		; AVX2: {{.*}}
; AVX512: {{.*}}		; AVX512: {{.*}}
; SLM: {{.*}}
; SSE: {{.*}}

llvm/test/Transforms/SLPVectorizer/X86/hsub.ll

[... 139 unchanged lines not shown ...]
ret <8 x i16> %r07		ret <8 x i16> %r07
}		}

;		;
; 256-bit vectors		; 256-bit vectors
;		;

define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {
; CHECK-LABEL: @test_v4f64(		; SSE-LABEL: @test_v4f64(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>
; CHECK-NEXT: [[TMP3:%.*]] = fsub <4 x double> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>
; CHECK-NEXT: ret <4 x double> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>
		; SSE-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SSE-NEXT: ret <4 x double> [[TMP7]]
		;
		; SLM-LABEL: @test_v4f64(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>
		; SLM-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: ret <4 x double> [[TMP7]]
		;
		; AVX-LABEL: @test_v4f64(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
		; AVX-NEXT: [[TMP3:%.*]] = fsub <4 x double> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <4 x double> [[TMP3]]
;		;
%a0 = extractelement <4 x double> %a, i32 0		%a0 = extractelement <4 x double> %a, i32 0
%a1 = extractelement <4 x double> %a, i32 1		%a1 = extractelement <4 x double> %a, i32 1
%a2 = extractelement <4 x double> %a, i32 2		%a2 = extractelement <4 x double> %a, i32 2
%a3 = extractelement <4 x double> %a, i32 3		%a3 = extractelement <4 x double> %a, i32 3
%b0 = extractelement <4 x double> %b, i32 0		%b0 = extractelement <4 x double> %b, i32 0
%b1 = extractelement <4 x double> %b, i32 1		%b1 = extractelement <4 x double> %b, i32 1
%b2 = extractelement <4 x double> %b, i32 2		%b2 = extractelement <4 x double> %b, i32 2
%b3 = extractelement <4 x double> %b, i32 3		%b3 = extractelement <4 x double> %b, i32 3
%r0 = fsub double %a0, %a1		%r0 = fsub double %a0, %a1
%r1 = fsub double %b0, %b1		%r1 = fsub double %b0, %b1
%r2 = fsub double %a2, %a3		%r2 = fsub double %a2, %a3
%r3 = fsub double %b2, %b3		%r3 = fsub double %b2, %b3
%r00 = insertelement <4 x double> undef, double %r0, i32 0		%r00 = insertelement <4 x double> undef, double %r0, i32 0
%r01 = insertelement <4 x double> %r00, double %r1, i32 1		%r01 = insertelement <4 x double> %r00, double %r1, i32 1
%r02 = insertelement <4 x double> %r01, double %r2, i32 2		%r02 = insertelement <4 x double> %r01, double %r2, i32 2
%r03 = insertelement <4 x double> %r02, double %r3, i32 3		%r03 = insertelement <4 x double> %r02, double %r3, i32 3
ret <4 x double> %r03		ret <4 x double> %r03
}		}

define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {		define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {
; CHECK-LABEL: @test_v8f32(		; SSE-LABEL: @test_v8f32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
; CHECK-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
; CHECK-NEXT: ret <8 x float> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SSE-NEXT: [[TMP5:%.*]] = fsub <4 x float> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = fsub <4 x float> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SSE-NEXT: ret <8 x float> [[TMP7]]
		;
		; SLM-LABEL: @test_v8f32(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SLM-NEXT: [[TMP5:%.*]] = fsub <4 x float> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = fsub <4 x float> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: ret <8 x float> [[TMP7]]
		;
		; AVX-LABEL: @test_v8f32(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
		; AVX-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <8 x float> [[TMP3]]
;		;
%a0 = extractelement <8 x float> %a, i32 0		%a0 = extractelement <8 x float> %a, i32 0
%a1 = extractelement <8 x float> %a, i32 1		%a1 = extractelement <8 x float> %a, i32 1
%a2 = extractelement <8 x float> %a, i32 2		%a2 = extractelement <8 x float> %a, i32 2
%a3 = extractelement <8 x float> %a, i32 3		%a3 = extractelement <8 x float> %a, i32 3
%a4 = extractelement <8 x float> %a, i32 4		%a4 = extractelement <8 x float> %a, i32 4
%a5 = extractelement <8 x float> %a, i32 5		%a5 = extractelement <8 x float> %a, i32 5
%a6 = extractelement <8 x float> %a, i32 6		%a6 = extractelement <8 x float> %a, i32 6
[... 21 unchanged lines not shown ...]
%r04 = insertelement <8 x float> %r03, float %r4, i32 4		%r04 = insertelement <8 x float> %r03, float %r4, i32 4
%r05 = insertelement <8 x float> %r04, float %r5, i32 5		%r05 = insertelement <8 x float> %r04, float %r5, i32 5
%r06 = insertelement <8 x float> %r05, float %r6, i32 6		%r06 = insertelement <8 x float> %r05, float %r6, i32 6
%r07 = insertelement <8 x float> %r06, float %r7, i32 7		%r07 = insertelement <8 x float> %r06, float %r7, i32 7
ret <8 x float> %r07		ret <8 x float> %r07
}		}

define <4 x i64> @test_v4i64(<4 x i64> %a, <4 x i64> %b) {		define <4 x i64> @test_v4i64(<4 x i64> %a, <4 x i64> %b) {
; CHECK-LABEL: @test_v4i64(		; SSE-LABEL: @test_v4i64(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>
; CHECK-NEXT: [[TMP3:%.*]] = sub <4 x i64> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>
; CHECK-NEXT: ret <4 x i64> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
		; SSE-NEXT: [[TMP5:%.*]] = sub <2 x i64> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = sub <2 x i64> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x i64> [[TMP5]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SSE-NEXT: ret <4 x i64> [[TMP7]]
		;
		; SLM-LABEL: @test_v4i64(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
		; SLM-NEXT: [[TMP5:%.*]] = sub <2 x i64> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = sub <2 x i64> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <2 x i64> [[TMP5]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; SLM-NEXT: ret <4 x i64> [[TMP7]]
		;
		; AVX-LABEL: @test_v4i64(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
		; AVX-NEXT: [[TMP3:%.*]] = sub <4 x i64> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <4 x i64> [[TMP3]]
;		;
%a0 = extractelement <4 x i64> %a, i32 0		%a0 = extractelement <4 x i64> %a, i32 0
%a1 = extractelement <4 x i64> %a, i32 1		%a1 = extractelement <4 x i64> %a, i32 1
%a2 = extractelement <4 x i64> %a, i32 2		%a2 = extractelement <4 x i64> %a, i32 2
%a3 = extractelement <4 x i64> %a, i32 3		%a3 = extractelement <4 x i64> %a, i32 3
%b0 = extractelement <4 x i64> %b, i32 0		%b0 = extractelement <4 x i64> %b, i32 0
%b1 = extractelement <4 x i64> %b, i32 1		%b1 = extractelement <4 x i64> %b, i32 1
%b2 = extractelement <4 x i64> %b, i32 2		%b2 = extractelement <4 x i64> %b, i32 2
%b3 = extractelement <4 x i64> %b, i32 3		%b3 = extractelement <4 x i64> %b, i32 3
%r0 = sub i64 %a0, %a1		%r0 = sub i64 %a0, %a1
%r1 = sub i64 %b0, %b1		%r1 = sub i64 %b0, %b1
%r2 = sub i64 %a2, %a3		%r2 = sub i64 %a2, %a3
%r3 = sub i64 %b2, %b3		%r3 = sub i64 %b2, %b3
%r00 = insertelement <4 x i64> undef, i64 %r0, i32 0		%r00 = insertelement <4 x i64> undef, i64 %r0, i32 0
%r01 = insertelement <4 x i64> %r00, i64 %r1, i32 1		%r01 = insertelement <4 x i64> %r00, i64 %r1, i32 1
%r02 = insertelement <4 x i64> %r01, i64 %r2, i32 2		%r02 = insertelement <4 x i64> %r01, i64 %r2, i32 2
%r03 = insertelement <4 x i64> %r02, i64 %r3, i32 3		%r03 = insertelement <4 x i64> %r02, i64 %r3, i32 3
ret <4 x i64> %r03		ret <4 x i64> %r03
}		}

define <8 x i32> @test_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @test_v8i32(<8 x i32> %a, <8 x i32> %b) {
; CHECK-LABEL: @test_v8i32(		; SSE-LABEL: @test_v8i32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
; CHECK-NEXT: [[TMP3:%.*]] = sub <8 x i32> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
; CHECK-NEXT: ret <8 x i32> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SSE-NEXT: [[TMP5:%.*]] = sub <4 x i32> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SSE-NEXT: ret <8 x i32> [[TMP7]]
		;
		; SLM-LABEL: @test_v8i32(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
		; SLM-NEXT: [[TMP5:%.*]] = sub <4 x i32> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; SLM-NEXT: ret <8 x i32> [[TMP7]]
		;
		; AVX-LABEL: @test_v8i32(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
		; AVX-NEXT: [[TMP3:%.*]] = sub <8 x i32> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <8 x i32> [[TMP3]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
%a4 = extractelement <8 x i32> %a, i32 4		%a4 = extractelement <8 x i32> %a, i32 4
%a5 = extractelement <8 x i32> %a, i32 5		%a5 = extractelement <8 x i32> %a, i32 5
%a6 = extractelement <8 x i32> %a, i32 6		%a6 = extractelement <8 x i32> %a, i32 6
[... 21 unchanged lines not shown ...]
%r04 = insertelement <8 x i32> %r03, i32 %r4, i32 4		%r04 = insertelement <8 x i32> %r03, i32 %r4, i32 4
%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5		%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5
%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6		%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6
%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7		%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7
ret <8 x i32> %r07		ret <8 x i32> %r07
}		}

define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {		define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {
; CHECK-LABEL: @test_v16i16(		; SSE-LABEL: @test_v16i16(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; CHECK-NEXT: [[TMP3:%.*]] = sub <16 x i16> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>
; CHECK-NEXT: ret <16 x i16> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
		; SSE-NEXT: [[TMP5:%.*]] = sub <8 x i16> [[TMP1]], [[TMP3]]
		; SSE-NEXT: [[TMP6:%.*]] = sub <8 x i16> [[TMP2]], [[TMP4]]
		; SSE-NEXT: [[TMP7:%.*]] = shufflevector <8 x i16> [[TMP5]], <8 x i16> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SSE-NEXT: ret <16 x i16> [[TMP7]]
		;
		; SLM-LABEL: @test_v16i16(
		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22>
		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
		; SLM-NEXT: [[TMP3:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>
		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
		; SLM-NEXT: [[TMP5:%.*]] = sub <8 x i16> [[TMP1]], [[TMP3]]
		; SLM-NEXT: [[TMP6:%.*]] = sub <8 x i16> [[TMP2]], [[TMP4]]
		; SLM-NEXT: [[TMP7:%.*]] = shufflevector <8 x i16> [[TMP5]], <8 x i16> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		; SLM-NEXT: ret <16 x i16> [[TMP7]]
		;
		; AVX-LABEL: @test_v16i16(
		; AVX-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
		; AVX-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
		; AVX-NEXT: [[TMP3:%.*]] = sub <16 x i16> [[TMP1]], [[TMP2]]
		; AVX-NEXT: ret <16 x i16> [[TMP3]]
;		;
%a0 = extractelement <16 x i16> %a, i32 0		%a0 = extractelement <16 x i16> %a, i32 0
%a1 = extractelement <16 x i16> %a, i32 1		%a1 = extractelement <16 x i16> %a, i32 1
%a2 = extractelement <16 x i16> %a, i32 2		%a2 = extractelement <16 x i16> %a, i32 2
%a3 = extractelement <16 x i16> %a, i32 3		%a3 = extractelement <16 x i16> %a, i32 3
%a4 = extractelement <16 x i16> %a, i32 4		%a4 = extractelement <16 x i16> %a, i32 4
%a5 = extractelement <16 x i16> %a, i32 5		%a5 = extractelement <16 x i16> %a, i32 5
%a6 = extractelement <16 x i16> %a, i32 6		%a6 = extractelement <16 x i16> %a, i32 6
[... 52 unchanged lines not shown ...]
%rv11 = insertelement <16 x i16> %rv10, i16 %r11, i32 11		%rv11 = insertelement <16 x i16> %rv10, i16 %r11, i32 11
%rv12 = insertelement <16 x i16> %rv11, i16 %r12, i32 12		%rv12 = insertelement <16 x i16> %rv11, i16 %r12, i32 12
%rv13 = insertelement <16 x i16> %rv12, i16 %r13, i32 13		%rv13 = insertelement <16 x i16> %rv12, i16 %r13, i32 13
%rv14 = insertelement <16 x i16> %rv13, i16 %r14, i32 14		%rv14 = insertelement <16 x i16> %rv13, i16 %r14, i32 14
%rv15 = insertelement <16 x i16> %rv14, i16 %r15, i32 15		%rv15 = insertelement <16 x i16> %rv14, i16 %r15, i32 15
ret <16 x i16> %rv15		ret <16 x i16> %rv15
}		}
;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:		;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
; AVX: {{.*}}
; AVX1: {{.*}}		; AVX1: {{.*}}
; AVX2: {{.*}}		; AVX2: {{.*}}
; AVX512: {{.*}}		; AVX512: {{.*}}
; SLM: {{.*}}
; SSE: {{.*}}
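
The `test_v8i32` checks in hsub.ll above differ only in shape: the AVX prefix keeps two 8-wide shuffles and one 8-wide `sub`, while the SSE and SLM prefixes use four 4-wide shuffles, two 4-wide `sub`s, and a final concatenating shuffle. A minimal Python sketch, assuming plain lists stand in for vectors and ignoring i32 wraparound, can confirm that both forms compute the same lanes. The helper names (`shufflevector`, `hsub_wide`, `hsub_per_register`) are hypothetical and are not part of the patch.

```python
def shufflevector(a, b, mask):
    """Model of LLVM's shufflevector: index i selects from a if i < len(a), else from b."""
    joined = list(a) + list(b)
    return [joined[i] for i in mask]


def hsub_wide(a, b):
    """Shape of the AVX checks: two 8-wide shuffles, one 8-wide sub."""
    lhs = shufflevector(a, b, [0, 2, 8, 10, 4, 6, 12, 14])
    rhs = shufflevector(a, b, [1, 3, 9, 11, 5, 7, 13, 15])
    return [x - y for x, y in zip(lhs, rhs)]


def hsub_per_register(a, b):
    """Shape of the SSE/SLM checks: four 4-wide shuffles, two subs, one concatenation."""
    t1 = shufflevector(a, b, [0, 2, 8, 10])
    t2 = shufflevector(a, b, [4, 6, 12, 14])
    t3 = shufflevector(a, b, [1, 3, 9, 11])
    t4 = shufflevector(a, b, [5, 7, 13, 15])
    t5 = [x - y for x, y in zip(t1, t3)]
    t6 = [x - y for x, y in zip(t2, t4)]
    # The identity mask 0..7 concatenates the two 4-wide halves.
    return shufflevector(t5, t6, list(range(8)))


a = [3, 1, 4, 1, 5, 9, 2, 6]
b = [5, 3, 5, 8, 9, 7, 9, 3]
print(hsub_wide(a, b) == hsub_per_register(a, b))
```

The 4-wide masks are the first and second halves of the 8-wide masks, which is why the two forms agree lane by lane.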

llvm/test/Transforms/SLPVectorizer/X86/reused-extractelements.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=bdver2 -pass-remarks-output=%t | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=bdver2 -pass-remarks-output=%t | FileCheck %s
	; RUN: FileCheck --input-file=%t --check-prefix=YAML %s			; RUN: FileCheck --input-file=%t --check-prefix=YAML %s

	; YAML: --- !Passed			; YAML: --- !Missed
	; YAML-NEXT: Pass: slp-vectorizer			; YAML-NEXT: Pass: slp-vectorizer
	; YAML-NEXT: Name: VectorizedList			; YAML-NEXT: Name: NotBeneficial
	; YAML-NEXT: Function: g			; YAML-NEXT: Function: g
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - String: 'SLP vectorized with cost '			; YAML-NEXT: - String: 'List vectorization was possible but not beneficial with cost '
	; YAML-NEXT: - Cost: '-1'			; YAML-NEXT: - Cost: '0'
	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' >= '
	; YAML-NEXT: - TreeSize: '4'			; YAML-NEXT: - Treshold: '0'

	define <2 x i32> @g(<2 x i32> %x, i32 %a, i32 %b) {			define <2 x i32> @g(<2 x i32> %x, i32 %a, i32 %b) {
	; CHECK-LABEL: @g(			; CHECK-LABEL: @g(
	; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[X:%.*]], <2 x i32> poison, <2 x i32> <i32 1, i32 poison>			; CHECK-NEXT: [[X1:%.*]] = extractelement <2 x i32> [[X:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[A:%.*]], i32 1			; CHECK-NEXT: [[X1X1:%.*]] = mul i32 [[X1]], [[X1]]
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[B:%.*]], i32 1			; CHECK-NEXT: [[AB:%.*]] = mul i32 [[A:%.*]], [[B:%.*]]
	; CHECK-NEXT: [[TMP4:%.*]] = mul <2 x i32> [[TMP2]], [[TMP3]]			; CHECK-NEXT: [[INS1:%.*]] = insertelement <2 x i32> poison, i32 [[X1X1]], i32 0
	; CHECK-NEXT: ret <2 x i32> [[TMP4]]			; CHECK-NEXT: [[INS2:%.*]] = insertelement <2 x i32> [[INS1]], i32 [[AB]], i32 1
				; CHECK-NEXT: ret <2 x i32> [[INS2]]
	;			;
	%x1 = extractelement <2 x i32> %x, i32 1			%x1 = extractelement <2 x i32> %x, i32 1
	%x1x1 = mul i32 %x1, %x1			%x1x1 = mul i32 %x1, %x1
	%ab = mul i32 %a, %b			%ab = mul i32 %a, %b
	%ins1 = insertelement <2 x i32> poison, i32 %x1x1, i32 0			%ins1 = insertelement <2 x i32> poison, i32 %x1x1, i32 0
	%ins2 = insertelement <2 x i32> %ins1, i32 %ab, i32 1			%ins2 = insertelement <2 x i32> %ins1, i32 %ab, i32 1
	ret <2 x i32> %ins2			ret <2 x i32> %ins2
	}			}
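
The YAML remarks in reused-extractelements.ll change from `!Passed` with cost `-1` to `!Missed` with cost `0` and the text `' >= '` followed by `'0'`. A small Python sketch of the decision those remarks imply is given here. It assumes the tree is vectorized only when its cost is strictly below the negated threshold; that comparison is inferred from the remark text rather than quoted from SLPVectorizer.cpp, and `slp_remark` is a hypothetical name.

```python
def slp_remark(cost, threshold=0):
    """Return the (kind, name) pair of the remark implied by a tree cost.

    Assumption: vectorization happens only when cost < -threshold; otherwise
    the 'possible but not beneficial' remark is emitted.
    """
    if cost < -threshold:
        return ("Passed", "VectorizedList")
    return ("Missed", "NotBeneficial")


print(slp_remark(-1))  # cost from the old YAML checks
print(slp_remark(0))   # cost from the new YAML checks
```

Under this reading, a cost of `0` ties with the threshold, so the scalar code for `@g` is left unchanged, which matches the new CHECK lines.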

Revision Contents

Diff 557957
