This is an archive of the discontinued LLVM Phabricator instance.

Differential D100495

[SLP] Add detection of shuffled/perfect matching of tree entries.
Closed · Public

Authored by ABataev on Apr 14 2021, 11:16 AM.

Details

Reviewers

spatel
RKSimon
dtemirbulatov
anton-afanasyev

Commits

rGaf870e11aed7: [SLP] Add detection of shuffled/perfect matching of tree entries.
rGdaf6e18c55c2: [SLP] Add detection of shuffled/perfect matching of tree entries.
rGb232771acad6: [SLP] Add detection of shuffled/perfect matching of tree entries.
rGd6fde913790d: [SLP]Add detection of shuffled/perfect matching of tree entries.

Summary

SLP supports perfect diamond matching for vectorized tree entries,
but it does not support it for gathered entries, and it does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. This patch adds
support for such matching to improve the cost of the vectorized tree.
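
To make the idea of shuffled matching concrete, here is a minimal standalone sketch of how a gathered bundle could be mapped onto earlier tree entries. It is a simplified model under stated assumptions, not the patch itself: `Entry`, `ShuffleMatch`, and `matchGather` are hypothetical names, scalars are modelled as strings, and only the mask layout (source slot times bundle width plus lane) mirrors what `isGatherShuffledEntry` computes in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an SLP tree entry: an ordered list of scalar names.
using Entry = std::vector<std::string>;

struct ShuffleMatch {
  std::vector<int> Mask;            // Mask[i] = SourceSlot * Width + Lane
  std::vector<std::size_t> Sources; // indices into the list of earlier entries
};

// Tries to express Gather as a shuffle of one or two earlier entries.
// Returns false if a scalar is not found or more than two entries are needed.
bool matchGather(const Entry &Gather, const std::vector<Entry> &Earlier,
                 ShuffleMatch &Out) {
  assert(!Gather.empty() && "expected a non-empty bundle");
  Out = ShuffleMatch();
  const int Width = static_cast<int>(Gather.size());
  for (const std::string &Scalar : Gather) {
    bool Found = false;
    for (std::size_t EI = 0; EI < Earlier.size() && !Found; ++EI) {
      auto It = std::find(Earlier[EI].begin(), Earlier[EI].end(), Scalar);
      if (It == Earlier[EI].end())
        continue;
      // Reuse the source slot if this entry was already selected.
      auto SrcIt = std::find(Out.Sources.begin(), Out.Sources.end(), EI);
      const int Slot = static_cast<int>(SrcIt - Out.Sources.begin());
      if (SrcIt == Out.Sources.end())
        Out.Sources.push_back(EI);
      Out.Mask.push_back(Slot * Width +
                         static_cast<int>(It - Earlier[EI].begin()));
      Found = true;
    }
    if (!Found)
      return false; // this scalar has to be gathered the usual way
  }
  return Out.Sources.size() == 1 || Out.Sources.size() == 2;
}
```

With earlier entries {a, b, c, d} and {e, f, g, h}, the bundle {a, f, c, h} maps to the mask {0, 5, 2, 7} over two sources, while {a, b, c, d} yields the identity mask {0, 1, 2, 3}, which corresponds to the perfect diamond match case.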

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. · Apr 14 2021, 11:16 AM

Herald added a subscriber: hiraditya. · Apr 14 2021, 11:16 AM

ABataev requested review of this revision. · Apr 14 2021, 11:16 AM

Herald added a project: Restricted Project. · Apr 14 2021, 11:16 AM

Harbormaster completed remote builds in B98724: Diff 337501. · Apr 14 2021, 12:22 PM

RKSimon retitled this revision from [SLP]Add detectyion of shuffled/perfect matching of tree entries. to [SLP] Add detection of shuffled/perfect matching of tree entries.. · Apr 15 2021, 3:16 AM

A few comments along the theme of reducing code duplication

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4290	This is the same as the getTreeEntry code above - merge them?
4305	This shuffle kind decode code feels like the kind of thing it should be somewhere like Analysis\VectorUtils.h ?
4599	Can't we merge these? If we assert that (Entries.size() == 1 || Entries.size() == 2), then Entries.back() will return Entries.front() for a unary shuffle, and Builder.CreateShuffleVector should handle the canonicalization.
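
The merge suggested in the last comment can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `shuffle2` and `shuffleEntries` are hypothetical helpers operating on plain `std::vector<int>` values rather than LLVM IR, and undef lanes are not modelled. It shows why a single code path works: with one entry, `front()` and `back()` refer to the same vector, and a mask with indices below the vector width never reads the second operand.

```cpp
#include <cassert>
#include <vector>

// Simplified model of a two-source shuffle: indices [0, N) pick from A,
// indices [N, 2N) pick from B. Undef lanes (-1) are not modelled.
std::vector<int> shuffle2(const std::vector<int> &A, const std::vector<int> &B,
                          const std::vector<int> &Mask) {
  assert(A.size() == B.size() && "sources must have the same width");
  const int N = static_cast<int>(A.size());
  std::vector<int> Out;
  Out.reserve(Mask.size());
  for (int Idx : Mask) {
    assert(Idx >= 0 && Idx < 2 * N && "mask index out of range");
    Out.push_back(Idx < N ? A[Idx] : B[Idx - N]);
  }
  return Out;
}

// One code path for both cases: with a single entry, front() and back()
// are the same element, so the shuffle degenerates to a unary permute.
std::vector<int> shuffleEntries(const std::vector<std::vector<int>> &Entries,
                                const std::vector<int> &Mask) {
  assert((Entries.size() == 1 || Entries.size() == 2) &&
         "Expected shuffle of 1 or 2 entries.");
  return shuffle2(Entries.front(), Entries.back(), Mask);
}
```

In this model, one entry {10, 20, 30, 40} with mask {3, 2, 1, 0} gives {40, 30, 20, 10}, and two entries {10, 20, 30, 40} and {50, 60, 70, 80} with mask {0, 5, 2, 7} give {10, 60, 30, 80}, both through the same call.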

RKSimon mentioned this in D100486: [COST]Improve cost model for shuffles in SLP.. · Apr 15 2021, 4:39 AM

Rebase + address comments.

Harbormaster completed remote builds in B99160: Diff 338089. · Apr 16 2021, 7:19 AM

LGTM, cheers - I will look at moving the shuffle kind decode out in a separate commit

This revision is now accepted and ready to land. · Apr 16 2021, 7:30 AM

Closed by commit rGd6fde913790d: [SLP]Add detection of shuffled/perfect matching of tree entries. (authored by ABataev). · Apr 19 2021, 1:48 PM

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rGd6fde913790d: [SLP]Add detection of shuffled/perfect matching of tree entries..

ABataev added a reverting change: rG803048106533: Revert "[SLP]Add detection of shuffled/perfect matching of tree entries.". · Apr 19 2021, 2:10 PM

ABataev added a commit: rGb232771acad6: [SLP] Add detection of shuffled/perfect matching of tree entries.. · Apr 20 2021, 6:57 AM

ABataev added a reverting change: rGcf00cb8bed72: Revert "[SLP] Add detection of shuffled/perfect matching of tree entries.". · Apr 20 2021, 7:16 AM

ABataev added a commit: rGdaf6e18c55c2: [SLP] Add detection of shuffled/perfect matching of tree entries.. · Apr 20 2021, 7:47 AM

ABataev added a reverting change: rGb82344a01949: Revert "[SLP] Add detection of shuffled/perfect matching of tree entries.". · Apr 20 2021, 8:29 AM

ABataev added a commit: rGaf870e11aed7: [SLP] Add detection of shuffled/perfect matching of tree entries.. · Apr 20 2021, 9:09 AM

Seeing a miscompile after this has landed. Trying to reduce a test case, but no luck so far.

Herald added a subscriber: tmatheson. · Apr 23 2021, 2:55 AM

In D100495#2711644, @bkramer wrote:

Seeing a miscompile after this has landed. Trying to reduce a test case, but no luck so far.

Thanks for the report, waiting for the reproducer.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4305	Yes, that would be good, we just need to agree where we should put this, in `getShuffleCost` or somewhere else?
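
As an illustration of what a shared shuffle kind decode helper might look like, here is a standalone sketch. Everything in it is an assumption made for illustration: `Kind`, `isReverse`, `isSelect`, and `classify` are hypothetical names, the predicates are simplified re-implementations rather than LLVM's `ShuffleVectorInst` mask checks (undef lanes are ignored), and the transpose case handled by the patch is left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified counterpart of TargetTransformInfo::ShuffleKind.
enum class Kind { None, Reverse, PermuteSingleSrc, Select, PermuteTwoSrc };

// Simplified check: lane I reads element N-1-I of the single source.
bool isReverse(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  for (int I = 0; I < N; ++I)
    if (Mask[I] != N - 1 - I)
      return false;
  return N > 0;
}

// Simplified check: every lane stays in place (I from the first source or
// N + I from the second) and both sources are actually used.
bool isSelect(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  bool UsesFirst = false, UsesSecond = false;
  for (int I = 0; I < N; ++I) {
    if (Mask[I] == I)
      UsesFirst = true;
    else if (Mask[I] == N + I)
      UsesSecond = true;
    else
      return false;
  }
  return UsesFirst && UsesSecond;
}

// Decodes the shuffle kind from the mask and the number of source entries,
// following the same 1-source / 2-source split as the patch.
Kind classify(const std::vector<int> &Mask, int NumSources) {
  assert(!Mask.empty() && "mask must not be empty");
  if (NumSources == 1)
    return isReverse(Mask) ? Kind::Reverse : Kind::PermuteSingleSrc;
  if (NumSources == 2)
    return isSelect(Mask) ? Kind::Select : Kind::PermuteTwoSrc;
  return Kind::None;
}
```

Keeping the decode in one function like this means the cost model and the code generator would query the same classification, which is the duplication concern raised in the review.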

In D100495#2711644, @bkramer wrote:

Seeing a miscompile after this has landed. Trying to reduce a test case, but no luck so far.

Looks like I know the cause of the problem, will try to prepare a fix later today

In D100495#2711644, @bkramer wrote:

Seeing a miscompile after this has landed. Trying to reduce a test case, but no luck so far.

Hmm, looks like I still need a reproducer, because the possible problem I thought existed in the patch was already fixed. So the possible miscompile is caused by another problem that I don't currently see. Still need a reproducer. :(

In D100495#2711644, @bkramer wrote:

Seeing a miscompile after this has landed. Trying to reduce a test case, but no luck so far.

Any updates about a reproducer?

In D100495#2713228, @ABataev wrote:

In D100495#2711644, @bkramer wrote:

Seeing a miscompile after this has landed. Trying to reduce a test case, but no luck so far.

Any updates about a reproducer?

Here's the smallest reproducer we have so far.

Grab:

driver.cc: https://gist.github.com/hawkinsp/de3ee1adc72d9e4618e00cde5139a5c3
buffer-assignment.txt: https://gist.github.com/hawkinsp/01719b73adb20cccb8d06a57267d42a3
module.ll: https://gist.github.com/hawkinsp/cbe6e3d508f6bf8172d29bd0c6f986a0

and run (on a machine that supports AVX 512):

llvm-project/build/bin/clang  module.ll  driver.cc -o repro -march=skylake-avx512 -O3 -lstdc++ -lm
./repro buffer-assignment.txt

Before this PR (at the preceding commit) this repro produces:

Output:
0.502001, -0.462127, 0.459155, 0.10126, -0.0533019, -0.557239

After this PR, this repro produces:

Output:
0.519662, -0.450291, 0.492509, 0.0910437, -0.0563238, -0.522649

We expect this procedure to be accurate to something more like 1e-10, so this is a significant difference.
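
To put a number on that gap, the two reported output vectors can be compared lane by lane. The following is a small sketch, not part of the reproducer: `maxAbsDiff` and `withinTolerance` are hypothetical helper names, and the values are copied from the outputs quoted in this comment.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Outputs copied from the report: before and after the patch.
const std::array<double, 6> Before = {0.502001, -0.462127, 0.459155,
                                      0.10126,  -0.0533019, -0.557239};
const std::array<double, 6> After = {0.519662,  -0.450291,  0.492509,
                                     0.0910437, -0.0563238, -0.522649};

// Largest absolute per-lane difference between two result vectors.
double maxAbsDiff(const std::array<double, 6> &A,
                  const std::array<double, 6> &B) {
  double Max = 0.0;
  for (std::size_t I = 0; I < A.size(); ++I)
    Max = std::max(Max, std::fabs(A[I] - B[I]));
  return Max;
}

// True if every lane agrees within the given absolute tolerance.
bool withinTolerance(const std::array<double, 6> &A,
                     const std::array<double, 6> &B, double Tol) {
  assert(Tol >= 0.0 && "tolerance must be non-negative");
  return maxAbsDiff(A, B) <= Tol;
}
```

For these values the largest per-lane difference is roughly 3.5e-2 (in the last lane), which is many orders of magnitude above the expected 1e-10.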

In D100495#2715441, @phawkins wrote:

Here's the smallest reproducer we have so far.

Thanks for the reproducer, will investigate it and prepare a fix ASAP!

dtemirbulatov mentioned this in D57779: [SLP] Add support for throttling.. · Apr 25 2021, 5:40 PM

In D100495#2715441, @phawkins wrote:

Here's the smallest reproducer we have so far.

Thanks again for the reproducer, I investigated it. This patch is not the real cause of the issue; it just exposed a known problem with masked gathers again. After this patch, we started to vectorize more code with gathers. If I completely disable masked gathers in the code, I get the following result:

0.502001, -0.462127, 0.459155, 0.10126, -0.0533019, -0.557239

And it looks like it is correct. @anton-afanasyev, maybe there is a bug in the lowering of masked gathers? We also really need to check whether it is profitable/allowed to use masked gathers. I have an initial patch for this in the non-power-2 patch and will prepare it as a separate patch later today.

@ABataev If you can email me the IR with/without the masked gathers I can investigate if there's a problem in the backend (or a later pass).

In D100495#2716713, @RKSimon wrote:

@ABataev If you can email me the IR with/without the masked gathers I can investigate if there's a problem in the backend (or a later pass).

What kind of reproducer do you need? The reproducer reported by @phawkins and attached here after SLP vectorizer with and without masked gathers?

In D100495#2716725, @ABataev wrote:

In D100495#2716713, @RKSimon wrote:

@ABataev If you can email me the IR with/without the masked gathers I can investigate if there's a problem in the backend (or a later pass).

What kind of reproducer do you need? The reproducer reported by @phawkins and attached here after SLP vectorizer with and without masked gathers?

Yes please, the IR files emitted after the SLP pass should be sufficient - how much diff is there other than with/without masked gathers ?

In D100495#2716748, @RKSimon wrote:

In D100495#2716725, @ABataev wrote:

In D100495#2716713, @RKSimon wrote:

@ABataev If you can email me the IR with/without the masked gathers I can investigate if there's a problem in the backend (or a later pass).

What kind of reproducer do you need? The reproducer reported by @phawkins and attached here after SLP vectorizer with and without masked gathers?

Yes please, the IR files emitted after the SLP pass should be sufficient - how much diff is there other than with/without masked gathers ?

Ok, no problem, just give me a few minutes

repro.zip (51 KB)

In D100495#2716748, @RKSimon wrote:

In D100495#2716725, @ABataev wrote:

In D100495#2716713, @RKSimon wrote:

@ABataev If you can email me the IR with/without the masked gathers I can investigate if there's a problem in the backend (or a later pass).

What kind of reproducer do you need? The reproducer reported by @phawkins and attached here after SLP vectorizer with and without masked gathers?

Yes please, the IR files emitted after the SLP pass should be sufficient - how much diff is there other than with/without masked gathers ?

Uploaded. They should be quite similar, except for some cases where we do not vectorize small trees of height 2 with a second gather entry.

anton-afanasyev mentioned this in D102023: [SLP]Do not count perfect diamond matches for gathers several times.. · May 7 2021, 10:36 PM

ABataev mentioned this in rG30463bc3f183: [SLP]Do not count perfect diamond matches for gathers several times.. · May 10 2021, 7:10 AM

Revision Contents

Path                                                                Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp                     99 lines
llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll           2 lines
llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll  152 lines

Diff 338628

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

@@ ... @@ private:
   Value *vectorizeTree(ArrayRef<Value *> VL);

   /// \returns the scalarization cost for this type. Scalarization in this
   /// context means the creation of vectors from a group of scalars.
   InstructionCost
   getGatherCost(FixedVectorType *Ty,
                 const DenseSet<unsigned> &ShuffledIndices) const;

+  /// Checks if the gathered \p VL can be represented as shuffle(s) of previous
+  /// tree entries.
+  /// \returns ShuffleKind, if gathered values can be represented as shuffles of
+  /// previous tree entries. \p Mask is filled with the shuffle mask.
+  Optional<TargetTransformInfo::ShuffleKind>
+  isGatherShuffledEntry(const TreeEntry *TE, SmallVectorImpl<int> &Mask,
+                        SmallVectorImpl<const TreeEntry *> &Entries);
+
   /// \returns the scalarization cost for this list of values. Assuming that
   /// this subtree gets vectorized, we may need to extract the values from the
   /// roots. This method calculates the cost of extracting the values.
   InstructionCost getGatherCost(ArrayRef<Value *> VL) const;

   /// Set the Builder insert point to one after the last instruction in
   /// the bundle
   void setInsertPointAfterBundle(TreeEntry *E);
@@ ... @@ if (E->getOpcode() == Instruction::ExtractElement &&
                 cast<ExtractElementInst>(V)->getIndexOperand());
             Cost -= TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy,
                                             IO->getZExtValue());
           }
         }
         return ReuseShuffleCost + Cost;
       }
     }
-    return ReuseShuffleCost + getGatherCost(VL);
+    InstructionCost GatherCost = 0;
+    SmallVector<int> Mask;
+    SmallVector<const TreeEntry *> Entries;
+    Optional<TargetTransformInfo::ShuffleKind> Shuffle =
+        isGatherShuffledEntry(E, Mask, Entries);
+    if (Shuffle.hasValue()) {
+      if (ShuffleVectorInst::isIdentityMask(Mask)) {
+        LLVM_DEBUG(
+            dbgs()
+            << "SLP: perfect diamond match for gather bundle that starts with "
+            << *VL.front() << ".\n");
+      } else {
+        LLVM_DEBUG(dbgs() << "SLP: shuffled " << Entries.size()
+                          << " entries for bundle that starts with "
+                          << *VL.front() << ".\n");
+        GatherCost = TTI->getShuffleCost(*Shuffle, VecTy, Mask);
+      }
+    } else {
+      GatherCost = getGatherCost(VL);
+    }
+    return ReuseShuffleCost + GatherCost;
   }
   assert((E->State == TreeEntry::Vectorize ||
           E->State == TreeEntry::ScatterVectorize) &&
          "Unhandled state");
   assert(E->getOpcode() && allSameType(VL) && allSameBlock(VL) && "Invalid VL");
   Instruction *VL0 = E->getMainOp();
   unsigned ShuffleOrOp =
       E->isAltShuffle() ? (unsigned)Instruction::ShuffleVector : E->getOpcode();
@@ ... @@ #ifndef NDEBUG
   LLVM_DEBUG(dbgs() << Str);
   if (ViewSLPTree)
     ViewGraph(this, "SLP" + F->getName(), false, Str);
 #endif

   return Cost;
 }

+Optional<TargetTransformInfo::ShuffleKind>
+BoUpSLP::isGatherShuffledEntry(const TreeEntry *TE, SmallVectorImpl<int> &Mask,
+                               SmallVectorImpl<const TreeEntry *> &Entries) {
+  auto *VLIt = find_if(VectorizableTree,
+                       [TE](const std::unique_ptr<TreeEntry> &EntryPtr) {
+                         return EntryPtr.get() == TE;
+                       });
+  assert(VLIt != VectorizableTree.end() &&
+         "Gathered values should be in the tree.");
+  Mask.clear();
+  Entries.clear();
+  DenseMap<const TreeEntry *, int> Used;
+  int NumShuffles = 0;
+  for (int I = 0, E = TE->Scalars.size(); I < E; ++I) {
+    Value *V = TE->Scalars[I];
+    const TreeEntry *VTE = getTreeEntry(V);
+    if (!VTE) {
+      // Check if it is used in one of the gathered entries.
+      const auto *It =
+          find_if(make_range(VectorizableTree.begin(), VLIt),
+                  [V](const std::unique_ptr<TreeEntry> &EntryPtr) {
+                    return EntryPtr->State == TreeEntry::NeedToGather &&
+                           is_contained(EntryPtr->Scalars, V);
+                  });
+      if (It != VLIt)
+        VTE = It->get();
+    }
+    if (VTE) {
+      auto Res = Used.try_emplace(VTE, NumShuffles);
+      if (Res.second) {
+        Entries.push_back(VTE);
+        ++NumShuffles;
+      }
+      Mask.push_back(
+          Res.first->second * E +
+          std::distance(VTE->Scalars.begin(), find(VTE->Scalars, V)));
+      continue;
+    }
+    return None;
+  }
+  if (NumShuffles == 1) {
+    if (ShuffleVectorInst::isReverseMask(Mask))
+      return TargetTransformInfo::SK_Reverse;
+    return TargetTransformInfo::SK_PermuteSingleSrc;
+  }
+  if (NumShuffles == 2) {
+    if (ShuffleVectorInst::isSelectMask(Mask))
+      return TargetTransformInfo::SK_Select;
+    if (ShuffleVectorInst::isTransposeMask(Mask))
+      return TargetTransformInfo::SK_Transpose;
+    return TargetTransformInfo::SK_PermuteTwoSrc;
+  }
+  return None;
+}
+
 InstructionCost
 BoUpSLP::getGatherCost(FixedVectorType *Ty,
                        const DenseSet<unsigned> &ShuffledIndices) const {
   unsigned NumElts = Ty->getNumElements();
   APInt DemandedElts = APInt::getNullValue(NumElts);
   for (unsigned I = 0; I < NumElts; ++I)
     if (!ShuffledIndices.count(I))
       DemandedElts.setBit(I);
   InstructionCost Cost =
       TTI->getScalarizationOverhead(Ty, DemandedElts, /*Insert*/ true,
                                     /*Extract*/ false);
   if (!ShuffledIndices.empty())
@@ ... @@ if (E->VectorizedValue) {
     LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");
     return E->VectorizedValue;
   }

   ShuffleInstructionBuilder ShuffleBuilder(Builder);
   bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();
   if (E->State == TreeEntry::NeedToGather) {
     setInsertPointAfterBundle(E);
-    Value *Vec = gather(E->Scalars);
+    Value *Vec;
+    SmallVector<int> Mask;
+    SmallVector<const TreeEntry *> Entries;
+    Optional<TargetTransformInfo::ShuffleKind> Shuffle =
+        isGatherShuffledEntry(E, Mask, Entries);
+    if (Shuffle.hasValue()) {
+      assert((Entries.size() == 1 || Entries.size() == 2) &&
+             "Expected shuffle of 1 or 2 entries.");
+      Vec = Builder.CreateShuffleVector(Entries.front()->VectorizedValue,
+                                        Entries.back()->VectorizedValue, Mask);
+    } else {
+      Vec = gather(E->Scalars);
+    }
     if (NeedToShuffleReuses) {
       ShuffleBuilder.addMask(E->ReuseShuffleIndices);
       Vec = ShuffleBuilder.finalize(Vec);
       if (auto *I = dyn_cast<Instruction>(Vec)) {
         GatherSeq.insert(I);
         CSEBlocks.insert(I->getParent());
       }
     }
     E->VectorizedValue = Vec;
     return Vec;

llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -S -slp-vectorizer -instcombine -pass-remarks-output=%t | FileCheck %s
 ; RUN: cat %t | FileCheck -check-prefix=REMARK %s
 ; RUN: opt < %s -S -aa-pipeline=basic-aa -passes='slp-vectorizer,instcombine' -pass-remarks-output=%t | FileCheck %s
 ; RUN: cat %t | FileCheck -check-prefix=REMARK %s

 target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 target triple = "aarch64--linux-gnu"

 ; REMARK-LABEL: Function: gather_multiple_use
 ; REMARK: Args:
 ; REMARK-NEXT: - String: 'Vectorized horizontal reduction with cost '
-; REMARK-NEXT: - Cost: '-7'
+; REMARK-NEXT: - Cost: '-16'
 ;
 ; REMARK-NOT: Function: gather_load

 define internal i32 @gather_multiple_use(i32 %a, i32 %b, i32 %c, i32 %d) {
 ; CHECK-LABEL: @gather_multiple_use(
 ; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0
 ; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[A:%.*]], i32 1
 ; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[B:%.*]], i32 2

llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 -slp-threshold=50 -slp-recursion-max-depth=6 < %s | FileCheck %s

 define i32 @bar() local_unnamed_addr {
 ; CHECK-LABEL: @bar(
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: [[ADD103:%.*]] = add nsw i32 undef, undef
-; CHECK-NEXT: [[SUB104:%.*]] = sub nsw i32 undef, undef
-; CHECK-NEXT: [[ADD105:%.*]] = add nsw i32 undef, undef
-; CHECK-NEXT: [[SUB106:%.*]] = sub nsw i32 undef, undef
-; CHECK-NEXT: [[SHR_I:%.*]] = lshr i32 [[ADD103]], 15
-; CHECK-NEXT: [[AND_I:%.*]] = and i32 [[SHR_I]], 65537
-; CHECK-NEXT: [[MUL_I:%.*]] = mul nuw i32 [[AND_I]], 65535
-; CHECK-NEXT: [[ADD_I:%.*]] = add i32 [[MUL_I]], [[ADD103]]
-; CHECK-NEXT: [[XOR_I:%.*]] = xor i32 [[ADD_I]], [[MUL_I]]
-; CHECK-NEXT: [[SHR_I64:%.*]] = lshr i32 [[ADD105]], 15
-; CHECK-NEXT: [[AND_I65:%.*]] = and i32 [[SHR_I64]], 65537
-; CHECK-NEXT: [[MUL_I66:%.*]] = mul nuw i32 [[AND_I65]], 65535
-; CHECK-NEXT: [[ADD_I67:%.*]] = add i32 [[MUL_I66]], [[ADD105]]
-; CHECK-NEXT: [[XOR_I68:%.*]] = xor i32 [[ADD_I67]], [[MUL_I66]]
-; CHECK-NEXT: [[SHR_I69:%.*]] = lshr i32 [[SUB104]], 15
-; CHECK-NEXT: [[AND_I70:%.*]] = and i32 [[SHR_I69]], 65537
-; CHECK-NEXT: [[MUL_I71:%.*]] = mul nuw i32 [[AND_I70]], 65535
-; CHECK-NEXT: [[ADD_I72:%.*]] = add i32 [[MUL_I71]], [[SUB104]]
-; CHECK-NEXT: [[XOR_I73:%.*]] = xor i32 [[ADD_I72]], [[MUL_I71]]
-; CHECK-NEXT: [[SHR_I74:%.*]] = lshr i32 [[SUB106]], 15
-; CHECK-NEXT: [[AND_I75:%.*]] = and i32 [[SHR_I74]], 65537
-; CHECK-NEXT: [[MUL_I76:%.*]] = mul nuw i32 [[AND_I75]], 65535
-; CHECK-NEXT: [[ADD_I77:%.*]] = add i32 [[MUL_I76]], [[SUB106]]
-; CHECK-NEXT: [[XOR_I78:%.*]] = xor i32 [[ADD_I77]], [[MUL_I76]]
-; CHECK-NEXT: [[ADD110:%.*]] = add i32 [[XOR_I68]], [[XOR_I]]
-; CHECK-NEXT: [[ADD112:%.*]] = add i32 [[ADD110]], [[XOR_I73]]
-; CHECK-NEXT: [[ADD113:%.*]] = add i32 [[ADD112]], [[XOR_I78]]
 ; CHECK-NEXT: [[ADD78_1:%.*]] = add nsw i32 undef, undef
 ; CHECK-NEXT: [[SUB86_1:%.*]] = sub nsw i32 undef, undef
 ; CHECK-NEXT: [[ADD94_1:%.*]] = add nsw i32 undef, undef
 ; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 undef, undef
-; CHECK-NEXT: [[ADD103_1:%.*]] = add nsw i32 [[ADD94_1]], [[ADD78_1]]
-; CHECK-NEXT: [[SUB104_1:%.*]] = sub nsw i32 [[ADD78_1]], [[ADD94_1]]
-; CHECK-NEXT: [[ADD105_1:%.*]] = add nsw i32 [[SUB102_1]], [[SUB86_1]]
-; CHECK-NEXT: [[SUB106_1:%.*]] = sub nsw i32 [[SUB86_1]], [[SUB102_1]]
-; CHECK-NEXT: [[SHR_I_1:%.*]] = lshr i32 [[ADD103_1]], 15
-; CHECK-NEXT: [[AND_I_1:%.*]] = and i32 [[SHR_I_1]], 65537
-; CHECK-NEXT: [[MUL_I_1:%.*]] = mul nuw i32 [[AND_I_1]], 65535
-; CHECK-NEXT: [[ADD_I_1:%.*]] = add i32 [[MUL_I_1]], [[ADD103_1]]
-; CHECK-NEXT: [[XOR_I_1:%.*]] = xor i32 [[ADD_I_1]], [[MUL_I_1]]
-; CHECK-NEXT: [[SHR_I64_1:%.*]] = lshr i32 [[ADD105_1]], 15
-; CHECK-NEXT: [[AND_I65_1:%.*]] = and i32 [[SHR_I64_1]], 65537
-; CHECK-NEXT: [[MUL_I66_1:%.*]] = mul nuw i32 [[AND_I65_1]], 65535
-; CHECK-NEXT: [[ADD_I67_1:%.*]] = add i32 [[MUL_I66_1]], [[ADD105_1]]
-; CHECK-NEXT: [[XOR_I68_1:%.*]] = xor i32 [[ADD_I67_1]], [[MUL_I66_1]]
-; CHECK-NEXT: [[SHR_I69_1:%.*]] = lshr i32 [[SUB104_1]], 15
-; CHECK-NEXT: [[AND_I70_1:%.*]] = and i32 [[SHR_I69_1]], 65537
-; CHECK-NEXT: [[MUL_I71_1:%.*]] = mul nuw i32 [[AND_I70_1]], 65535
-; CHECK-NEXT: [[ADD_I72_1:%.*]] = add i32 [[MUL_I71_1]], [[SUB104_1]]
-; CHECK-NEXT: [[XOR_I73_1:%.*]] = xor i32 [[ADD_I72_1]], [[MUL_I71_1]]
-; CHECK-NEXT: [[SHR_I74_1:%.*]] = lshr i32 [[SUB106_1]], 15
-; CHECK-NEXT: [[AND_I75_1:%.*]] = and i32 [[SHR_I74_1]], 65537
-; CHECK-NEXT: [[MUL_I76_1:%.*]] = mul nuw i32 [[AND_I75_1]], 65535
-; CHECK-NEXT: [[ADD_I77_1:%.*]] = add i32 [[MUL_I76_1]], [[SUB106_1]]
-; CHECK-NEXT: [[XOR_I78_1:%.*]] = xor i32 [[ADD_I77_1]], [[MUL_I76_1]]
-; CHECK-NEXT: [[ADD108_1:%.*]] = add i32 [[XOR_I68_1]], [[ADD113]]
-; CHECK-NEXT: [[ADD110_1:%.*]] = add i32 [[ADD108_1]], [[XOR_I_1]]
-; CHECK-NEXT: [[ADD112_1:%.*]] = add i32 [[ADD110_1]], [[XOR_I73_1]]
-; CHECK-NEXT: [[ADD113_1:%.*]] = add i32 [[ADD112_1]], [[XOR_I78_1]]
 ; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 undef, undef
-; CHECK-NEXT: [[ADD103_2:%.*]] = add nsw i32 undef, [[ADD78_2]]
-; CHECK-NEXT: [[SUB104_2:%.*]] = sub nsw i32 [[ADD78_2]], undef
-; CHECK-NEXT: [[ADD105_2:%.*]] = add nsw i32 undef, undef
-; CHECK-NEXT: [[SUB106_2:%.*]] = sub nsw i32 undef, undef
-; CHECK-NEXT: [[SHR_I_2:%.*]] = lshr i32 [[ADD103_2]], 15
-; CHECK-NEXT: [[AND_I_2:%.*]] = and i32 [[SHR_I_2]], 65537
-; CHECK-NEXT: [[MUL_I_2:%.*]] = mul nuw i32 [[AND_I_2]], 65535
-; CHECK-NEXT: [[ADD_I_2:%.*]] = add i32 [[MUL_I_2]], [[ADD103_2]]
-; CHECK-NEXT: [[XOR_I_2:%.*]] = xor i32 [[ADD_I_2]], [[MUL_I_2]]
-; CHECK-NEXT: [[SHR_I64_2:%.*]] = lshr i32 [[ADD105_2]], 15
-; CHECK-NEXT: [[AND_I65_2:%.*]] = and i32 [[SHR_I64_2]], 65537
-; CHECK-NEXT: [[MUL_I66_2:%.*]] = mul nuw i32 [[AND_I65_2]], 65535
-; CHECK-NEXT: [[ADD_I67_2:%.*]] = add i32 [[MUL_I66_2]], [[ADD105_2]]
-; CHECK-NEXT: [[XOR_I68_2:%.*]] = xor i32 [[ADD_I67_2]], [[MUL_I66_2]]
-; CHECK-NEXT: [[SHR_I69_2:%.*]] = lshr i32 [[SUB104_2]], 15
-; CHECK-NEXT: [[AND_I70_2:%.*]] = and i32 [[SHR_I69_2]], 65537
-; CHECK-NEXT: [[MUL_I71_2:%.*]] = mul nuw i32 [[AND_I70_2]], 65535
-; CHECK-NEXT: [[ADD_I72_2:%.*]] = add i32 [[MUL_I71_2]], [[SUB104_2]]
-; CHECK-NEXT: [[XOR_I73_2:%.*]] = xor i32 [[ADD_I72_2]], [[MUL_I71_2]]
-; CHECK-NEXT: [[SHR_I74_2:%.*]] = lshr i32 [[SUB106_2]], 15
-; CHECK-NEXT: [[AND_I75_2:%.*]] = and i32 [[SHR_I74_2]], 65537
-; CHECK-NEXT: [[MUL_I76_2:%.*]] = mul nuw i32 [[AND_I75_2]], 65535
-; CHECK-NEXT: [[ADD_I77_2:%.*]] = add i32 [[MUL_I76_2]], [[SUB106_2]]
-; CHECK-NEXT: [[XOR_I78_2:%.*]] = xor i32 [[ADD_I77_2]], [[MUL_I76_2]]
-; CHECK-NEXT: [[ADD108_2:%.*]] = add i32 [[XOR_I68_2]], [[ADD113_1]]
-; CHECK-NEXT: [[ADD110_2:%.*]] = add i32 [[ADD108_2]], [[XOR_I_2]]
-; CHECK-NEXT: [[ADD112_2:%.*]] = add i32 [[ADD110_2]], [[XOR_I73_2]]
-; CHECK-NEXT: [[ADD113_2:%.*]] = add i32 [[ADD112_2]], [[XOR_I78_2]]
 ; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 undef, undef
-; CHECK-NEXT: [[ADD103_3:%.*]] = add nsw i32 undef, undef
-; CHECK-NEXT: [[SUB104_3:%.*]] = sub nsw i32 undef, undef
-; CHECK-NEXT: [[ADD105_3:%.*]] = add nsw i32 [[SUB102_3]], undef
-; CHECK-NEXT: [[SUB106_3:%.*]] = sub nsw i32 undef, [[SUB102_3]]
-; CHECK-NEXT: [[SHR_I_3:%.*]] = lshr i32 [[ADD103_3]], 15
-; CHECK-NEXT: [[AND_I_3:%.*]] = and i32 [[SHR_I_3]], 65537
-; CHECK-NEXT: [[MUL_I_3:%.*]] = mul nuw i32 [[AND_I_3]], 65535
-; CHECK-NEXT: [[ADD_I_3:%.*]] = add i32 [[MUL_I_3]], [[ADD103_3]]
-; CHECK-NEXT: [[XOR_I_3:%.*]] = xor i32 [[ADD_I_3]], [[MUL_I_3]]
-; CHECK-NEXT: [[SHR_I64_3:%.*]] = lshr i32 [[ADD105_3]], 15
-; CHECK-NEXT: [[AND_I65_3:%.*]] = and i32 [[SHR_I64_3]], 65537
-; CHECK-NEXT: [[MUL_I66_3:%.*]] = mul nuw i32 [[AND_I65_3]], 65535
-; CHECK-NEXT: [[ADD_I67_3:%.*]] = add i32 [[MUL_I66_3]], [[ADD105_3]]
-; CHECK-NEXT: [[XOR_I68_3:%.*]] = xor i32 [[ADD_I67_3]], [[MUL_I66_3]]
-; CHECK-NEXT: [[SHR_I69_3:%.*]] = lshr i32 [[SUB104_3]], 15
-; CHECK-NEXT: [[AND_I70_3:%.*]] = and i32 [[SHR_I69_3]], 65537
-; CHECK-NEXT: [[MUL_I71_3:%.*]] = mul nuw i32 [[AND_I70_3]], 65535
-; CHECK-NEXT: [[ADD_I72_3:%.*]] = add i32 [[MUL_I71_3]], [[SUB104_3]]
-; CHECK-NEXT: [[XOR_I73_3:%.*]] = xor i32 [[ADD_I72_3]], [[MUL_I71_3]]
-; CHECK-NEXT: [[SHR_I74_3:%.*]] = lshr i32 [[SUB106_3]], 15
-; CHECK-NEXT: [[AND_I75_3:%.*]] = and i32 [[SHR_I74_3]], 65537
-; CHECK-NEXT: [[MUL_I76_3:%.*]] = mul nuw i32 [[AND_I75_3]], 65535
-; CHECK-NEXT: [[ADD_I77_3:%.*]] = add i32 [[MUL_I76_3]], [[SUB106_3]]
+; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[SUB102_3]], i32 0
+; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> [[TMP0]], i32 undef, i32 1
+; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> [[TMP1]], i32 [[SUB102_1]], i32 2
+; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 undef, i32 3
+; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 undef, i32 4
+; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> [[TMP4]], i32 undef, i32 5
+; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x i32> [[TMP5]], i32 undef, i32 6
+; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[ADD94_1]], i32 7
+; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP7]], i32 [[ADD78_1]], i32 8
+; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> [[TMP8]], i32 [[SUB86_1]], i32 9
+; CHECK-NEXT: [[TMP10:%.*]] = insertelement <16 x i32> [[TMP9]], i32 undef, i32 10
+; CHECK-NEXT: [[TMP11:%.*]] = insertelement <16 x i32> [[TMP10]], i32 [[ADD78_2]], i32 11
+; CHECK-NEXT: [[TMP12:%.*]] = insertelement <16 x i32> [[TMP11]], i32 undef, i32 12
+; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x i32> [[TMP12]], i32 undef, i32 13
+; CHECK-NEXT: [[TMP14:%.*]] = insertelement <16 x i32> [[TMP13]], i32 undef, i32 14
+; CHECK-NEXT: [[TMP15:%.*]] = insertelement <16 x i32> [[TMP14]], i32 undef, i32 15
+; CHECK-NEXT: [[TMP16:%.*]] = insertelement <16 x i32> <i32 undef, i32 undef, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>, i32 [[SUB86_1]], i32 2
+; CHECK-NEXT: [[TMP17:%.*]] = insertelement <16 x i32> [[TMP16]], i32 undef, i32 3
+; CHECK-NEXT: [[TMP18:%.*]] = insertelement <16 x i32> [[TMP17]], i32 undef, i32 4
+; CHECK-NEXT: [[TMP19:%.*]] = insertelement <16 x i32> [[TMP18]], i32 undef, i32 5
+; CHECK-NEXT: [[TMP20:%.*]] = insertelement <16 x i32> [[TMP19]], i32 undef, i32 6
+; CHECK-NEXT: [[TMP21:%.*]] = insertelement <16 x i32> [[TMP20]], i32 [[ADD78_1]], i32 7
+; CHECK-NEXT: [[TMP22:%.*]] = insertelement <16 x i32> [[TMP21]], i32 [[ADD94_1]], i32 8
	; CHECK-NEXT: [[XOR_I78_3:%.*]] = xor i32 [[ADD_I77_3]], [[MUL_I76_3]]			; CHECK-NEXT: [[TMP23:%.*]] = insertelement <16 x i32> [[TMP22]], i32 [[SUB102_1]], i32 9
	; CHECK-NEXT: [[ADD108_3:%.*]] = add i32 [[XOR_I68_3]], [[ADD113_2]]			; CHECK-NEXT: [[TMP24:%.*]] = insertelement <16 x i32> [[TMP23]], i32 [[ADD78_2]], i32 10
	; CHECK-NEXT: [[ADD110_3:%.*]] = add i32 [[ADD108_3]], [[XOR_I_3]]			; CHECK-NEXT: [[TMP25:%.*]] = insertelement <16 x i32> [[TMP24]], i32 undef, i32 11
	; CHECK-NEXT: [[ADD112_3:%.*]] = add i32 [[ADD110_3]], [[XOR_I73_3]]			; CHECK-NEXT: [[TMP26:%.*]] = insertelement <16 x i32> [[TMP25]], i32 undef, i32 12
	; CHECK-NEXT: [[ADD113_3:%.*]] = add i32 [[ADD112_3]], [[XOR_I78_3]]			; CHECK-NEXT: [[TMP27:%.*]] = insertelement <16 x i32> [[TMP26]], i32 undef, i32 13
	; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[ADD113_3]], 16			; CHECK-NEXT: [[TMP28:%.*]] = insertelement <16 x i32> [[TMP27]], i32 undef, i32 14
				; CHECK-NEXT: [[TMP29:%.*]] = insertelement <16 x i32> [[TMP28]], i32 [[SUB102_3]], i32 15
				; CHECK-NEXT: [[TMP30:%.*]] = add nsw <16 x i32> [[TMP15]], [[TMP29]]
				; CHECK-NEXT: [[TMP31:%.*]] = sub nsw <16 x i32> [[TMP15]], [[TMP29]]
				; CHECK-NEXT: [[TMP32:%.*]] = shufflevector <16 x i32> [[TMP30]], <16 x i32> [[TMP31]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 21, i32 22, i32 7, i32 24, i32 25, i32 10, i32 27, i32 28, i32 13, i32 30, i32 31>
				; CHECK-NEXT: [[TMP33:%.*]] = lshr <16 x i32> [[TMP32]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
				; CHECK-NEXT: [[TMP34:%.*]] = and <16 x i32> [[TMP33]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>
				; CHECK-NEXT: [[TMP35:%.*]] = mul nuw <16 x i32> [[TMP34]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
				; CHECK-NEXT: [[TMP36:%.*]] = add <16 x i32> [[TMP35]], [[TMP32]]
				; CHECK-NEXT: [[TMP37:%.*]] = xor <16 x i32> [[TMP36]], [[TMP35]]
				; CHECK-NEXT: [[TMP38:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP37]])
				; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP38]], 16
	; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 undef, [[SHR]]			; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 undef, [[SHR]]
	; CHECK-NEXT: [[SHR120:%.*]] = lshr i32 [[ADD119]], 1			; CHECK-NEXT: [[SHR120:%.*]] = lshr i32 [[ADD119]], 1
	; CHECK-NEXT: ret i32 [[SHR120]]			; CHECK-NEXT: ret i32 [[SHR120]]
	;			;
	entry:			entry:
	%add103 = add nsw i32 undef, undef			%add103 = add nsw i32 undef, undef
	%sub104 = sub nsw i32 undef, undef			%sub104 = sub nsw i32 undef, undef
	%add105 = add nsw i32 undef, undef			%add105 = add nsw i32 undef, undef

Revision Contents

Diff 338628

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/SLPVectorizer/AArch64/gather-cost.ll

llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll
