This is an archive of the discontinued LLVM Phabricator instance.

Differential D128732

[VectorCombine] Improve shuffle select shuffle-of-shuffles
Closed, Public

Authored by dmgreen on Jun 28 2022, 7:13 AM.

Details

Reviewers

spatel
RKSimon
labrinea
SjoerdMeijer
samtebbs

Commits

rG5493f8fc59ca: [VectorCombine] Improve shuffle select shuffle-of-shuffles
rG19a1e20b8a0f: [VectorCombine] Improve shuffle select shuffle-of-shuffles

Summary

This is an extension to the code added in D123911, which added vector-combine folding of shuffle-select patterns. It attempts to reduce the total amount of shuffling required in patterns like:

%x = shuffle %i1, %i2
%y = shuffle %i1, %i2
%a = binop %x, %y
%b = binop %x, %y
shuffle %a, %b, selectmask

This patch extends the handling to shuffles that depend on one another, which can arise from the SLP vectorizer, as in:

%x = shuffle %i1, %i2
%y = shuffle %x
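
To make the dependent-shuffle case concrete, here is a minimal standalone sketch of looking through a single-source shuffle of another shuffle by composing their masks. It mirrors the `SSV->getMaskValue(Mask[I])` step visible in the diff, but `composeMasks` is a hypothetical helper using plain `std::vector<int>` rather than LLVM types, and negative entries stand in for undef lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of LLVM. Inner is the mask of
// %x = shuffle %i1, %i2 and Outer is the mask of %y = shuffle %x.
// The result is a mask that selects the same lanes directly from
// %i1/%i2. Negative entries mean "undef" and are passed through.
std::vector<int> composeMasks(const std::vector<int> &Inner,
                              const std::vector<int> &Outer) {
  std::vector<int> Result;
  Result.reserve(Outer.size());
  for (int M : Outer) {
    if (M < 0) {
      Result.push_back(-1);
      continue;
    }
    assert(static_cast<std::size_t>(M) < Inner.size() &&
           "outer mask index out of range");
    Result.push_back(Inner[M]);
  }
  return Result;
}
```

For example, with an inner mask of {0, 5, 2, 7} and an outer mask of {3, -1, 0, 1}, the composed mask is {7, -1, 0, 5}.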

The input shuffles can also be omitted, in which case they are treated like identity shuffles. This patch also attempts to calculate a better ordering of the input shuffles, which can yield lower-cost input shuffles by pushing complex shuffles further down the tree.
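
The reordering idea can be sketched in isolation. The diff keeps each packed lane as a pair of (mask value, original slot), stable-sorts the pairs, and then rewrites the reconstruction masks to follow the new slot positions. The sketch below is a simplified illustration under stated assumptions: it handles a single source vector only (the real code offsets second-operand lanes by `NumElts`), it sorts directly on the stored value rather than on `GetBaseMaskValue`, and the names `Slot`, `sortSlots` and `remapMask` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// first  = key to sort on (stand-in for the base mask value)
// second = slot the lane occupied before sorting
using Slot = std::pair<int, int>;

// Sort packed slots by ascending key, keeping ties in their original order.
void sortSlots(std::vector<Slot> &Slots) {
  std::stable_sort(
      Slots.begin(), Slots.end(),
      [](const Slot &A, const Slot &B) { return A.first < B.first; });
}

// Rewrite a reconstruction mask so that each entry refers to the position
// its slot has after sorting. Negative entries (undef) are passed through.
std::vector<int> remapMask(const std::vector<Slot> &Slots,
                           const std::vector<int> &OrigMask) {
  std::vector<int> Result;
  Result.reserve(OrigMask.size());
  for (int M : OrigMask) {
    if (M < 0) {
      Result.push_back(-1);
      continue;
    }
    auto It = std::find_if(Slots.begin(), Slots.end(),
                           [M](const Slot &S) { return S.second == M; });
    assert(It != Slots.end() && "expected every mask entry to have a slot");
    Result.push_back(static_cast<int>(It - Slots.begin()));
  }
  return Result;
}
```

Sorting moves the irregular ordering out of the input shuffle and into the reconstruction mask, which is the sense in which the complex shuffle is pushed down to the uses.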

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. Jun 28 2022, 7:13 AM

Herald added a project: Restricted Project. Jun 28 2022, 7:13 AM

Herald added subscribers: nlopes, hiraditya.

dmgreen requested review of this revision. Jun 28 2022, 7:13 AM

Harbormaster completed remote builds in B172474: Diff 440597. Jun 28 2022, 7:13 AM

samtebbs added inline comments. Jul 1 2022, 6:24 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1421–1429	Do you think it would be worth making an anonymous function for this similar behaviour?
1501–1523	This has the same structure as `GetBaseMaskValue`, so I wonder if `GetMaskValue` can use this. If it makes it messier then having both is OK.
llvm/test/Transforms/VectorCombine/AArch64/select-shuffle.ll
578	Looks like more instructions are added here. Is that fine because it gets lowered to something better or of lower cost?

Updates for review comments.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1501–1523	Hmm. Good point, but I'm not sure how easy it is for them to share logic, unfortunately. They are each returning quite different values, and the parameters are different between the two methods.
llvm/test/Transforms/VectorCombine/AArch64/select-shuffle.ll
578	Yeah - More instructions is better in this case, as the shuffles are each simpler. It can be hard to see where this is an improvement and where it isn't from the tests. In general it can be a little difficult to be very precise, but most of the changes are improvements (and this one especially is much better as the shuffles are a lot easier to materialize).

Harbormaster completed remote builds in B173500: Diff 442017. Jul 4 2022, 12:15 AM

LGTM now, cheers.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1501–1523	Yeah looks like it would be messier to do that. Looks good to me as it is then.

This revision is now accepted and ready to land. Jul 4 2022, 2:44 AM

This revision was landed with ongoing or failed builds. Jul 4 2022, 5:39 AM

Closed by commit rG19a1e20b8a0f: [VectorCombine] Improve shuffle select shuffle-of-shuffles (authored by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rG19a1e20b8a0f: [VectorCombine] Improve shuffle select shuffle-of-shuffles.

nikic added a reverting change: rGb69c75d53f8d: Revert "[VectorCombine] Improve shuffle select shuffle-of-shuffles". Jul 5 2022, 12:31 AM

dmgreen added a commit: rG5493f8fc59ca: [VectorCombine] Improve shuffle select shuffle-of-shuffles. Jul 5 2022, 9:16 AM

Hi!

We get a memory corruption issue in our downstream testing with this patch. I've attached a reproducer, which can be run with `opt -passes=vector-combine reduced.ll`:

reduced.ll (29 KB)

The issue occurs in X86TTIImpl::getShuffleCost:

unsigned E = *NumOfDests.getValue();
unsigned NormalizedVF =
    LegalVT.getVectorNumElements() * std::max(NumOfSrcs, E);
unsigned NumOfSrcRegs = NormalizedVF / LegalVT.getVectorNumElements();
unsigned NumOfDestRegs = NormalizedVF / LegalVT.getVectorNumElements();
SmallVector<int> NormalizedMask(NormalizedVF, UndefMaskElem);
copy(Mask, NormalizedMask.begin());

Mask has a size of 64, but NormalizedVF is only 16, so the copy tramples over the heap.
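
The failure mode can be shown without LLVM: copying a 64-entry mask into a 16-entry buffer writes past the end. The sketch below is a hypothetical bounds-checked variant of that copy, using `std::vector<int>` and `-1` as an undef marker. It only illustrates the overflow condition; it is not the fix that landed, which according to the later comment in this thread restricts the shuffle uses considered to those with the same type.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, not LLVM code. NormalizedMask is assumed to be
// pre-sized (e.g. to NormalizedVF entries) and pre-filled with -1 (undef).
// Returns false and leaves the buffer untouched if Mask does not fit,
// instead of writing past the end of the allocation.
bool copyMaskChecked(const std::vector<int> &Mask,
                     std::vector<int> &NormalizedMask) {
  if (Mask.size() > NormalizedMask.size())
    return false;
  std::copy(Mask.begin(), Mask.end(), NormalizedMask.begin());
  return true;
}
```

With the sizes reported above (a 64-entry mask and a 16-entry buffer), this helper refuses the copy rather than corrupting the heap.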

Thanks for the reproducer - I'll take a look now.

dmgreen mentioned this in rG4b7913c35733: [VectorCombine] Only consider shuffle uses with the same type. Jul 16 2022, 5:23 AM

There is hopefully a fix in rG4b7913c35733. Let me know if anything else shows up.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/VectorCombine.cpp	164 lines
llvm/test/Transforms/VectorCombine/AArch64/select-shuffle.ll	260 lines

Diff 442073

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

// (preceding lines elided)		bool VectorCombine::foldSelectShuffle(Instruction &I, bool FromReduction) {
auto *VT = dyn_cast<FixedVectorType>(I.getType());		auto *VT = dyn_cast<FixedVectorType>(I.getType());
if (!SVI || !VT)		if (!SVI || !VT)
return false;		return false;
auto *Op0 = dyn_cast<Instruction>(SVI->getOperand(0));		auto *Op0 = dyn_cast<Instruction>(SVI->getOperand(0));
auto *Op1 = dyn_cast<Instruction>(SVI->getOperand(1));		auto *Op1 = dyn_cast<Instruction>(SVI->getOperand(1));
if (!Op0 || !Op1 || Op0 == Op1 || !Op0->isBinaryOp() || !Op1->isBinaryOp() ||		if (!Op0 || !Op1 || Op0 == Op1 || !Op0->isBinaryOp() || !Op1->isBinaryOp() ||
VT != Op0->getType())		VT != Op0->getType())
return false;		return false;
auto *SVI0A = dyn_cast<ShuffleVectorInst>(Op0->getOperand(0));		auto *SVI0A = dyn_cast<Instruction>(Op0->getOperand(0));
auto *SVI0B = dyn_cast<ShuffleVectorInst>(Op0->getOperand(1));		auto *SVI0B = dyn_cast<Instruction>(Op0->getOperand(1));
auto *SVI1A = dyn_cast<ShuffleVectorInst>(Op1->getOperand(0));		auto *SVI1A = dyn_cast<Instruction>(Op1->getOperand(0));
auto *SVI1B = dyn_cast<ShuffleVectorInst>(Op1->getOperand(1));		auto *SVI1B = dyn_cast<Instruction>(Op1->getOperand(1));
		SmallPtrSet<Instruction *, 4> InputShuffles({SVI0A, SVI0B, SVI1A, SVI1B});
auto checkSVNonOpUses = [&](Instruction *I) {		auto checkSVNonOpUses = [&](Instruction *I) {
if (!I || I->getOperand(0)->getType() != VT)		if (!I || I->getOperand(0)->getType() != VT)
return true;		return true;
return any_of(I->users(), [&](User *U) { return U != Op0 && U != Op1; });		return any_of(I->users(), [&](User *U) {
		return U != Op0 && U != Op1 &&
		!(isa<ShuffleVectorInst>(U) &&
		(InputShuffles.contains(cast<Instruction>(U)) ||
		isInstructionTriviallyDead(cast<Instruction>(U))));
		});
};		};
if (checkSVNonOpUses(SVI0A) || checkSVNonOpUses(SVI0B) ||		if (checkSVNonOpUses(SVI0A) || checkSVNonOpUses(SVI0B) ||
checkSVNonOpUses(SVI1A) \|\| checkSVNonOpUses(SVI1B))		checkSVNonOpUses(SVI1A) \|\| checkSVNonOpUses(SVI1B))
return false;		return false;

// Collect all the uses that are shuffles that we can transform together. We		// Collect all the uses that are shuffles that we can transform together. We
// may not have a single shuffle, but a group that can all be transformed		// may not have a single shuffle, but a group that can all be transformed
// together profitably.		// together profitably.
// (10 lines elided)		bool VectorCombine::foldSelectShuffle(Instruction &I, bool FromReduction) {
};		};
if (!collectShuffles(Op0) || !collectShuffles(Op1))		if (!collectShuffles(Op0) || !collectShuffles(Op1))
return false;		return false;
// From a reduction, we need to be processing a single shuffle, otherwise the		// From a reduction, we need to be processing a single shuffle, otherwise the
// other uses will not be lane-invariant.		// other uses will not be lane-invariant.
if (FromReduction && Shuffles.size() > 1)		if (FromReduction && Shuffles.size() > 1)
return false;		return false;

		// Add any shuffle uses for the shuffles we have found, to include them in our
		// cost calculations.
		if (!FromReduction) {
		for (ShuffleVectorInst *SV : Shuffles) {
		for (auto U : SV->users()) {
		ShuffleVectorInst *SSV = dyn_cast<ShuffleVectorInst>(U);
		if (SSV && isa<UndefValue>(SSV->getOperand(1)))
		Shuffles.push_back(SSV);
		}
		}
		}

// For each of the output shuffles, we try to sort all the first vector		// For each of the output shuffles, we try to sort all the first vector
// elements to the beginning, followed by the second array elements at the		// elements to the beginning, followed by the second array elements at the
// end. If the binops are legalized to smaller vectors, this may reduce total		// end. If the binops are legalized to smaller vectors, this may reduce total
// number of binops. We compute the ReconstructMask mask needed to convert		// number of binops. We compute the ReconstructMask mask needed to convert
// back to the original lane order.		// back to the original lane order.
SmallVector<int> V1, V2;		SmallVector<std::pair<int, int>> V1, V2;
SmallVector<SmallVector<int>> ReconstructMasks;		SmallVector<SmallVector<int>> OrigReconstructMasks;
int MaxV1Elt = 0, MaxV2Elt = 0;		int MaxV1Elt = 0, MaxV2Elt = 0;
unsigned NumElts = VT->getNumElements();		unsigned NumElts = VT->getNumElements();
for (ShuffleVectorInst *SVN : Shuffles) {		for (ShuffleVectorInst *SVN : Shuffles) {
SmallVector<int> Mask;		SmallVector<int> Mask;
SVN->getShuffleMask(Mask);		SVN->getShuffleMask(Mask);

// Check the operands are the same as the original, or reversed (in which		// Check the operands are the same as the original, or reversed (in which
// case we need to commute the mask).		// case we need to commute the mask).
Value *SVOp0 = SVN->getOperand(0);		Value *SVOp0 = SVN->getOperand(0);
Value *SVOp1 = SVN->getOperand(1);		Value *SVOp1 = SVN->getOperand(1);
		if (isa<UndefValue>(SVOp1)) {
		auto *SSV = cast<ShuffleVectorInst>(SVOp0);
		SVOp0 = SSV->getOperand(0);
		SVOp1 = SSV->getOperand(1);
		for (unsigned I = 0, E = Mask.size(); I != E; I++)
		Mask[I] = Mask[I] < 0 ? Mask[I] : SSV->getMaskValue(Mask[I]);
		}
if (SVOp0 == Op1 && SVOp1 == Op0) {		if (SVOp0 == Op1 && SVOp1 == Op0) {
std::swap(SVOp0, SVOp1);		std::swap(SVOp0, SVOp1);
ShuffleVectorInst::commuteShuffleMask(Mask, NumElts);		ShuffleVectorInst::commuteShuffleMask(Mask, NumElts);
}		}
if (SVOp0 != Op0 || SVOp1 != Op1)		if (SVOp0 != Op0 || SVOp1 != Op1)
return false;		return false;

// Calculate the reconstruction mask for this shuffle, as the mask needed to		// Calculate the reconstruction mask for this shuffle, as the mask needed to
// take the packed values from Op0/Op1 and reconstructing to the original		// take the packed values from Op0/Op1 and reconstructing to the original
// order.		// order.
SmallVector<int> ReconstructMask;		SmallVector<int> ReconstructMask;
for (unsigned I = 0; I < Mask.size(); I++) {		for (unsigned I = 0; I < Mask.size(); I++) {
if (Mask[I] < 0) {		if (Mask[I] < 0) {
ReconstructMask.push_back(-1);		ReconstructMask.push_back(-1);
} else if (Mask[I] < static_cast<int>(NumElts)) {		} else if (Mask[I] < static_cast<int>(NumElts)) {
MaxV1Elt = std::max(MaxV1Elt, Mask[I]);		MaxV1Elt = std::max(MaxV1Elt, Mask[I]);
auto It = find(V1, Mask[I]);		auto It = find_if(V1, [&](const std::pair<int, int> &A) {
		return Mask[I] == A.first;
		});
if (It != V1.end())		if (It != V1.end())
ReconstructMask.push_back(It - V1.begin());		ReconstructMask.push_back(It - V1.begin());
else {		else {
ReconstructMask.push_back(V1.size());		ReconstructMask.push_back(V1.size());
V1.push_back(Mask[I]);		V1.emplace_back(Mask[I], V1.size());
}		}
} else {		} else {
MaxV2Elt = std::max<int>(MaxV2Elt, Mask[I] - NumElts);		MaxV2Elt = std::max<int>(MaxV2Elt, Mask[I] - NumElts);
auto It = find(V2, Mask[I] - NumElts);		auto It = find_if(V2, [&](const std::pair<int, int> &A) {
		return Mask[I] - static_cast<int>(NumElts) == A.first;
		});
if (It != V2.end())		if (It != V2.end())
ReconstructMask.push_back(NumElts + It - V2.begin());		ReconstructMask.push_back(NumElts + It - V2.begin());
else {		else {
ReconstructMask.push_back(NumElts + V2.size());		ReconstructMask.push_back(NumElts + V2.size());
V2.push_back(Mask[I] - NumElts);		V2.emplace_back(Mask[I] - NumElts, NumElts + V2.size());
}		}
}		}
}		}

// For reductions, we know that the lane ordering out doesn't alter the		// For reductions, we know that the lane ordering out doesn't alter the
// result. In-order can help simplify the shuffle away.		// result. In-order can help simplify the shuffle away.
if (FromReduction)		if (FromReduction)
sort(ReconstructMask);		sort(ReconstructMask);
ReconstructMasks.push_back(ReconstructMask);		OrigReconstructMasks.push_back(std::move(ReconstructMask));
}		}

// If the Maximum element used from V1 and V2 are not larger than the new		// If the Maximum element used from V1 and V2 are not larger than the new
// vectors, the vectors are already packes and performing the optimization		// vectors, the vectors are already packes and performing the optimization
// again will likely not help any further. This also prevents us from getting		// again will likely not help any further. This also prevents us from getting
// stuck in a cycle in case the costs do not also rule it out.		// stuck in a cycle in case the costs do not also rule it out.
if (V1.empty() || V2.empty() ||		if (V1.empty() || V2.empty() ||
(MaxV1Elt == static_cast<int>(V1.size()) - 1 &&		(MaxV1Elt == static_cast<int>(V1.size()) - 1 &&
MaxV2Elt == static_cast<int>(V2.size()) - 1))		MaxV2Elt == static_cast<int>(V2.size()) - 1))
return false;		return false;

		// GetBaseMaskValue takes one of the inputs, which may either be a shuffle, a
		// shuffle of another shuffle, or not a shuffle (that is treated like a
		// identity shuffle).
		auto GetBaseMaskValue = [&](Instruction *I, int M) {
		auto *SV = dyn_cast<ShuffleVectorInst>(I);
		if (!SV)
		return M;
		if (isa<UndefValue>(SV->getOperand(1)))
		if (auto *SSV = dyn_cast<ShuffleVectorInst>(SV->getOperand(0)))
		if (InputShuffles.contains(SSV))
		return SSV->getMaskValue(SV->getMaskValue(M));
		return SV->getMaskValue(M);
		};

		// Attempt to sort the inputs by ascending mask values to make simpler input
		// shuffles and push complex shuffles down to the uses. We sort on the first
		// of the two input shuffle orders, to try and get at least one input into a
		// nice order.
		auto SortBase = [&](Instruction *A, std::pair<int, int> X,
		std::pair<int, int> Y) {
		int MXA = GetBaseMaskValue(A, X.first);
		int MYA = GetBaseMaskValue(A, Y.first);
		return MXA < MYA;
		};
		stable_sort(V1, [&](std::pair<int, int> A, std::pair<int, int> B) {
		return SortBase(SVI0A, A, B);
		});
		stable_sort(V2, [&](std::pair<int, int> A, std::pair<int, int> B) {
		return SortBase(SVI1A, A, B);
		});
		// Calculate our ReconstructMasks from the OrigReconstructMasks and the
		// modified order of the input shuffles.
		SmallVector<SmallVector<int>> ReconstructMasks;
		for (auto Mask : OrigReconstructMasks) {
		SmallVector<int> ReconstructMask;
		for (int M : Mask) {
		auto FindIndex = [](const SmallVector<std::pair<int, int>> &V, int M) {
		auto It = find_if(V, [M](auto A) { return A.second == M; });
		assert(It != V.end() && "Expected all entries in Mask");
		return std::distance(V.begin(), It);
		};
		if (M < 0)
		ReconstructMask.push_back(-1);
		else if (M < static_cast<int>(NumElts)) {
		ReconstructMask.push_back(FindIndex(V1, M));
		} else {
		ReconstructMask.push_back(NumElts + FindIndex(V2, M));
		}
		}
		ReconstructMasks.push_back(std::move(ReconstructMask));
		}

// Calculate the masks needed for the new input shuffles, which get padded		// Calculate the masks needed for the new input shuffles, which get padded
// with undef		// with undef
SmallVector<int> V1A, V1B, V2A, V2B;		SmallVector<int> V1A, V1B, V2A, V2B;
for (unsigned I = 0; I < V1.size(); I++) {		for (unsigned I = 0; I < V1.size(); I++) {
V1A.push_back(SVI0A->getMaskValue(V1[I]));		V1A.push_back(GetBaseMaskValue(SVI0A, V1[I].first));
V1B.push_back(SVI0B->getMaskValue(V1[I]));		V1B.push_back(GetBaseMaskValue(SVI0B, V1[I].first));
}		}
for (unsigned I = 0; I < V2.size(); I++) {		for (unsigned I = 0; I < V2.size(); I++) {
V2A.push_back(SVI1A->getMaskValue(V2[I]));		V2A.push_back(GetBaseMaskValue(SVI1A, V2[I].first));
V2B.push_back(SVI1B->getMaskValue(V2[I]));		V2B.push_back(GetBaseMaskValue(SVI1B, V2[I].first));
}		}
while (V1A.size() < NumElts) {		while (V1A.size() < NumElts) {
V1A.push_back(UndefMaskElem);		V1A.push_back(UndefMaskElem);
V1B.push_back(UndefMaskElem);		V1B.push_back(UndefMaskElem);
}		}
while (V2A.size() < NumElts) {		while (V2A.size() < NumElts) {
V2A.push_back(UndefMaskElem);		V2A.push_back(UndefMaskElem);
V2B.push_back(UndefMaskElem);		V2B.push_back(UndefMaskElem);
}		}

auto AddShuffleCost = [&](InstructionCost C, ShuffleVectorInst *SV) {		auto AddShuffleCost = [&](InstructionCost C, Instruction *I) {
return C +		auto *SV = dyn_cast<ShuffleVectorInst>(I);
TTI.getShuffleCost(TTI::SK_PermuteTwoSrc, VT, SV->getShuffleMask());		if (!SV)
		return C;
		return C + TTI.getShuffleCost(isa<UndefValue>(SV->getOperand(1))
		? TTI::SK_PermuteSingleSrc
		: TTI::SK_PermuteTwoSrc,
		VT, SV->getShuffleMask());
};		};
auto AddShuffleMaskCost = [&](InstructionCost C, ArrayRef<int> Mask) {		auto AddShuffleMaskCost = [&](InstructionCost C, ArrayRef<int> Mask) {
return C + TTI.getShuffleCost(TTI::SK_PermuteTwoSrc, VT, Mask);		return C + TTI.getShuffleCost(TTI::SK_PermuteTwoSrc, VT, Mask);
};		};

// Get the costs of the shuffles + binops before and after with the new		// Get the costs of the shuffles + binops before and after with the new
// shuffle masks.		// shuffle masks.
InstructionCost CostBefore =		InstructionCost CostBefore =
TTI.getArithmeticInstrCost(Op0->getOpcode(), VT) +		TTI.getArithmeticInstrCost(Op0->getOpcode(), VT) +
TTI.getArithmeticInstrCost(Op1->getOpcode(), VT);		TTI.getArithmeticInstrCost(Op1->getOpcode(), VT);
CostBefore += std::accumulate(Shuffles.begin(), Shuffles.end(),		CostBefore += std::accumulate(Shuffles.begin(), Shuffles.end(),
InstructionCost(0), AddShuffleCost);		InstructionCost(0), AddShuffleCost);
// This set helps us only cost each unique shuffle once.
SmallPtrSet<ShuffleVectorInst *, 4> InputShuffles(
{SVI0A, SVI0B, SVI1A, SVI1B});
CostBefore += std::accumulate(InputShuffles.begin(), InputShuffles.end(),		CostBefore += std::accumulate(InputShuffles.begin(), InputShuffles.end(),
InstructionCost(0), AddShuffleCost);		InstructionCost(0), AddShuffleCost);

// The new binops will be unused for lanes past the used shuffle lengths.		// The new binops will be unused for lanes past the used shuffle lengths.
// These types attempt to get the correct cost for that from the target.		// These types attempt to get the correct cost for that from the target.
FixedVectorType *Op0SmallVT =		FixedVectorType *Op0SmallVT =
FixedVectorType::get(VT->getScalarType(), V1.size());		FixedVectorType::get(VT->getScalarType(), V1.size());
FixedVectorType *Op1SmallVT =		FixedVectorType *Op1SmallVT =
FixedVectorType::get(VT->getScalarType(), V2.size());		FixedVectorType::get(VT->getScalarType(), V2.size());
InstructionCost CostAfter =		InstructionCost CostAfter =
TTI.getArithmeticInstrCost(Op0->getOpcode(), Op0SmallVT) +		TTI.getArithmeticInstrCost(Op0->getOpcode(), Op0SmallVT) +
TTI.getArithmeticInstrCost(Op1->getOpcode(), Op1SmallVT);		TTI.getArithmeticInstrCost(Op1->getOpcode(), Op1SmallVT);
CostAfter += std::accumulate(ReconstructMasks.begin(), ReconstructMasks.end(),		CostAfter += std::accumulate(ReconstructMasks.begin(), ReconstructMasks.end(),
InstructionCost(0), AddShuffleMaskCost);		InstructionCost(0), AddShuffleMaskCost);
std::set<SmallVector<int>> OutputShuffleMasks({V1A, V1B, V2A, V2B});		std::set<SmallVector<int>> OutputShuffleMasks({V1A, V1B, V2A, V2B});
CostAfter +=		CostAfter +=
std::accumulate(OutputShuffleMasks.begin(), OutputShuffleMasks.end(),		std::accumulate(OutputShuffleMasks.begin(), OutputShuffleMasks.end(),
InstructionCost(0), AddShuffleMaskCost);		InstructionCost(0), AddShuffleMaskCost);

		LLVM_DEBUG(dbgs() << "Found a binop select shuffle pattern: " << I << "\n");
		LLVM_DEBUG(dbgs() << " CostBefore: " << CostBefore
		<< " vs CostAfter: " << CostAfter << "\n");
if (CostBefore <= CostAfter)		if (CostBefore <= CostAfter)
return false;		return false;

// The cost model has passed, create the new instructions.		// The cost model has passed, create the new instructions.
Builder.SetInsertPoint(SVI0A);		auto GetShuffleOperand = [&](Instruction *I, unsigned Op) -> Value * {
Value *NSV0A = Builder.CreateShuffleVector(SVI0A->getOperand(0),		auto *SV = dyn_cast<ShuffleVectorInst>(I);
SVI0A->getOperand(1), V1A);		if (!SV)
Builder.SetInsertPoint(SVI0B);		return I;
Value *NSV0B = Builder.CreateShuffleVector(SVI0B->getOperand(0),		if (isa<UndefValue>(SV->getOperand(1)))
SVI0B->getOperand(1), V1B);		if (auto *SSV = dyn_cast<ShuffleVectorInst>(SV->getOperand(0)))
Builder.SetInsertPoint(SVI1A);		if (InputShuffles.contains(SSV))
Value *NSV1A = Builder.CreateShuffleVector(SVI1A->getOperand(0),		return SSV->getOperand(Op);
SVI1A->getOperand(1), V2A);		return SV->getOperand(Op);
Builder.SetInsertPoint(SVI1B);		};
Value *NSV1B = Builder.CreateShuffleVector(SVI1B->getOperand(0),		Builder.SetInsertPoint(SVI0A->getNextNode());
SVI1B->getOperand(1), V2B);		Value *NSV0A = Builder.CreateShuffleVector(GetShuffleOperand(SVI0A, 0),
		GetShuffleOperand(SVI0A, 1), V1A);
		Builder.SetInsertPoint(SVI0B->getNextNode());
		Value *NSV0B = Builder.CreateShuffleVector(GetShuffleOperand(SVI0B, 0),
		GetShuffleOperand(SVI0B, 1), V1B);
		Builder.SetInsertPoint(SVI1A->getNextNode());
		Value *NSV1A = Builder.CreateShuffleVector(GetShuffleOperand(SVI1A, 0),
		GetShuffleOperand(SVI1A, 1), V2A);
		Builder.SetInsertPoint(SVI1B->getNextNode());
		Value *NSV1B = Builder.CreateShuffleVector(GetShuffleOperand(SVI1B, 0),
		GetShuffleOperand(SVI1B, 1), V2B);
Builder.SetInsertPoint(Op0);		Builder.SetInsertPoint(Op0);
Value *NOp0 = Builder.CreateBinOp((Instruction::BinaryOps)Op0->getOpcode(),		Value *NOp0 = Builder.CreateBinOp((Instruction::BinaryOps)Op0->getOpcode(),
NSV0A, NSV0B);		NSV0A, NSV0B);
if (auto *I = dyn_cast<Instruction>(NOp0))		if (auto *I = dyn_cast<Instruction>(NOp0))
I->copyIRFlags(Op0, true);		I->copyIRFlags(Op0, true);
Builder.SetInsertPoint(Op1);		Builder.SetInsertPoint(Op1);
Value *NOp1 = Builder.CreateBinOp((Instruction::BinaryOps)Op1->getOpcode(),		Value *NOp1 = Builder.CreateBinOp((Instruction::BinaryOps)Op1->getOpcode(),
NSV1A, NSV1B);		NSV1A, NSV1B);
if (auto *I = dyn_cast<Instruction>(NOp1))		if (auto *I = dyn_cast<Instruction>(NOp1))
// (132 lines elided)

llvm/test/Transforms/VectorCombine/AArch64/select-shuffle.ll

; (16 lines elided)		;
%a = add nsw <16 x i32> %s1, %s2		%a = add nsw <16 x i32> %s1, %s2
%b = sub nsw <16 x i32> %s1, %s2		%b = sub nsw <16 x i32> %s1, %s2
%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 21, i32 22, i32 7, i32 24, i32 25, i32 10, i32 27, i32 28, i32 13, i32 30, i32 31>		%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 21, i32 22, i32 7, i32 24, i32 25, i32 10, i32 27, i32 28, i32 13, i32 30, i32 31>
ret <16 x i32> %s3		ret <16 x i32> %s3
}		}

define i32 @test1_reduce(<16 x i32> %x, <16 x i32> %y) {		define i32 @test1_reduce(<16 x i32> %x, <16 x i32> %y) {
; CHECK-LABEL: @test1_reduce(		; CHECK-LABEL: @test1_reduce(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 20, i32 23, i32 26, i32 29, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 6, i32 9, i32 12, i32 15, i32 21, i32 24, i32 27, i32 30, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 21, i32 6, i32 24, i32 9, i32 27, i32 12, i32 30, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 20, i32 23, i32 26, i32 29, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 31, i32 28, i32 25, i32 22, i32 5, i32 8, i32 11, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 19, i32 18, i32 17, i32 16, i32 4, i32 7, i32 10, i32 13, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 4, i32 19, i32 7, i32 18, i32 10, i32 17, i32 13, i32 16, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 31, i32 28, i32 25, i32 22, i32 5, i32 8, i32 11, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP5:%.*]] = add nsw <16 x i32> [[TMP1]], [[TMP3]]		; CHECK-NEXT: [[TMP5:%.*]] = add nsw <16 x i32> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <16 x i32> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <16 x i32> [[TMP1]], [[TMP3]]
; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>		; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[S3]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[S3]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%s1 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 20, i32 21, i32 6, i32 23, i32 24, i32 9, i32 26, i32 27, i32 12, i32 29, i32 30, i32 15>		%s1 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 20, i32 21, i32 6, i32 23, i32 24, i32 9, i32 26, i32 27, i32 12, i32 29, i32 30, i32 15>
%s2 = shufflevector <16 x i32> %y, <16 x i32> %x, <16 x i32> <i32 31, i32 28, i32 25, i32 22, i32 5, i32 4, i32 19, i32 8, i32 7, i32 18, i32 11, i32 10, i32 17, i32 14, i32 13, i32 16>		%s2 = shufflevector <16 x i32> %y, <16 x i32> %x, <16 x i32> <i32 31, i32 28, i32 25, i32 22, i32 5, i32 4, i32 19, i32 8, i32 7, i32 18, i32 11, i32 10, i32 17, i32 14, i32 13, i32 16>
%a = add nsw <16 x i32> %s1, %s2		%a = add nsw <16 x i32> %s1, %s2
%b = sub nsw <16 x i32> %s1, %s2		%b = sub nsw <16 x i32> %s1, %s2
; (86 lines elided)		;
%a = add nsw <16 x i32> %s1, %s2		%a = add nsw <16 x i32> %s1, %s2
%b = sub nsw <16 x i32> %s1, %s2		%b = sub nsw <16 x i32> %s1, %s2
%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 20, i32 21, i32 6, i32 23, i32 24, i32 9, i32 26, i32 27, i32 12, i32 29, i32 30, i32 15>		%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 20, i32 21, i32 6, i32 23, i32 24, i32 9, i32 26, i32 27, i32 12, i32 29, i32 30, i32 15>
ret <16 x i32> %s3		ret <16 x i32> %s3
}		}

define <16 x i32> @test2_2(<16 x i32> %x, <16 x i32> %y) {		define <16 x i32> @test2_2(<16 x i32> %x, <16 x i32> %y) {
; CHECK-LABEL: @test2_2(		; CHECK-LABEL: @test2_2(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 21, i32 20, i32 24, i32 23, i32 11, i32 10, i32 14, i32 13, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 0, i32 1, i32 12, i32 15, i32 18, i32 19, i32 22, i32 25, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 15, i32 12, i32 25, i32 22, i32 19, i32 18, i32 1, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 10, i32 11, i32 13, i32 14, i32 20, i32 21, i32 23, i32 24, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 6, i32 3, i32 9, i32 2, i32 28, i32 17, i32 31, i32 16, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 29, i32 26, i32 27, i32 30, i32 7, i32 4, i32 5, i32 8, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 30, i32 27, i32 8, i32 5, i32 4, i32 7, i32 26, i32 29, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 17, i32 28, i32 16, i32 31, i32 3, i32 6, i32 2, i32 9, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP5:%.*]] = add nsw <16 x i32> [[TMP1]], [[TMP3]]		; CHECK-NEXT: [[TMP5:%.*]] = add nsw <16 x i32> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <16 x i32> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <16 x i32> [[TMP1]], [[TMP3]]
; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 0, i32 1, i32 20, i32 2, i32 3, i32 21, i32 4, i32 5, i32 22, i32 6, i32 7, i32 23>		; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 19, i32 18, i32 23, i32 22, i32 5, i32 4, i32 21, i32 7, i32 6, i32 20, i32 1, i32 0, i32 17, i32 3, i32 2, i32 16>
; CHECK-NEXT: ret <16 x i32> [[S3]]		; CHECK-NEXT: ret <16 x i32> [[S3]]
;		;
%s1 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%s1 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%s2 = shufflevector <16 x i32> %y, <16 x i32> %x, <16 x i32> <i32 29, i32 26, i32 7, i32 4, i32 3, i32 6, i32 5, i32 2, i32 9, i32 8, i32 17, i32 28, i32 27, i32 16, i32 31, i32 30>		%s2 = shufflevector <16 x i32> %y, <16 x i32> %x, <16 x i32> <i32 29, i32 26, i32 7, i32 4, i32 3, i32 6, i32 5, i32 2, i32 9, i32 8, i32 17, i32 28, i32 27, i32 16, i32 31, i32 30>
%a = add nsw <16 x i32> %s1, %s2		%a = add nsw <16 x i32> %s1, %s2
%b = sub nsw <16 x i32> %s1, %s2		%b = sub nsw <16 x i32> %s1, %s2
%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 31, i32 28, i32 25, i32 22, i32 5, i32 4, i32 19, i32 8, i32 7, i32 18, i32 11, i32 10, i32 17, i32 14, i32 13, i32 16>		%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 31, i32 28, i32 25, i32 22, i32 5, i32 4, i32 19, i32 8, i32 7, i32 18, i32 11, i32 10, i32 17, i32 14, i32 13, i32 16>
ret <16 x i32> %s3		ret <16 x i32> %s3
[... 17 lines not shown ...]
%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 20, i32 21, i32 6, i32 23, i32 24, i32 9, i32 26, i32 27, i32 12, i32 29, i32 30, i32 15>		%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 20, i32 21, i32 6, i32 23, i32 24, i32 9, i32 26, i32 27, i32 12, i32 29, i32 30, i32 15>
%s4 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 31, i32 28, i32 25, i32 22, i32 5, i32 4, i32 19, i32 8, i32 7, i32 18, i32 11, i32 10, i32 17, i32 14, i32 13, i32 16>		%s4 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 31, i32 28, i32 25, i32 22, i32 5, i32 4, i32 19, i32 8, i32 7, i32 18, i32 11, i32 10, i32 17, i32 14, i32 13, i32 16>
%c = mul <16 x i32> %s3, %s4		%c = mul <16 x i32> %s3, %s4
ret <16 x i32> %c		ret <16 x i32> %c
}		}

define <16 x i32> @test3_1(<16 x i32> %x, <16 x i32> %y) {		define <16 x i32> @test3_1(<16 x i32> %x, <16 x i32> %y) {
; CHECK-LABEL: @test3_1(		; CHECK-LABEL: @test3_1(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 0, i32 17, i32 21, i32 25, i32 29, i32 4, i32 8, i32 12, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[S1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 23, i32 27, i32 31, i32 6, i32 10, i32 14, i32 21, i32 25, i32 29, i32 4, i32 8, i32 12>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 2, i32 19, i32 23, i32 27, i32 31, i32 6, i32 10, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[S2:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 18, i32 3, i32 16, i32 1, i32 5, i32 9, i32 13, i32 20, i32 24, i32 28, i32 7, i32 11, i32 15, i32 22, i32 26, i32 30>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 18, i32 3, i32 7, i32 11, i32 15, i32 22, i32 26, i32 30, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[A:%.*]] = add nsw <16 x i32> [[S1]], [[S2]]
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 16, i32 1, i32 5, i32 9, i32 13, i32 20, i32 24, i32 28, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[B:%.*]] = sub nsw <16 x i32> [[S1]], [[S2]]
; CHECK-NEXT: [[TMP5:%.*]] = add nsw <16 x i32> [[TMP1]], [[TMP3]]		; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[A]], <16 x i32> [[B]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <16 x i32> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEXT: ret <16 x i32> [[S3]]		; CHECK-NEXT: ret <16 x i32> [[S3]]
;		;
%s1 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 23, i32 27, i32 31, i32 6, i32 10, i32 14, i32 21, i32 25, i32 29, i32 4, i32 8, i32 12>		%s1 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 23, i32 27, i32 31, i32 6, i32 10, i32 14, i32 21, i32 25, i32 29, i32 4, i32 8, i32 12>
%s2 = shufflevector <16 x i32> %y, <16 x i32> %x, <16 x i32> <i32 18, i32 3, i32 16, i32 1, i32 5, i32 9, i32 13, i32 20, i32 24, i32 28, i32 7, i32 11, i32 15, i32 22, i32 26, i32 30>		%s2 = shufflevector <16 x i32> %y, <16 x i32> %x, <16 x i32> <i32 18, i32 3, i32 16, i32 1, i32 5, i32 9, i32 13, i32 20, i32 24, i32 28, i32 7, i32 11, i32 15, i32 22, i32 26, i32 30>
%a = add nsw <16 x i32> %s1, %s2		%a = add nsw <16 x i32> %s1, %s2
%b = sub nsw <16 x i32> %s1, %s2		%b = sub nsw <16 x i32> %s1, %s2
%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x i32> %s3		ret <16 x i32> %s3
}		}

define <16 x i32> @test3_2(<16 x i32> %x, <16 x i32> %y) {		define <16 x i32> @test3_2(<16 x i32> %x, <16 x i32> %y) {
; CHECK-LABEL: @test3_2(		; CHECK-LABEL: @test3_2(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 6, i32 23, i32 19, i32 31, i32 27, i32 2, i32 14, i32 10, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 17, i32 21, i32 25, i32 29, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 4, i32 21, i32 17, i32 29, i32 25, i32 0, i32 12, i32 8, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 2, i32 6, i32 10, i32 14, i32 19, i32 23, i32 27, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 20, i32 5, i32 1, i32 13, i32 9, i32 16, i32 28, i32 24, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 18, i32 22, i32 26, i32 30, i32 3, i32 7, i32 11, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 22, i32 7, i32 3, i32 15, i32 11, i32 18, i32 30, i32 26, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 16, i32 20, i32 24, i32 28, i32 1, i32 5, i32 9, i32 13, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP5:%.*]] = add nsw <16 x i32> [[TMP1]], [[TMP3]]		; CHECK-NEXT: [[TMP5:%.*]] = add nsw <16 x i32> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <16 x i32> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <16 x i32> [[TMP1]], [[TMP3]]
; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 16, i32 17, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>		; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 17, i32 21, i32 1, i32 5, i32 4, i32 7, i32 6, i32 0, i32 3, i32 2, i32 20, i32 23, i32 22, i32 16, i32 19, i32 18>
; CHECK-NEXT: ret <16 x i32> [[S3]]		; CHECK-NEXT: ret <16 x i32> [[S3]]
;		;
%s1 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 23, i32 27, i32 31, i32 6, i32 10, i32 14, i32 21, i32 25, i32 29, i32 4, i32 8, i32 12>		%s1 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 23, i32 27, i32 31, i32 6, i32 10, i32 14, i32 21, i32 25, i32 29, i32 4, i32 8, i32 12>
%s2 = shufflevector <16 x i32> %y, <16 x i32> %x, <16 x i32> <i32 18, i32 3, i32 16, i32 1, i32 5, i32 9, i32 13, i32 20, i32 24, i32 28, i32 7, i32 11, i32 15, i32 22, i32 26, i32 30>		%s2 = shufflevector <16 x i32> %y, <16 x i32> %x, <16 x i32> <i32 18, i32 3, i32 16, i32 1, i32 5, i32 9, i32 13, i32 20, i32 24, i32 28, i32 7, i32 11, i32 15, i32 22, i32 26, i32 30>
%a = add nsw <16 x i32> %s1, %s2		%a = add nsw <16 x i32> %s1, %s2
%b = sub nsw <16 x i32> %s1, %s2		%b = sub nsw <16 x i32> %s1, %s2
%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 29, i32 26, i32 7, i32 4, i32 3, i32 6, i32 5, i32 2, i32 9, i32 8, i32 17, i32 28, i32 27, i32 16, i32 31, i32 30>		%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 29, i32 26, i32 7, i32 4, i32 3, i32 6, i32 5, i32 2, i32 9, i32 8, i32 17, i32 28, i32 27, i32 16, i32 31, i32 30>
ret <16 x i32> %s3		ret <16 x i32> %s3
[... 18 lines not shown ...]
%s4 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 29, i32 26, i32 7, i32 4, i32 3, i32 6, i32 5, i32 2, i32 9, i32 8, i32 17, i32 28, i32 27, i32 16, i32 31, i32 30>		%s4 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 29, i32 26, i32 7, i32 4, i32 3, i32 6, i32 5, i32 2, i32 9, i32 8, i32 17, i32 28, i32 27, i32 16, i32 31, i32 30>
%c = add <16 x i32> %s3, %s4		%c = add <16 x i32> %s3, %s4
ret <16 x i32> %c		ret <16 x i32> %c
}		}


define <16 x i32> @test23(<16 x i32> %x, <16 x i32> %y) {		define <16 x i32> @test23(<16 x i32> %x, <16 x i32> %y) {
; CHECK-LABEL: @test23(		; CHECK-LABEL: @test23(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 18, i32 0, i32 20, i32 24, i32 12, i32 22, i32 10, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[S10:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 19, i32 1, i32 21, i32 25, i32 13, i32 23, i32 11, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[S20:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 29, i32 26, i32 7, i32 4, i32 3, i32 6, i32 5, i32 2, i32 9, i32 8, i32 17, i32 28, i32 27, i32 16, i32 31, i32 30>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 7, i32 29, i32 3, i32 9, i32 27, i32 5, i32 17, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[A0:%.*]] = add nsw <16 x i32> [[S10]], [[S20]]
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[Y]], <16 x i32> [[X]], <16 x i32> <i32 4, i32 26, i32 6, i32 8, i32 16, i32 2, i32 28, i32 30, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[B0:%.*]] = sub nsw <16 x i32> [[S10]], [[S20]]
; CHECK-NEXT: [[TMP5:%.*]] = add nsw <16 x i32> [[TMP1]], [[TMP3]]		; CHECK-NEXT: [[S1:%.*]] = shufflevector <16 x i32> [[A0]], <16 x i32> [[B0]], <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 23, i32 27, i32 31, i32 6, i32 10, i32 14, i32 21, i32 25, i32 29, i32 4, i32 8, i32 12>
; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <16 x i32> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[S2:%.*]] = shufflevector <16 x i32> [[B0]], <16 x i32> [[A0]], <16 x i32> <i32 18, i32 3, i32 16, i32 1, i32 5, i32 9, i32 13, i32 20, i32 24, i32 28, i32 7, i32 11, i32 15, i32 22, i32 26, i32 30>
; CHECK-NEXT: [[S1:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 1, i32 17, i32 0, i32 16, i32 21, i32 22, i32 23, i32 5, i32 6, i32 7, i32 18, i32 19, i32 20, i32 2, i32 3, i32 4>
; CHECK-NEXT: [[S2:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 18, i32 19, i32 20, i32 2, i32 3, i32 4, i32 21, i32 22, i32 23, i32 5, i32 6, i32 7>
; CHECK-NEXT: [[A:%.*]] = add nsw <16 x i32> [[S1]], [[S2]]		; CHECK-NEXT: [[A:%.*]] = add nsw <16 x i32> [[S1]], [[S2]]
; CHECK-NEXT: [[B:%.*]] = sub nsw <16 x i32> [[S1]], [[S2]]		; CHECK-NEXT: [[B:%.*]] = sub nsw <16 x i32> [[S1]], [[S2]]
; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[A]], <16 x i32> [[B]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[A]], <16 x i32> [[B]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[S4:%.*]] = shufflevector <16 x i32> [[A]], <16 x i32> [[B]], <16 x i32> <i32 29, i32 26, i32 7, i32 4, i32 3, i32 6, i32 5, i32 2, i32 9, i32 8, i32 17, i32 28, i32 27, i32 16, i32 31, i32 30>		; CHECK-NEXT: [[S4:%.*]] = shufflevector <16 x i32> [[A]], <16 x i32> [[B]], <16 x i32> <i32 29, i32 26, i32 7, i32 4, i32 3, i32 6, i32 5, i32 2, i32 9, i32 8, i32 17, i32 28, i32 27, i32 16, i32 31, i32 30>
; CHECK-NEXT: [[C:%.*]] = add <16 x i32> [[S3]], [[S4]]		; CHECK-NEXT: [[C:%.*]] = add <16 x i32> [[S3]], [[S4]]
; CHECK-NEXT: ret <16 x i32> [[C]]		; CHECK-NEXT: ret <16 x i32> [[C]]
;		;
%s10 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%s10 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%s20 = shufflevector <16 x i32> %y, <16 x i32> %x, <16 x i32> <i32 29, i32 26, i32 7, i32 4, i32 3, i32 6, i32 5, i32 2, i32 9, i32 8, i32 17, i32 28, i32 27, i32 16, i32 31, i32 30>		%s20 = shufflevector <16 x i32> %y, <16 x i32> %x, <16 x i32> <i32 29, i32 26, i32 7, i32 4, i32 3, i32 6, i32 5, i32 2, i32 9, i32 8, i32 17, i32 28, i32 27, i32 16, i32 31, i32 30>
%a0 = add nsw <16 x i32> %s10, %s20		%a0 = add nsw <16 x i32> %s10, %s20
%b0 = sub nsw <16 x i32> %s10, %s20		%b0 = sub nsw <16 x i32> %s10, %s20
%s1 = shufflevector <16 x i32> %a0, <16 x i32> %b0, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 23, i32 27, i32 31, i32 6, i32 10, i32 14, i32 21, i32 25, i32 29, i32 4, i32 8, i32 12>		%s1 = shufflevector <16 x i32> %a0, <16 x i32> %b0, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 23, i32 27, i32 31, i32 6, i32 10, i32 14, i32 21, i32 25, i32 29, i32 4, i32 8, i32 12>
%s2 = shufflevector <16 x i32> %b0, <16 x i32> %a0, <16 x i32> <i32 18, i32 3, i32 16, i32 1, i32 5, i32 9, i32 13, i32 20, i32 24, i32 28, i32 7, i32 11, i32 15, i32 22, i32 26, i32 30>		%s2 = shufflevector <16 x i32> %b0, <16 x i32> %a0, <16 x i32> <i32 18, i32 3, i32 16, i32 1, i32 5, i32 9, i32 13, i32 20, i32 24, i32 28, i32 7, i32 11, i32 15, i32 22, i32 26, i32 30>
%a = add nsw <16 x i32> %s1, %s2		%a = add nsw <16 x i32> %s1, %s2
%b = sub nsw <16 x i32> %s1, %s2		%b = sub nsw <16 x i32> %s1, %s2
%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%s4 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 29, i32 26, i32 7, i32 4, i32 3, i32 6, i32 5, i32 2, i32 9, i32 8, i32 17, i32 28, i32 27, i32 16, i32 31, i32 30>		%s4 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 29, i32 26, i32 7, i32 4, i32 3, i32 6, i32 5, i32 2, i32 9, i32 8, i32 17, i32 28, i32 27, i32 16, i32 31, i32 30>
%c = add <16 x i32> %s3, %s4		%c = add <16 x i32> %s3, %s4
ret <16 x i32> %c		ret <16 x i32> %c
}		}

define <16 x i32> @testgood(<16 x i32> %x, <16 x i32> %y) {		define <16 x i32> @testgood(<16 x i32> %x, <16 x i32> %y) {
; CHECK-LABEL: @testgood(		; CHECK-LABEL: @testgood(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP5:%.*]] = add nsw <16 x i32> [[TMP1]], [[TMP3]]		; CHECK-NEXT: [[TMP5:%.*]] = add nsw <16 x i32> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <16 x i32> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <16 x i32> [[TMP1]], [[TMP3]]
; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>		; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>
; CHECK-NEXT: ret <16 x i32> [[S3]]		; CHECK-NEXT: ret <16 x i32> [[S3]]
;		;
%s1 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>		%s1 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>
%s2 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>		%s2 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>
%a = add nsw <16 x i32> %s1, %s2		%a = add nsw <16 x i32> %s1, %s2
%b = sub nsw <16 x i32> %s1, %s2		%b = sub nsw <16 x i32> %s1, %s2
%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>		%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>
ret <16 x i32> %s3		ret <16 x i32> %s3
}		}

		define <16 x i32> @test_shufshufin(<16 x i32> %x, <16 x i32> %y) {
		; CHECK-LABEL: @test_shufshufin(
		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; CHECK-NEXT: [[TMP5:%.*]] = add nsw <16 x i32> [[TMP2]], [[TMP4]]
		; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <16 x i32> [[TMP1]], [[TMP3]]
		; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>
		; CHECK-NEXT: ret <16 x i32> [[S3]]
		;
		%s1 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>
		%s2 = shufflevector <16 x i32> %s1, <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
		%a = add nsw <16 x i32> %s1, %s2
		%b = sub nsw <16 x i32> %s1, %s2
		%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>
		ret <16 x i32> %s3
		}

		define <16 x i32> @testshufshufout(<16 x i32> %x, <16 x i32> %y) {
		; CHECK-LABEL: @testshufshufout(
		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
		; CHECK-NEXT: [[TMP5:%.*]] = add nsw <16 x i32> [[TMP2]], [[TMP4]]
		; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <16 x i32> [[TMP1]], [[TMP3]]
		; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>
		; CHECK-NEXT: [[S4:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 16, i32 0, i32 17, i32 1, i32 18, i32 2, i32 19, i32 3, i32 20, i32 4, i32 21, i32 5, i32 22, i32 6, i32 23, i32 7>
		; CHECK-NEXT: [[R:%.*]] = add nsw <16 x i32> [[S3]], [[S4]]
		; CHECK-NEXT: ret <16 x i32> [[R]]
		;
		%s1 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>
		%s2 = shufflevector <16 x i32> %x, <16 x i32> %y, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>
		%a = add nsw <16 x i32> %s1, %s2
		%b = sub nsw <16 x i32> %s1, %s2
		%s3 = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>
		%s4 = shufflevector <16 x i32> %s3, <16 x i32> poison, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>
		%r = add nsw <16 x i32> %s3, %s4
		ret <16 x i32> %r
		}

declare void @use(<16 x i32>)		declare void @use(<16 x i32>)
define <16 x i32> @test_extrashuffleuse(<16 x i32> %x, <16 x i32> %y) {		define <16 x i32> @test_extrashuffleuse(<16 x i32> %x, <16 x i32> %y) {
; CHECK-LABEL: @test_extrashuffleuse(		; CHECK-LABEL: @test_extrashuffleuse(
; CHECK-NEXT: [[S1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>		; CHECK-NEXT: [[S1:%.*]] = shufflevector <16 x i32> [[X:%.*]], <16 x i32> [[Y:%.*]], <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>
; CHECK-NEXT: [[S2:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>		; CHECK-NEXT: [[S2:%.*]] = shufflevector <16 x i32> [[X]], <16 x i32> [[Y]], <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>
; CHECK-NEXT: [[A:%.*]] = add nsw <16 x i32> [[S1]], [[S2]]		; CHECK-NEXT: [[A:%.*]] = add nsw <16 x i32> [[S1]], [[S2]]
; CHECK-NEXT: [[B:%.*]] = sub nsw <16 x i32> [[S1]], [[S2]]		; CHECK-NEXT: [[B:%.*]] = sub nsw <16 x i32> [[S1]], [[S2]]
; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[A]], <16 x i32> [[B]], <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>		; CHECK-NEXT: [[S3:%.*]] = shufflevector <16 x i32> [[A]], <16 x i32> [[B]], <16 x i32> <i32 0, i32 17, i32 2, i32 19, i32 4, i32 21, i32 6, i32 23, i32 8, i32 25, i32 10, i32 27, i32 12, i32 29, i32 14, i32 31>
[... 79 lines not shown ...]
%t1 = shufflevector <4 x i32> %add, <4 x i32> %add, <4 x i32> <i32 6, i32 0, i32 1, i32 7>		%t1 = shufflevector <4 x i32> %add, <4 x i32> %add, <4 x i32> <i32 6, i32 0, i32 1, i32 7>
%h1 = getelementptr <4 x i32>, ptr %dst, i32 1		%h1 = getelementptr <4 x i32>, ptr %dst, i32 1
store <4 x i32> %t1, ptr %h1, align 16		store <4 x i32> %t1, ptr %h1, align 16
ret void		ret void
}		}

define <16 x i32> @test_1651256324(<16 x i32> %l0, <16 x i32> %l1, <16 x i32> %l6, <16 x i32> %l7) {		define <16 x i32> @test_1651256324(<16 x i32> %l0, <16 x i32> %l1, <16 x i32> %l6, <16 x i32> %l7) {
; CHECK-LABEL: @test_1651256324(		; CHECK-LABEL: @test_1651256324(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[L0:%.*]], <16 x i32> [[L6:%.*]], <16 x i32> <i32 30, i32 10, i32 1, i32 20, i32 10, i32 0, i32 20, i32 3, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i32> [[L0:%.*]], <16 x i32> [[L6:%.*]], <16 x i32> <i32 0, i32 1, i32 3, i32 10, i32 10, i32 15, i32 20, i32 20, i32 30, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[L1:%.*]], <16 x i32> [[L1]], <16 x i32> <i32 24, i32 1, i32 10, i32 0, i32 5, i32 7, i32 11, i32 11, i32 4, i32 12, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[L1:%.*]], <16 x i32> [[L1]], <16 x i32> <i32 0, i32 1, i32 4, i32 5, i32 7, i32 10, i32 11, i32 11, i32 12, i32 22, i32 24, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[L1]], <16 x i32> [[L1]], <16 x i32> <i32 24, i32 1, i32 10, i32 0, i32 5, i32 7, i32 11, i32 11, i32 4, i32 12, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[L1]], <16 x i32> [[L1]], <16 x i32> <i32 0, i32 1, i32 4, i32 5, i32 7, i32 10, i32 11, i32 11, i32 12, i32 22, i32 24, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[S2:%.*]] = shufflevector <16 x i32> [[L7:%.*]], <16 x i32> [[L7]], <16 x i32> <i32 23, i32 20, i32 29, i32 25, i32 14, i32 21, i32 11, i32 9, i32 2, i32 7, i32 5, i32 15, i32 24, i32 30, i32 26, i32 5>		; CHECK-NEXT: [[S2:%.*]] = shufflevector <16 x i32> [[L7:%.*]], <16 x i32> [[L7]], <16 x i32> <i32 23, i32 20, i32 29, i32 25, i32 14, i32 21, i32 11, i32 9, i32 2, i32 7, i32 5, i32 15, i32 24, i32 30, i32 26, i32 5>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[L6]], <16 x i32> [[L7]], <16 x i32> <i32 4, i32 24, i32 13, i32 15, i32 0, i32 29, i32 8, i32 30, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[L6]], <16 x i32> [[L7]], <16 x i32> <i32 29, i32 13, i32 30, i32 24, i32 0, i32 15, i32 15, i32 8, i32 4, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP5:%.*]] = add <16 x i32> [[TMP2]], [[TMP3]]		; CHECK-NEXT: [[TMP5:%.*]] = add <16 x i32> [[TMP3]], [[TMP2]]
; CHECK-NEXT: [[TMP6:%.*]] = sub <16 x i32> [[TMP1]], [[TMP4]]		; CHECK-NEXT: [[TMP6:%.*]] = sub <16 x i32> [[TMP1]], [[TMP4]]
; CHECK-NEXT: [[T0:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 18, i32 7, i32 22, i32 1, i32 8, i32 1, i32 9, i32 0, i32 6, i32 6, i32 23, i32 16, i32 18, i32 10, i32 24, i32 21>		; CHECK-NEXT: [[T0:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 17, i32 7, i32 23, i32 1, i32 2, i32 1, i32 8, i32 10, i32 6, i32 6, i32 18, i32 24, i32 17, i32 9, i32 21, i32 16>
; CHECK-NEXT: [[T1:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 16, i32 17, i32 3, i32 4, i32 3, i32 18, i32 19, i32 3, i32 5, i32 1, i32 20, i32 21, i32 6>		; CHECK-NEXT: [[T1:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> [[TMP6]], <16 x i32> <i32 10, i32 1, i32 5, i32 24, i32 19, i32 0, i32 3, i32 0, i32 17, i32 22, i32 0, i32 4, i32 1, i32 20, i32 16, i32 6>
; CHECK-NEXT: [[R:%.*]] = xor <16 x i32> [[T0]], [[T1]]		; CHECK-NEXT: [[R:%.*]] = xor <16 x i32> [[T0]], [[T1]]
; CHECK-NEXT: ret <16 x i32> [[R]]		; CHECK-NEXT: ret <16 x i32> [[R]]
;		;
%s0 = shufflevector <16 x i32> %l0, <16 x i32> %l6, <16 x i32> <i32 1, i32 20, i32 15, i32 3, i32 1, i32 10, i32 17, i32 25, i32 29, i32 23, i32 20, i32 10, i32 0, i32 20, i32 30, i32 30>		%s0 = shufflevector <16 x i32> %l0, <16 x i32> %l6, <16 x i32> <i32 1, i32 20, i32 15, i32 3, i32 1, i32 10, i32 17, i32 25, i32 29, i32 23, i32 20, i32 10, i32 0, i32 20, i32 30, i32 30>
%s1 = shufflevector <16 x i32> %l1, <16 x i32> %l1, <16 x i32> <i32 11, i32 22, i32 1, i32 7, i32 20, i32 0, i32 2, i32 24, i32 28, i32 10, i32 31, i32 12, i32 22, i32 5, i32 11, i32 4>		%s1 = shufflevector <16 x i32> %l1, <16 x i32> %l1, <16 x i32> <i32 11, i32 22, i32 1, i32 7, i32 20, i32 0, i32 2, i32 24, i32 28, i32 10, i32 31, i32 12, i32 22, i32 5, i32 11, i32 4>
%s2 = shufflevector <16 x i32> %l7, <16 x i32> %l7, <16 x i32> <i32 23, i32 20, i32 29, i32 25, i32 14, i32 21, i32 11, i32 9, i32 2, i32 7, i32 5, i32 15, i32 24, i32 30, i32 26, i32 5>		%s2 = shufflevector <16 x i32> %l7, <16 x i32> %l7, <16 x i32> <i32 23, i32 20, i32 29, i32 25, i32 14, i32 21, i32 11, i32 9, i32 2, i32 7, i32 5, i32 15, i32 24, i32 30, i32 26, i32 5>
%s3 = shufflevector <16 x i32> %l6, <16 x i32> %l7, <16 x i32> <i32 29, i32 19, i32 15, i32 30, i32 13, i32 0, i32 30, i32 23, i32 26, i32 3, i32 15, i32 24, i32 29, i32 8, i32 4, i32 0>		%s3 = shufflevector <16 x i32> %l6, <16 x i32> %l7, <16 x i32> <i32 29, i32 19, i32 15, i32 30, i32 13, i32 0, i32 30, i32 23, i32 26, i32 3, i32 15, i32 24, i32 29, i32 8, i32 4, i32 0>
%add = add <16 x i32> %s1, %s1		%add = add <16 x i32> %s1, %s1
; CHECK-NEXT: [[TMP52:%.*]] = shufflevector <4 x i8> [[TMP15]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP52:%.*]] = shufflevector <4 x i8> [[TMP15]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP53:%.*]] = shufflevector <16 x i8> [[TMP51]], <16 x i8> [[TMP52]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP53:%.*]] = shufflevector <16 x i8> [[TMP51]], <16 x i8> [[TMP52]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP54:%.*]] = shufflevector <4 x i8> [[TMP7]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP54:%.*]] = shufflevector <4 x i8> [[TMP7]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP55:%.*]] = shufflevector <16 x i8> [[TMP53]], <16 x i8> [[TMP54]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: [[TMP55:%.*]] = shufflevector <16 x i8> [[TMP53]], <16 x i8> [[TMP54]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>
; CHECK-NEXT: [[TMP56:%.*]] = zext <16 x i8> [[TMP55]] to <16 x i32>		; CHECK-NEXT: [[TMP56:%.*]] = zext <16 x i8> [[TMP55]] to <16 x i32>
; CHECK-NEXT: [[TMP57:%.*]] = sub nsw <16 x i32> [[TMP48]], [[TMP56]]		; CHECK-NEXT: [[TMP57:%.*]] = sub nsw <16 x i32> [[TMP48]], [[TMP56]]
; CHECK-NEXT: [[TMP58:%.*]] = shl nsw <16 x i32> [[TMP57]], <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: [[TMP58:%.*]] = shl nsw <16 x i32> [[TMP57]], <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
; CHECK-NEXT: [[TMP59:%.*]] = add nsw <16 x i32> [[TMP58]], [[TMP40]]		; CHECK-NEXT: [[TMP59:%.*]] = add nsw <16 x i32> [[TMP58]], [[TMP40]]
; CHECK-NEXT: [[REORDER:%.*]] = shufflevector <16 x i32> [[TMP59]], <16 x i32> poison, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>		; CHECK-NEXT: [[TMP60:%.*]] = shufflevector <16 x i32> [[TMP59]], <16 x i32> [[TMP59]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP60:%.*]] = add nsw <16 x i32> [[TMP59]], [[REORDER]]		; CHECK-NEXT: [[TMP61:%.*]] = shufflevector <16 x i32> [[TMP59]], <16 x i32> [[TMP59]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP61:%.*]] = sub nsw <16 x i32> [[TMP59]], [[REORDER]]		; CHECK-NEXT: [[TMP62:%.*]] = shufflevector <16 x i32> [[TMP59]], <16 x i32> poison, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP62:%.*]] = shufflevector <16 x i32> [[TMP60]], <16 x i32> [[TMP61]], <16 x i32> <i32 7, i32 3, i32 15, i32 11, i32 18, i32 22, i32 30, i32 26, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP63:%.*]] = shufflevector <16 x i32> [[TMP59]], <16 x i32> poison, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP63:%.*]] = shufflevector <16 x i32> [[TMP60]], <16 x i32> [[TMP61]], <16 x i32> <i32 1, i32 5, i32 13, i32 9, i32 16, i32 20, i32 28, i32 24, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP64:%.*]] = add nsw <16 x i32> [[TMP61]], [[TMP63]]
; CHECK-NEXT: [[TMP64:%.*]] = shufflevector <16 x i32> [[TMP60]], <16 x i32> [[TMP61]], <16 x i32> <i32 5, i32 1, i32 13, i32 9, i32 16, i32 20, i32 28, i32 24, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP65:%.*]] = sub nsw <16 x i32> [[TMP60]], [[TMP62]]
; CHECK-NEXT: [[TMP65:%.*]] = shufflevector <16 x i32> [[TMP60]], <16 x i32> [[TMP61]], <16 x i32> <i32 3, i32 7, i32 15, i32 11, i32 18, i32 22, i32 30, i32 26, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP66:%.*]] = shufflevector <16 x i32> [[TMP64]], <16 x i32> [[TMP65]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP66:%.*]] = add nsw <16 x i32> [[TMP62]], [[TMP64]]		; CHECK-NEXT: [[TMP67:%.*]] = shufflevector <16 x i32> [[TMP64]], <16 x i32> [[TMP65]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP67:%.*]] = sub nsw <16 x i32> [[TMP63]], [[TMP65]]		; CHECK-NEXT: [[TMP68:%.*]] = shufflevector <16 x i32> [[TMP64]], <16 x i32> [[TMP65]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP68:%.*]] = shufflevector <16 x i32> [[TMP66]], <16 x i32> [[TMP67]], <16 x i32> <i32 1, i32 4, i32 16, i32 20, i32 3, i32 7, i32 19, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP69:%.*]] = shufflevector <16 x i32> [[TMP64]], <16 x i32> [[TMP65]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP69:%.*]] = shufflevector <16 x i32> [[TMP66]], <16 x i32> [[TMP67]], <16 x i32> <i32 0, i32 5, i32 17, i32 21, i32 2, i32 6, i32 18, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP70:%.*]] = add nsw <16 x i32> [[TMP67]], [[TMP69]]
; CHECK-NEXT: [[TMP70:%.*]] = shufflevector <16 x i32> [[TMP66]], <16 x i32> [[TMP67]], <16 x i32> <i32 0, i32 5, i32 17, i32 21, i32 2, i32 6, i32 18, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP71:%.*]] = sub nsw <16 x i32> [[TMP66]], [[TMP68]]
; CHECK-NEXT: [[TMP71:%.*]] = shufflevector <16 x i32> [[TMP66]], <16 x i32> [[TMP67]], <16 x i32> <i32 1, i32 4, i32 16, i32 20, i32 3, i32 7, i32 19, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP72:%.*]] = shufflevector <16 x i32> [[TMP70]], <16 x i32> [[TMP71]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP72:%.*]] = add nsw <16 x i32> [[TMP68]], [[TMP70]]		; CHECK-NEXT: [[TMP73:%.*]] = shufflevector <16 x i32> [[TMP70]], <16 x i32> [[TMP71]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP73:%.*]] = sub nsw <16 x i32> [[TMP69]], [[TMP71]]		; CHECK-NEXT: [[TMP74:%.*]] = shufflevector <16 x i32> [[TMP70]], <16 x i32> [[TMP71]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP74:%.*]] = shufflevector <16 x i32> [[TMP72]], <16 x i32> [[TMP73]], <16 x i32> <i32 0, i32 16, i32 17, i32 1, i32 18, i32 2, i32 19, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP75:%.*]] = shufflevector <16 x i32> [[TMP70]], <16 x i32> [[TMP71]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP75:%.*]] = shufflevector <16 x i32> [[TMP72]], <16 x i32> [[TMP73]], <16 x i32> <i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP76:%.*]] = add nsw <16 x i32> [[TMP73]], [[TMP75]]
; CHECK-NEXT: [[TMP76:%.*]] = shufflevector <16 x i32> [[TMP72]], <16 x i32> [[TMP73]], <16 x i32> <i32 4, i32 20, i32 21, i32 5, i32 22, i32 6, i32 23, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP77:%.*]] = sub nsw <16 x i32> [[TMP72]], [[TMP74]]
; CHECK-NEXT: [[TMP77:%.*]] = shufflevector <16 x i32> [[TMP72]], <16 x i32> [[TMP73]], <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP78:%.*]] = shufflevector <16 x i32> [[TMP76]], <16 x i32> [[TMP77]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP78:%.*]] = add nsw <16 x i32> [[TMP74]], [[TMP76]]		; CHECK-NEXT: [[TMP79:%.*]] = shufflevector <16 x i32> [[TMP76]], <16 x i32> [[TMP77]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP79:%.*]] = sub nsw <16 x i32> [[TMP75]], [[TMP77]]		; CHECK-NEXT: [[TMP80:%.*]] = shufflevector <16 x i32> [[TMP76]], <16 x i32> [[TMP77]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP80:%.*]] = shufflevector <16 x i32> [[TMP78]], <16 x i32> [[TMP79]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>		; CHECK-NEXT: [[TMP81:%.*]] = shufflevector <16 x i32> [[TMP76]], <16 x i32> [[TMP77]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP81:%.*]] = lshr <16 x i32> [[TMP80]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>		; CHECK-NEXT: [[TMP82:%.*]] = add nsw <16 x i32> [[TMP79]], [[TMP81]]
; CHECK-NEXT: [[TMP82:%.*]] = and <16 x i32> [[TMP81]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>		; CHECK-NEXT: [[TMP83:%.*]] = sub nsw <16 x i32> [[TMP78]], [[TMP80]]
; CHECK-NEXT: [[TMP83:%.*]] = mul nuw <16 x i32> [[TMP82]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>		; CHECK-NEXT: [[TMP84:%.*]] = shufflevector <16 x i32> [[TMP82]], <16 x i32> [[TMP83]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
; CHECK-NEXT: [[TMP84:%.*]] = add <16 x i32> [[TMP83]], [[TMP80]]		; CHECK-NEXT: [[TMP85:%.*]] = lshr <16 x i32> [[TMP84]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
; CHECK-NEXT: [[TMP85:%.*]] = xor <16 x i32> [[TMP84]], [[TMP83]]		; CHECK-NEXT: [[TMP86:%.*]] = and <16 x i32> [[TMP85]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>
; CHECK-NEXT: [[TMP86:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP85]])		; CHECK-NEXT: [[TMP87:%.*]] = mul nuw <16 x i32> [[TMP86]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
; CHECK-NEXT: [[CONV118:%.*]] = and i32 [[TMP86]], 65535		; CHECK-NEXT: [[TMP88:%.*]] = add <16 x i32> [[TMP87]], [[TMP84]]
; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP86]], 16		; CHECK-NEXT: [[TMP89:%.*]] = xor <16 x i32> [[TMP88]], [[TMP87]]
		; CHECK-NEXT: [[TMP90:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP89]])
		; CHECK-NEXT: [[CONV118:%.*]] = and i32 [[TMP90]], 65535
		; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP90]], 16
		samtebbs: Looks like more instructions are added here. Is that fine because it gets lowered to something better or of lower cost?
		dmgreen (author): Yeah, more instructions are better in this case, as each shuffle is simpler. It can be hard to tell from the tests where this is an improvement and where it isn't. It is difficult to be precise in general, but most of the changes are improvements, and this one especially is much better because the shuffles are a lot easier to materialize.
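One rough way to see the difference in shuffle complexity is to compare mask regularity. The sketch below is only an illustration, not the cost model VectorCombine uses: `is_stride2` is a hypothetical helper that checks whether the defined lanes taken from each source vector step by exactly 2. The two masks are copied from the CHECK lines above (the old `[[TMP62]]` shuffle and the new `[[TMP66]]` shuffle), with `undef` lanes written as `None`.

```python
UNDEF = None  # stands in for an `undef` mask element


def is_stride2(mask, num_elts=16):
    """Return True if the defined lanes drawn from each source vector
    step by exactly 2 (hypothetical regularity check, not LLVM's cost model)."""
    lanes = [m for m in mask if m is not UNDEF]
    for src in (0, 1):
        # Lane indices relative to the start of this source vector.
        group = [m - src * num_elts for m in lanes if m // num_elts == src]
        if any(b - a != 2 for a, b in zip(group, group[1:])):
            return False
    return True


# Old output: irregular permutation of lanes from both sources.
old_mask = [7, 3, 15, 11, 18, 22, 30, 26] + [UNDEF] * 8
# New output: every other lane from each source, in order.
new_mask = [0, 2, 4, 6, 16, 18, 20, 22] + [UNDEF] * 8

print(is_stride2(old_mask), is_stride2(new_mask))  # prints: False True
```

Regular masks like the new one tend to map onto single target shuffle instructions more readily than arbitrary permutations, which is consistent with the explanation above, although the actual decision depends on the target's shuffle costs.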
; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 [[CONV118]], [[SHR]]		; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 [[CONV118]], [[SHR]]
; CHECK-NEXT: [[SHR120:%.*]] = lshr i32 [[ADD119]], 1		; CHECK-NEXT: [[SHR120:%.*]] = lshr i32 [[ADD119]], 1
; CHECK-NEXT: ret i32 [[SHR120]]		; CHECK-NEXT: ret i32 [[SHR120]]
;		;
entry:		entry:
%idx.ext = sext i32 %st1 to i64		%idx.ext = sext i32 %st1 to i64
%idx.ext63 = sext i32 %st2 to i64		%idx.ext63 = sext i32 %st2 to i64
%arrayidx3 = getelementptr inbounds i8, i8* %p1, i64 4		%arrayidx3 = getelementptr inbounds i8, i8* %p1, i64 4
; CHECK-NEXT: [[TMP44:%.*]] = shufflevector <4 x i8> [[TMP7]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP44:%.*]] = shufflevector <4 x i8> [[TMP7]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP45:%.*]] = shufflevector <16 x i8> [[TMP43]], <16 x i8> [[TMP44]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEXT: [[TMP45:%.*]] = shufflevector <16 x i8> [[TMP43]], <16 x i8> [[TMP44]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[TMP46:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP46:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP47:%.*]] = shufflevector <16 x i8> [[TMP45]], <16 x i8> [[TMP46]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>		; CHECK-NEXT: [[TMP47:%.*]] = shufflevector <16 x i8> [[TMP45]], <16 x i8> [[TMP46]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 17, i32 18, i32 19>
; CHECK-NEXT: [[TMP48:%.*]] = zext <16 x i8> [[TMP47]] to <16 x i32>		; CHECK-NEXT: [[TMP48:%.*]] = zext <16 x i8> [[TMP47]] to <16 x i32>
; CHECK-NEXT: [[TMP49:%.*]] = sub nsw <16 x i32> [[TMP39]], [[TMP48]]		; CHECK-NEXT: [[TMP49:%.*]] = sub nsw <16 x i32> [[TMP39]], [[TMP48]]
; CHECK-NEXT: [[TMP50:%.*]] = shl nsw <16 x i32> [[TMP49]], <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		; CHECK-NEXT: [[TMP50:%.*]] = shl nsw <16 x i32> [[TMP49]], <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
; CHECK-NEXT: [[TMP51:%.*]] = add nsw <16 x i32> [[TMP50]], [[TMP30]]		; CHECK-NEXT: [[TMP51:%.*]] = add nsw <16 x i32> [[TMP50]], [[TMP30]]
; CHECK-NEXT: [[REORDER:%.*]] = shufflevector <16 x i32> [[TMP51]], <16 x i32> poison, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>		; CHECK-NEXT: [[TMP52:%.*]] = shufflevector <16 x i32> [[TMP51]], <16 x i32> [[TMP51]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP52:%.*]] = add nsw <16 x i32> [[TMP51]], [[REORDER]]		; CHECK-NEXT: [[TMP53:%.*]] = shufflevector <16 x i32> [[TMP51]], <16 x i32> [[TMP51]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP53:%.*]] = sub nsw <16 x i32> [[TMP51]], [[REORDER]]		; CHECK-NEXT: [[TMP54:%.*]] = shufflevector <16 x i32> [[TMP51]], <16 x i32> poison, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP54:%.*]] = shufflevector <16 x i32> [[TMP52]], <16 x i32> [[TMP53]], <16 x i32> <i32 7, i32 3, i32 15, i32 11, i32 18, i32 22, i32 30, i32 26, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP55:%.*]] = shufflevector <16 x i32> [[TMP51]], <16 x i32> poison, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP55:%.*]] = shufflevector <16 x i32> [[TMP52]], <16 x i32> [[TMP53]], <16 x i32> <i32 1, i32 5, i32 13, i32 9, i32 16, i32 20, i32 28, i32 24, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP56:%.*]] = add nsw <16 x i32> [[TMP53]], [[TMP55]]
; CHECK-NEXT: [[TMP56:%.*]] = shufflevector <16 x i32> [[TMP52]], <16 x i32> [[TMP53]], <16 x i32> <i32 5, i32 1, i32 13, i32 9, i32 16, i32 20, i32 28, i32 24, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP57:%.*]] = sub nsw <16 x i32> [[TMP52]], [[TMP54]]
; CHECK-NEXT: [[TMP57:%.*]] = shufflevector <16 x i32> [[TMP52]], <16 x i32> [[TMP53]], <16 x i32> <i32 3, i32 7, i32 15, i32 11, i32 18, i32 22, i32 30, i32 26, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP58:%.*]] = shufflevector <16 x i32> [[TMP56]], <16 x i32> [[TMP57]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP58:%.*]] = add nsw <16 x i32> [[TMP54]], [[TMP56]]		; CHECK-NEXT: [[TMP59:%.*]] = shufflevector <16 x i32> [[TMP56]], <16 x i32> [[TMP57]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP59:%.*]] = sub nsw <16 x i32> [[TMP55]], [[TMP57]]		; CHECK-NEXT: [[TMP60:%.*]] = shufflevector <16 x i32> [[TMP56]], <16 x i32> [[TMP57]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP60:%.*]] = shufflevector <16 x i32> [[TMP58]], <16 x i32> [[TMP59]], <16 x i32> <i32 1, i32 4, i32 16, i32 20, i32 3, i32 7, i32 19, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP61:%.*]] = shufflevector <16 x i32> [[TMP56]], <16 x i32> [[TMP57]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP61:%.*]] = shufflevector <16 x i32> [[TMP58]], <16 x i32> [[TMP59]], <16 x i32> <i32 0, i32 5, i32 17, i32 21, i32 2, i32 6, i32 18, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP62:%.*]] = add nsw <16 x i32> [[TMP59]], [[TMP61]]
; CHECK-NEXT: [[TMP62:%.*]] = shufflevector <16 x i32> [[TMP58]], <16 x i32> [[TMP59]], <16 x i32> <i32 0, i32 5, i32 17, i32 21, i32 2, i32 6, i32 18, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP63:%.*]] = sub nsw <16 x i32> [[TMP58]], [[TMP60]]
; CHECK-NEXT: [[TMP63:%.*]] = shufflevector <16 x i32> [[TMP58]], <16 x i32> [[TMP59]], <16 x i32> <i32 1, i32 4, i32 16, i32 20, i32 3, i32 7, i32 19, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP64:%.*]] = shufflevector <16 x i32> [[TMP62]], <16 x i32> [[TMP63]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP64:%.*]] = add nsw <16 x i32> [[TMP60]], [[TMP62]]		; CHECK-NEXT: [[TMP65:%.*]] = shufflevector <16 x i32> [[TMP62]], <16 x i32> [[TMP63]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP65:%.*]] = sub nsw <16 x i32> [[TMP61]], [[TMP63]]		; CHECK-NEXT: [[TMP66:%.*]] = shufflevector <16 x i32> [[TMP62]], <16 x i32> [[TMP63]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP66:%.*]] = shufflevector <16 x i32> [[TMP64]], <16 x i32> [[TMP65]], <16 x i32> <i32 0, i32 16, i32 17, i32 1, i32 18, i32 2, i32 19, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP67:%.*]] = shufflevector <16 x i32> [[TMP62]], <16 x i32> [[TMP63]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP67:%.*]] = shufflevector <16 x i32> [[TMP64]], <16 x i32> [[TMP65]], <16 x i32> <i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP68:%.*]] = add nsw <16 x i32> [[TMP65]], [[TMP67]]
; CHECK-NEXT: [[TMP68:%.*]] = shufflevector <16 x i32> [[TMP64]], <16 x i32> [[TMP65]], <16 x i32> <i32 4, i32 20, i32 21, i32 5, i32 22, i32 6, i32 23, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP69:%.*]] = sub nsw <16 x i32> [[TMP64]], [[TMP66]]
; CHECK-NEXT: [[TMP69:%.*]] = shufflevector <16 x i32> [[TMP64]], <16 x i32> [[TMP65]], <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP70:%.*]] = shufflevector <16 x i32> [[TMP68]], <16 x i32> [[TMP69]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP70:%.*]] = add nsw <16 x i32> [[TMP66]], [[TMP68]]		; CHECK-NEXT: [[TMP71:%.*]] = shufflevector <16 x i32> [[TMP68]], <16 x i32> [[TMP69]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP71:%.*]] = sub nsw <16 x i32> [[TMP67]], [[TMP69]]		; CHECK-NEXT: [[TMP72:%.*]] = shufflevector <16 x i32> [[TMP68]], <16 x i32> [[TMP69]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP72:%.*]] = shufflevector <16 x i32> [[TMP70]], <16 x i32> [[TMP71]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>		; CHECK-NEXT: [[TMP73:%.*]] = shufflevector <16 x i32> [[TMP68]], <16 x i32> [[TMP69]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP73:%.*]] = lshr <16 x i32> [[TMP72]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>		; CHECK-NEXT: [[TMP74:%.*]] = add nsw <16 x i32> [[TMP71]], [[TMP73]]
; CHECK-NEXT: [[TMP74:%.*]] = and <16 x i32> [[TMP73]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>		; CHECK-NEXT: [[TMP75:%.*]] = sub nsw <16 x i32> [[TMP70]], [[TMP72]]
; CHECK-NEXT: [[TMP75:%.*]] = mul nuw <16 x i32> [[TMP74]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>		; CHECK-NEXT: [[TMP76:%.*]] = shufflevector <16 x i32> [[TMP74]], <16 x i32> [[TMP75]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
; CHECK-NEXT: [[TMP76:%.*]] = add <16 x i32> [[TMP75]], [[TMP72]]		; CHECK-NEXT: [[TMP77:%.*]] = lshr <16 x i32> [[TMP76]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
; CHECK-NEXT: [[TMP77:%.*]] = xor <16 x i32> [[TMP76]], [[TMP75]]		; CHECK-NEXT: [[TMP78:%.*]] = and <16 x i32> [[TMP77]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>
; CHECK-NEXT: [[TMP78:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP77]])		; CHECK-NEXT: [[TMP79:%.*]] = mul nuw <16 x i32> [[TMP78]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
; CHECK-NEXT: [[CONV118:%.*]] = and i32 [[TMP78]], 65535		; CHECK-NEXT: [[TMP80:%.*]] = add <16 x i32> [[TMP79]], [[TMP76]]
; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP78]], 16		; CHECK-NEXT: [[TMP81:%.*]] = xor <16 x i32> [[TMP80]], [[TMP79]]
		; CHECK-NEXT: [[TMP82:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP81]])
		; CHECK-NEXT: [[CONV118:%.*]] = and i32 [[TMP82]], 65535
		; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP82]], 16
; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 [[CONV118]], [[SHR]]		; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 [[CONV118]], [[SHR]]
; CHECK-NEXT: [[SHR120:%.*]] = lshr i32 [[ADD119]], 1		; CHECK-NEXT: [[SHR120:%.*]] = lshr i32 [[ADD119]], 1
; CHECK-NEXT: ret i32 [[SHR120]]		; CHECK-NEXT: ret i32 [[SHR120]]
;		;
entry:		entry:
%idx.ext = sext i32 %i_pix1 to i64		%idx.ext = sext i32 %i_pix1 to i64
%idx.ext63 = sext i32 %i_pix2 to i64		%idx.ext63 = sext i32 %i_pix2 to i64
%arrayidx3 = getelementptr inbounds i8, ptr %pix1, i64 4		%arrayidx3 = getelementptr inbounds i8, ptr %pix1, i64 4