This is an archive of the discontinued LLVM Phabricator instance.

Differential D123494

[VectorCombine] Find and remove shuffles from commutative reductions
ClosedPublic

Authored by dmgreen on Apr 11 2022, 3:53 AM.

Details

Reviewers

spatel
RKSimon
SjoerdMeijer
labrinea
samtebbs
jaykang10

Commits

rGded8187e353f: [VectorCombine] Try to reduce shuffle cost for commutative reduction operands

Summary

Given a shuffle feeding a commutative reduction, the lane ordering of the shuffle will not alter the result. This is also true if there are a number of operations between the reduction and the shuffle, provided they only operate lane-wise. This patch searches for cases like that in VectorCombine, allowing us to check the cost of the shuffle against an in-order identity shuffle and replace the order if that is cheaper. This only handles a single shuffle at the moment to keep things simple, and is able to ignore splats, where every lane holds the same value.

This is a more powerful version of a combine that already happens in instrcombine, capable of optimizing more cases by looking through more instructions and being able to cost the shuffle.
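
As a minimal, standalone sketch of the property described in the summary (plain C++ with no LLVM dependency; `shuffle`, `xorSplat`, `reduceAdd` and `sameReduction` are hypothetical helper names, not part of the patch), the following models a single-source shuffle feeding a lane-wise xor with a splat and then an add reduction, loosely mirroring the reduceshuffle_onein_const_v4i32 test:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>

using Vec4 = std::array<std::uint32_t, 4>;
using Mask4 = std::array<int, 4>;

// Models a single-source shufflevector: result lane I takes Src[Mask[I]].
// Assumes every mask element is in the range 0..3 (no undef lanes).
Vec4 shuffle(const Vec4 &Src, const Mask4 &Mask) {
  Vec4 Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = Src[static_cast<std::size_t>(Mask[I])];
  return Out;
}

// Lane-wise xor with a splat constant: every lane sees the same operand,
// so the operation does not depend on which lane a value sits in.
Vec4 xorSplat(const Vec4 &V, std::uint32_t C) {
  Vec4 Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = V[I] ^ C;
  return Out;
}

// Models an integer add reduction with wrapping arithmetic.
std::uint32_t reduceAdd(const Vec4 &V) {
  return std::accumulate(V.begin(), V.end(), std::uint32_t{0});
}

// Compares the reduction of the permuted input against the in-order one.
bool sameReduction(const Vec4 &A) {
  const Mask4 Permuted{0, 2, 1, 3};
  const Mask4 Identity{0, 1, 2, 3};
  return reduceAdd(xorSplat(shuffle(A, Permuted), 0xFFFFFFFFu)) ==
         reduceAdd(xorSplat(shuffle(A, Identity), 0xFFFFFFFFu));
}
```

Under these assumptions `sameReduction` holds for any input, because the permuted and in-order vectors contain the same multiset of lanes and the add reduction ignores their order.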

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. Apr 11 2022, 3:53 AM

Herald added a project: Restricted Project. Apr 11 2022, 3:53 AM

Herald added a subscriber: hiraditya.

dmgreen requested review of this revision. Apr 11 2022, 3:53 AM

Harbormaster completed remote builds in B158969: Diff 421859. Apr 11 2022, 3:54 AM

RKSimon added inline comments. Apr 11 2022, 4:03 AM

llvm/test/Transforms/VectorCombine/AArch64/vecreduce-shuffle.ll
20	What if we start with a much simpler combine in InstSimplify/Combine that just removes a permute if it feeds a commutative reduction?

dmgreen added inline comments. Apr 11 2022, 4:12 AM

llvm/test/Transforms/VectorCombine/AArch64/vecreduce-shuffle.ll
20	We already have that combine in https://github.com/llvm/llvm-project/blob/431e93f4f56e5b839bf1f746d65139ccf3ca2232/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp#L2601. This is trying to add a more complex version for vector combine that can handle more patterns, and potentially remove more shuffles that are cheaper than the original (not just single-input identity masks). This test I wanted to leave in to make sure there was vector-combine testing for it, even if the simpler combine exists elsewhere too.

Ping any comments?

samtebbs added inline comments. Apr 26 2022, 8:08 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1162	Does this also need to check if the binary operation is commutative? It may be a good idea to add a test with a non-commutative reduction and a qualifying shuffle, to make sure this function's behaviour isn't broken in the future.

dmgreen added inline comments. Apr 27 2022, 6:59 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1162	I don't think it should matter if the binary operator is commutative. There is a test with a lshr in reduceshuffle_twoin_ext_v16i32, for example. These binary operators we are only looking through, back to the shuffle. We don't modify them directly, just the order that the vector-lanes operate. Which is safe because they are only used by the reduction, and only use splats and the shuffle we transform.

LGTM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1162	Yes you're right. Only the reduction intrinsic needs to be commutative which you have checked for in the switch statement above.
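
The point made in this thread can be illustrated with a small standalone sketch (plain C++, no LLVM; all helper names are hypothetical). It assumes 32-bit lanes and masks shift amounts to 0..31 so that the C++ shifts stay well defined. A non-commutative lane-wise shift is harmless when its other operand is a splat, whichever operand the shuffle feeds, but reordering would change the result if the other operand were an unrelated, non-splat vector:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>

using Vec4 = std::array<std::uint32_t, 4>;
using Mask4 = std::array<int, 4>;

// Single-source shuffle: result lane I takes Src[Mask[I]] (mask in 0..3).
Vec4 shuffle(const Vec4 &Src, const Mask4 &Mask) {
  Vec4 Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = Src[static_cast<std::size_t>(Mask[I])];
  return Out;
}

// Broadcasts one value to every lane.
Vec4 splat(std::uint32_t C) {
  Vec4 Out{};
  Out.fill(C);
  return Out;
}

// Lane-wise logical shift right. The operation is not commutative.
// Shift amounts are masked to 0..31 purely to avoid undefined behaviour here.
Vec4 lshr(const Vec4 &L, const Vec4 &R) {
  Vec4 Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = L[I] >> (R[I] & 31u);
  return Out;
}

// Wrapping add reduction over all lanes.
std::uint32_t reduceAdd(const Vec4 &V) {
  return std::accumulate(V.begin(), V.end(), std::uint32_t{0});
}

// Shuffle is operand 0 of the shift, the other operand is a splat.
std::uint32_t shiftShuffledBySplat(const Vec4 &A, const Mask4 &M,
                                   std::uint32_t Amt) {
  return reduceAdd(lshr(shuffle(A, M), splat(Amt)));
}

// Shuffle is operand 1 of the shift, the other operand is a splat.
std::uint32_t shiftSplatByShuffled(std::uint32_t Val, const Vec4 &A,
                                   const Mask4 &M) {
  return reduceAdd(lshr(splat(Val), shuffle(A, M)));
}

// The other operand is an unrelated non-splat vector that is NOT reordered.
// Here the lane order of the shuffle does matter.
std::uint32_t shiftShuffledByVector(const Vec4 &A, const Mask4 &M,
                                    const Vec4 &B) {
  return reduceAdd(lshr(shuffle(A, M), B));
}
```

With A = {8, 16, 32, 64}, shift amounts B = {0, 1, 2, 3} and the mask {0, 2, 1, 3}, the last helper gives 36 for the permuted input and 32 for the in-order one, which is the kind of case the traversal in the patch rejects as an unknown node.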

samtebbs accepted this revision. Apr 27 2022, 8:25 AM

This revision is now accepted and ready to land. Apr 27 2022, 8:25 AM

spatel added inline comments. Apr 27 2022, 8:34 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1162	Could add a near duplicate of that test except the shuffled value is operand 1 of the shift (to prove that we can find the shuffle through either operand of a binop).
1213–1219	If I'm seeing it correctly, this logic ensures we won't try to re-shuffle a mask with duplicate elements like: <i32 1, i32 2, i32 1, i32 3> Is that intentional? Either way, it would be good to have a test like that.

dmgreen added inline comments. Apr 28 2022, 5:47 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
1213–1219	I think it was, yes. But mostly just because it simplified things and it didn't sound as profitable (as in, the best order to aim for is an identity mask, and having extra lanes in there can mess that up). There is a test for it in reduceshuffle_twoin_repeat_v4i32. I was aiming for something that removed the shuffle, either because it can be removed or because it becomes a concat or something similarly simple. It might be profitable in more cases though, and so long as we are cost modelling it, sorting the mask indices should be OK. I can change it to that, as it has the added benefit that it simplifies the code a little.

Update to sort the indices.
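
The index sorting mentioned here can be sketched in standalone C++ as follows. This is an illustration under assumptions rather than the patch's code: `UndefElem`, `sortMaskUndefLast` and `usesSecondInput` are hypothetical names, undef mask elements are assumed to be encoded as -1, and the comparator is written to return false when both elements are undef, since `std::sort` requires a strict weak ordering:

```cpp
#include <algorithm>
#include <vector>

// Stand-in for an undefined shuffle mask element (assumed encoding: -1).
constexpr int UndefElem = -1;

// Sorts mask indices into ascending order and pushes undef elements to the
// end, so the result is as close as possible to an identity or concat mask.
std::vector<int> sortMaskUndefLast(std::vector<int> Mask) {
  std::sort(Mask.begin(), Mask.end(), [](int X, int Y) {
    // A defined element orders before an undef one; two undefs are equal.
    if (X == UndefElem || Y == UndefElem)
      return X != UndefElem && Y == UndefElem;
    return X < Y;
  });
  return Mask;
}

// True if any mask element selects a lane from the second shuffle input,
// where the first input provides NumInputElts lanes.
bool usesSecondInput(const std::vector<int> &Mask, int NumInputElts) {
  return std::any_of(Mask.begin(), Mask.end(),
                     [NumInputElts](int M) { return M >= NumInputElts; });
}
```

For example, the mask <4, undef, 1, 0> sorts to <0, 1, 4, undef>, which still reads from the second input when each input has four lanes. Whether the reordered mask is actually used is left to the cost comparison, which this sketch does not model.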

Harbormaster completed remote builds in B161780: Diff 425754. Apr 28 2022, 5:49 AM

LGTM - I think the patch title would be better as something like "[VectorCombine] try to reduce shuffle cost for commutative reduction operands"

I tried testing with x86 and opened: https://github.com/llvm/llvm-project/issues/55170

There may be some generalization where we recognize other operations that don't care about element order.
For example, if we're testing if all elements are equal to zero:
https://alive2.llvm.org/ce/z/DFk-aC

We don't appear to optimize even that easy unary shuffle case in any pass currently.
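
A small standalone sketch of the all-zero example (plain C++, hypothetical helper names, and not a reproduction of the linked Alive2 proof): when the mask is a permutation of the lanes of a single input, testing whether every lane is zero gives the same answer before and after the shuffle.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<std::uint32_t, 4>;
using Mask4 = std::array<int, 4>;

// Single-source shuffle: result lane I takes Src[Mask[I]] (mask in 0..3).
Vec4 shuffle(const Vec4 &Src, const Mask4 &Mask) {
  Vec4 Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = Src[static_cast<std::size_t>(Mask[I])];
  return Out;
}

// True if every lane is zero. The answer does not depend on lane order.
bool allZero(const Vec4 &V) {
  return std::all_of(V.begin(), V.end(),
                     [](std::uint32_t X) { return X == 0; });
}

// Same test applied after a shuffle. This matches allZero(V) only when the
// mask is a permutation; a mask that drops a lane could hide a nonzero value.
bool allZeroAfterShuffle(const Vec4 &V, const Mask4 &Mask) {
  return allZero(shuffle(V, Mask));
}
```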

Thanks. There is D100486 that helps with X86 shuffles, but I'm not sure it applies to these cases quite yet.

This revision was landed with ongoing or failed builds. Apr 28 2022, 11:46 AM

Closed by commit rGded8187e353f: [VectorCombine] Try to reduce shuffle cost for commutative reduction operands (authored by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rGded8187e353f: [VectorCombine] Try to reduce shuffle cost for commutative reduction operands.

dmgreen mentioned this in D125086: [VectorCombine] Attempt to fold select shuffles from reductions. May 6 2022, 3:08 AM

dmgreen mentioned this in rG6f9e1ea0efb9: [VectorCombine] Attempt to fold select shuffles from reductions. May 8 2022, 2:33 AM

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/VectorCombine.cpp	117 lines
llvm/test/Transforms/VectorCombine/AArch64/vecreduce-shuffle.ll	38 lines

Diff 425866

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

//===------- VectorCombine.cpp - Optimize partial vector operations -------===//		//===------- VectorCombine.cpp - Optimize partial vector operations -------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This pass optimizes scalar/vector interactions using target cost models. The		// This pass optimizes scalar/vector interactions using target cost models. The
// transforms implemented here may not fit in traditional loop-based or SLP		// transforms implemented here may not fit in traditional loop-based or SLP
// vectorization passes.		// vectorization passes.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Vectorize/VectorCombine.h"		#include "llvm/Transforms/Vectorize/VectorCombine.h"
		#include "llvm/ADT/SmallBitVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"		#include "llvm/Analysis/BasicAliasAnalysis.h"
#include "llvm/Analysis/GlobalsModRef.h"		#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/Loads.h"		#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Analysis/VectorUtils.h"		#include "llvm/Analysis/VectorUtils.h"
[74 lines not shown]
void foldExtExtBinop(ExtractElementInst *Ext0, ExtractElementInst *Ext1,		void foldExtExtBinop(ExtractElementInst *Ext0, ExtractElementInst *Ext1,
Instruction &I);		Instruction &I);
bool foldExtractExtract(Instruction &I);		bool foldExtractExtract(Instruction &I);
bool foldBitcastShuf(Instruction &I);		bool foldBitcastShuf(Instruction &I);
bool scalarizeBinopOrCmp(Instruction &I);		bool scalarizeBinopOrCmp(Instruction &I);
bool foldExtractedCmps(Instruction &I);		bool foldExtractedCmps(Instruction &I);
bool foldSingleElementStore(Instruction &I);		bool foldSingleElementStore(Instruction &I);
bool scalarizeLoadExtract(Instruction &I);		bool scalarizeLoadExtract(Instruction &I);
bool foldShuffleOfBinops(Instruction &I);		bool foldShuffleOfBinops(Instruction &I);
		bool foldShuffleFromReductions(Instruction &I);

void replaceValue(Value &Old, Value &New) {		void replaceValue(Value &Old, Value &New) {
Old.replaceAllUsesWith(&New);		Old.replaceAllUsesWith(&New);
New.takeName(&Old);
if (auto *NewI = dyn_cast<Instruction>(&New)) {		if (auto *NewI = dyn_cast<Instruction>(&New)) {
		New.takeName(&Old);
Worklist.pushUsersToWorkList(*NewI);		Worklist.pushUsersToWorkList(*NewI);
Worklist.pushValue(NewI);		Worklist.pushValue(NewI);
}		}
Worklist.pushValue(&Old);		Worklist.pushValue(&Old);
}		}

void eraseInstruction(Instruction &I) {		void eraseInstruction(Instruction &I) {
for (Value *Op : I.operands())		for (Value *Op : I.operands())
[989 lines not shown, ending inside bool VectorCombine::foldShuffleOfBinops(Instruction &I)]
if (auto *NewInst = dyn_cast<Instruction>(NewBO)) {		if (auto *NewInst = dyn_cast<Instruction>(NewBO)) {
NewInst->copyIRFlags(B0);		NewInst->copyIRFlags(B0);
NewInst->andIRFlags(B1);		NewInst->andIRFlags(B1);
}		}
replaceValue(I, *NewBO);		replaceValue(I, *NewBO);
return true;		return true;
}		}

		/// Given a commutative reduction, the order of the input lanes does not alter
		/// the results. We can use this to remove certain shuffles feeding the
		/// reduction, removing the need to shuffle at all.
		bool VectorCombine::foldShuffleFromReductions(Instruction &I) {
		auto *II = dyn_cast<IntrinsicInst>(&I);
		if (!II)
		return false;
		switch (II->getIntrinsicID()) {
		case Intrinsic::vector_reduce_add:
		case Intrinsic::vector_reduce_mul:
		case Intrinsic::vector_reduce_and:
		case Intrinsic::vector_reduce_or:
		case Intrinsic::vector_reduce_xor:
		case Intrinsic::vector_reduce_smin:
		case Intrinsic::vector_reduce_smax:
		case Intrinsic::vector_reduce_umin:
		case Intrinsic::vector_reduce_umax:
		break;
		default:
		return false;
		}

		// Find all the inputs when looking through operations that do not alter the
		// lane order (binops, for example). Currently we look for a single shuffle,
		// and can ignore splat values.
		std::queue<Value *> Worklist;
		SmallPtrSet<Value *, 4> Visited;
		ShuffleVectorInst *Shuffle = nullptr;
		if (auto *Op = dyn_cast<Instruction>(I.getOperand(0)))
		Worklist.push(Op);

		while (!Worklist.empty()) {
		Value *CV = Worklist.front();
		Worklist.pop();
		if (Visited.contains(CV))
		continue;

		// Splats don't change the order, so can be safely ignored.
		if (isSplatValue(CV))
		continue;

		Visited.insert(CV);

		if (auto *CI = dyn_cast<Instruction>(CV)) {
		if (CI->isBinaryOp()) {
		for (auto *Op : CI->operand_values())
		Worklist.push(Op);
		continue;
		} else if (auto *SV = dyn_cast<ShuffleVectorInst>(CI)) {
		if (Shuffle && Shuffle != SV)
		return false;
		Shuffle = SV;
		continue;
		}
		}

		// Anything else is currently an unknown node.
		return false;
		}

		if (!Shuffle)
		return false;

		// Check all uses of the binary ops and shuffles are also included in the
		// lane-invariant operations (Visited should be the list of lanewise
		// instructions, including the shuffle that we found).
		for (auto *V : Visited)
		for (auto *U : V->users())
		if (!Visited.contains(U) && U != &I)
		return false;

		FixedVectorType *VecType =
		dyn_cast<FixedVectorType>(II->getOperand(0)->getType());
		if (!VecType)
		return false;
		FixedVectorType *ShuffleInputType =
		dyn_cast<FixedVectorType>(Shuffle->getOperand(0)->getType());
		if (!ShuffleInputType)
		return false;
		int NumInputElts = ShuffleInputType->getNumElements();

		// Find the mask from sorting the lanes into order. This is most likely to
		// become a identity or concat mask. Undef elements are pushed to the end.
		SmallVector<int> ConcatMask;
		Shuffle->getShuffleMask(ConcatMask);
		sort(ConcatMask, [](int X, int Y) {
		return Y == UndefMaskElem ? true : (X == UndefMaskElem ? false : X < Y);
		});
		bool UsesSecondVec =
		any_of(ConcatMask, [&](int M) { return M >= NumInputElts; });
		InstructionCost OldCost = TTI.getShuffleCost(
		UsesSecondVec ? TTI::SK_PermuteTwoSrc : TTI::SK_PermuteSingleSrc, VecType,
		Shuffle->getShuffleMask());
		InstructionCost NewCost = TTI.getShuffleCost(
		UsesSecondVec ? TTI::SK_PermuteTwoSrc : TTI::SK_PermuteSingleSrc, VecType,
		ConcatMask);

		LLVM_DEBUG(dbgs() << "Found a reduction feeding from a shuffle: " << *Shuffle
		<< "\n");
		LLVM_DEBUG(dbgs() << " OldCost: " << OldCost << " vs NewCost: " << NewCost
		<< "\n");
		if (NewCost < OldCost) {
		Builder.SetInsertPoint(Shuffle);
		Value *NewShuffle = Builder.CreateShuffleVector(
		Shuffle->getOperand(0), Shuffle->getOperand(1), ConcatMask);
		LLVM_DEBUG(dbgs() << "Created new shuffle: " << *NewShuffle << "\n");
		replaceValue(*Shuffle, *NewShuffle);
		}

		return false;
		}

/// This is the entry point for all transforms. Pass manager differences are		/// This is the entry point for all transforms. Pass manager differences are
/// handled in the callers of this function.		/// handled in the callers of this function.
bool VectorCombine::run() {		bool VectorCombine::run() {
if (DisableVectorCombine)		if (DisableVectorCombine)
return false;		return false;

// Don't attempt vectorization if the target does not support vectors.		// Don't attempt vectorization if the target does not support vectors.
if (!TTI.getNumberOfRegisters(TTI.getRegisterClassForType(/*Vector*/ true)))		if (!TTI.getNumberOfRegisters(TTI.getRegisterClassForType(/*Vector*/ true)))
return false;		return false;

bool MadeChange = false;		bool MadeChange = false;
auto FoldInst = [this, &MadeChange](Instruction &I) {		auto FoldInst = [this, &MadeChange](Instruction &I) {
Builder.SetInsertPoint(&I);		Builder.SetInsertPoint(&I);
if (!ScalarizationOnly) {		if (!ScalarizationOnly) {
MadeChange |= vectorizeLoadInsert(I);		MadeChange |= vectorizeLoadInsert(I);
MadeChange |= foldExtractExtract(I);		MadeChange |= foldExtractExtract(I);
MadeChange |= foldBitcastShuf(I);		MadeChange |= foldBitcastShuf(I);
MadeChange |= foldExtractedCmps(I);		MadeChange |= foldExtractedCmps(I);
MadeChange |= foldShuffleOfBinops(I);		MadeChange |= foldShuffleOfBinops(I);
		MadeChange |= foldShuffleFromReductions(I);
}		}
MadeChange |= scalarizeBinopOrCmp(I);		MadeChange |= scalarizeBinopOrCmp(I);
MadeChange |= scalarizeLoadExtract(I);		MadeChange |= scalarizeLoadExtract(I);
MadeChange |= foldSingleElementStore(I);		MadeChange |= foldSingleElementStore(I);
};		};
for (BasicBlock &BB : F) {		for (BasicBlock &BB : F) {
// Ignore unreachable basic blocks.		// Ignore unreachable basic blocks.
if (!DT.isReachableFromEntry(&BB))		if (!DT.isReachableFromEntry(&BB))
[86 lines not shown]

llvm/test/Transforms/VectorCombine/AArch64/vecreduce-shuffle.ll

[10 lines not shown]
;		;
%x = xor <4 x i32> %a, %b		%x = xor <4 x i32> %a, %b
%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_onein_v4i32(<4 x i32> %a) {		define i32 @reduceshuffle_onein_v4i32(<4 x i32> %a) {
; CHECK-LABEL: @reduceshuffle_onein_v4i32(		; CHECK-LABEL: @reduceshuffle_onein_v4i32(
; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 1, i32 3>		; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%x = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 1, i32 3>		%x = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_onein_const_v4i32(<4 x i32> %a) {		define i32 @reduceshuffle_onein_const_v4i32(<4 x i32> %a) {
; CHECK-LABEL: @reduceshuffle_onein_const_v4i32(		; CHECK-LABEL: @reduceshuffle_onein_const_v4i32(
; CHECK-NEXT: [[S:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 1, i32 3>		; CHECK-NEXT: [[S:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[X:%.*]] = xor <4 x i32> [[S]], <i32 -1, i32 -1, i32 -1, i32 -1>		; CHECK-NEXT: [[X:%.*]] = xor <4 x i32> [[S]], <i32 -1, i32 -1, i32 -1, i32 -1>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%s = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 1, i32 3>		%s = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
%x = xor <4 x i32> %s, <i32 -1, i32 -1, i32 -1, i32 -1>		%x = xor <4 x i32> %s, <i32 -1, i32 -1, i32 -1, i32 -1>
%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
ret i32 %r		ret i32 %r
[33 lines not shown]
;		;
%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 1, i32 4, i32 5>		%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_twoin_concat_v4i32(<2 x i32> %a, <2 x i32> %b) {		define i32 @reduceshuffle_twoin_concat_v4i32(<2 x i32> %a, <2 x i32> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_concat_v4i32(		; CHECK-LABEL: @reduceshuffle_twoin_concat_v4i32(
; CHECK-NEXT: [[X:%.*]] = shufflevector <2 x i32> [[A:%.*]], <2 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 1, i32 3>		; CHECK-NEXT: [[X:%.*]] = shufflevector <2 x i32> [[A:%.*]], <2 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%x = shufflevector <2 x i32> %a, <2 x i32> %b, <4 x i32> <i32 0, i32 2, i32 1, i32 3>		%x = shufflevector <2 x i32> %a, <2 x i32> %b, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_twoin_lowelts_v4i32(<4 x i32> %a, <4 x i32> %b) {		define i32 @reduceshuffle_twoin_lowelts_v4i32(<4 x i32> %a, <4 x i32> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_lowelts_v4i32(		; CHECK-LABEL: @reduceshuffle_twoin_lowelts_v4i32(
; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 5, i32 1, i32 4>		; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 5, i32 1, i32 4>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 1, i32 4>		%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 1, i32 4>
%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_twoin_notlowelts_v4i32(<4 x i32> %a, <4 x i32> %b) {		define i32 @reduceshuffle_twoin_notlowelts_v4i32(<4 x i32> %a, <4 x i32> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_notlowelts_v4i32(		; CHECK-LABEL: @reduceshuffle_twoin_notlowelts_v4i32(
; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 6, i32 1, i32 4>		; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 1, i32 4, i32 6>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 6, i32 1, i32 4>		%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 6, i32 1, i32 4>
%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_twoin_repeat_v4i32(<4 x i32> %a, <4 x i32> %b) {		define i32 @reduceshuffle_twoin_repeat_v4i32(<4 x i32> %a, <4 x i32> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_repeat_v4i32(		; CHECK-LABEL: @reduceshuffle_twoin_repeat_v4i32(
; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 1, i32 4>		; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 1, i32 4>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 4, i32 1, i32 4>		%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 4, i32 1, i32 4>
%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_twoin_uneven_v4i32(<4 x i32> %a, <4 x i32> %b) {		define i32 @reduceshuffle_twoin_uneven_v4i32(<4 x i32> %a, <4 x i32> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_uneven_v4i32(		; CHECK-LABEL: @reduceshuffle_twoin_uneven_v4i32(
; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 1, i32 4>		; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 4>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 2, i32 1, i32 4>		%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 2, i32 1, i32 4>
%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_twoin_undef_v4i32(<4 x i32> %a, <4 x i32> %b) {		define i32 @reduceshuffle_twoin_undef_v4i32(<4 x i32> %a, <4 x i32> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_undef_v4i32(		; CHECK-LABEL: @reduceshuffle_twoin_undef_v4i32(
; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 undef, i32 1, i32 5>		; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 undef, i32 1, i32 5>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 undef, i32 1, i32 5>		%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 undef, i32 1, i32 5>
%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_twoin_undef2_v4i32(<4 x i32> %a, <4 x i32> %b) {		define i32 @reduceshuffle_twoin_undef2_v4i32(<4 x i32> %a, <4 x i32> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_undef2_v4i32(		; CHECK-LABEL: @reduceshuffle_twoin_undef2_v4i32(
; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]], <4 x i32> <i32 4, i32 undef, i32 1, i32 0>		; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 1, i32 4, i32 undef>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 4, i32 undef, i32 1, i32 0>		%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 4, i32 undef, i32 1, i32 0>
%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_twoin_multiundef_v4i32(<4 x i32> %a, <4 x i32> %b) {		define i32 @reduceshuffle_twoin_multiundef_v4i32(<4 x i32> %a, <4 x i32> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_multiundef_v4i32(		; CHECK-LABEL: @reduceshuffle_twoin_multiundef_v4i32(
; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 undef, i32 undef, i32 1>		; CHECK-NEXT: [[X:%.*]] = shufflevector <4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 undef, i32 undef, i32 1>		%x = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 undef, i32 undef, i32 1>
%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
ret i32 %r		ret i32 %r
}		}

[51 lines not shown]
;		;
%x = xor <16 x i32> %a, %b		%x = xor <16 x i32> %a, %b
%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_onein_v16i32(<16 x i32> %a) {		define i32 @reduceshuffle_onein_v16i32(<16 x i32> %a) {
; CHECK-LABEL: @reduceshuffle_onein_v16i32(		; CHECK-LABEL: @reduceshuffle_onein_v16i32(
; CHECK-NEXT: [[X:%.*]] = shufflevector <16 x i32> [[A:%.*]], <16 x i32> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		; CHECK-NEXT: [[X:%.*]] = shufflevector <16 x i32> [[A:%.*]], <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%x = shufflevector <16 x i32> %a, <16 x i32> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		%x = shufflevector <16 x i32> %a, <16 x i32> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_onein_ext_v16i32(<16 x i32> %a) {		define i32 @reduceshuffle_onein_ext_v16i32(<16 x i32> %a) {
; CHECK-LABEL: @reduceshuffle_onein_ext_v16i32(		; CHECK-LABEL: @reduceshuffle_onein_ext_v16i32(
; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i32> [[A:%.*]], <16 x i32> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i32> [[A:%.*]], <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[X:%.*]] = xor <16 x i32> [[S]], <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>		; CHECK-NEXT: [[X:%.*]] = xor <16 x i32> [[S]], <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%s = shufflevector <16 x i32> %a, <16 x i32> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		%s = shufflevector <16 x i32> %a, <16 x i32> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
%x = xor <16 x i32> %s, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>		%x = xor <16 x i32> %s, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_twoin_concat_v16i32(<8 x i32> %a, <8 x i32> %b) {		define i32 @reduceshuffle_twoin_concat_v16i32(<8 x i32> %a, <8 x i32> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_concat_v16i32(		; CHECK-LABEL: @reduceshuffle_twoin_concat_v16i32(
; CHECK-NEXT: [[S:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		; CHECK-NEXT: [[S:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[X:%.*]] = xor <16 x i32> [[S]], <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>		; CHECK-NEXT: [[X:%.*]] = xor <16 x i32> [[S]], <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%s = shufflevector <8 x i32> %a, <8 x i32> %b, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		%s = shufflevector <8 x i32> %a, <8 x i32> %b, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
%x = xor <16 x i32> %s, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>		%x = xor <16 x i32> %s, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_twoin_lowelt_v16i32(<16 x i32> %a, <16 x i32> %b) {		define i32 @reduceshuffle_twoin_lowelt_v16i32(<16 x i32> %a, <16 x i32> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_lowelt_v16i32(		; CHECK-LABEL: @reduceshuffle_twoin_lowelt_v16i32(
; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i32> [[A:%.*]], <16 x i32> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>		; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i32> [[A:%.*]], <16 x i32> [[B:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
; CHECK-NEXT: [[X:%.*]] = xor <16 x i32> [[S]], <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>		; CHECK-NEXT: [[X:%.*]] = xor <16 x i32> [[S]], <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%s = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>		%s = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>
%x = xor <16 x i32> %s, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>		%x = xor <16 x i32> %s, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %x)
ret i32 %r		ret i32 %r
[9 lines not shown]
;		;
%s = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 21, i32 22, i32 7, i32 24, i32 25, i32 10, i32 27, i32 28, i32 13, i32 30, i32 31>		%s = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 21, i32 22, i32 7, i32 24, i32 25, i32 10, i32 27, i32 28, i32 13, i32 30, i32 31>
%x = xor <16 x i32> %s, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>		%x = xor <16 x i32> %s, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_twoin_uneven_v16i32(<16 x i32> %a, <16 x i32> %b) {		define i32 @reduceshuffle_twoin_uneven_v16i32(<16 x i32> %a, <16 x i32> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_uneven_v16i32(		; CHECK-LABEL: @reduceshuffle_twoin_uneven_v16i32(
; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i32> [[A:%.*]], <16 x i32> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 8>		; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i32> [[A:%.*]], <16 x i32> [[B:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22>
; CHECK-NEXT: [[X:%.*]] = xor <16 x i32> [[S]], <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>		; CHECK-NEXT: [[X:%.*]] = xor <16 x i32> [[S]], <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[X]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%s = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 8>		%s = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 8>
%x = xor <16 x i32> %s, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>		%x = xor <16 x i32> %s, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %x)		%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %x)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_twoin_shr1_v16i32(<16 x i32> %a, <16 x i32> %b) {		define i32 @reduceshuffle_twoin_shr1_v16i32(<16 x i32> %a, <16 x i32> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_shr1_v16i32(		; CHECK-LABEL: @reduceshuffle_twoin_shr1_v16i32(
; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i32> [[A:%.*]], <16 x i32> [[B:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 16, i32 17, i32 5, i32 18, i32 19, i32 6, i32 20, i32 21, i32 7, i32 22, i32 23>		; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i32> [[A:%.*]], <16 x i32> [[B:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
; CHECK-NEXT: [[A1:%.*]] = lshr <16 x i32> [[S]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>		; CHECK-NEXT: [[A1:%.*]] = lshr <16 x i32> [[S]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
; CHECK-NEXT: [[A2:%.*]] = and <16 x i32> [[A1]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>		; CHECK-NEXT: [[A2:%.*]] = and <16 x i32> [[A1]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>
; CHECK-NEXT: [[A3:%.*]] = mul nuw <16 x i32> [[A2]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>		; CHECK-NEXT: [[A3:%.*]] = mul nuw <16 x i32> [[A2]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
; CHECK-NEXT: [[A4:%.*]] = add <16 x i32> [[A3]], [[S]]		; CHECK-NEXT: [[A4:%.*]] = add <16 x i32> [[A3]], [[S]]
; CHECK-NEXT: [[A5:%.*]] = xor <16 x i32> [[A4]], [[A3]]		; CHECK-NEXT: [[A5:%.*]] = xor <16 x i32> [[A4]], [[A3]]
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[A5]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[A5]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%s = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 16, i32 17, i32 5, i32 18, i32 19, i32 6, i32 20, i32 21, i32 7, i32 22, i32 23>		%s = shufflevector <16 x i32> %a, <16 x i32> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 16, i32 17, i32 5, i32 18, i32 19, i32 6, i32 20, i32 21, i32 7, i32 22, i32 23>
%a1 = lshr <16 x i32> %s, <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>		%a1 = lshr <16 x i32> %s, <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
%a2 = and <16 x i32> %a1, <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>		%a2 = and <16 x i32> %a1, <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>
%a3 = mul nuw <16 x i32> %a2, <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>		%a3 = mul nuw <16 x i32> %a2, <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
%a4 = add <16 x i32> %a3, %s		%a4 = add <16 x i32> %a3, %s
%a5 = xor <16 x i32> %a4, %a3		%a5 = xor <16 x i32> %a4, %a3
%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %a5)		%r = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %a5)
ret i32 %r		ret i32 %r
}		}

define i32 @reduceshuffle_twoin_shr2_v16i32(<16 x i32> %a, <16 x i32> %b) {		define i32 @reduceshuffle_twoin_shr2_v16i32(<16 x i32> %a, <16 x i32> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_shr2_v16i32(		; CHECK-LABEL: @reduceshuffle_twoin_shr2_v16i32(
; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i32> [[A:%.*]], <16 x i32> [[B:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 16, i32 17, i32 5, i32 18, i32 19, i32 6, i32 20, i32 21, i32 7, i32 22, i32 23>		; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i32> [[A:%.*]], <16 x i32> [[B:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
; CHECK-NEXT: [[A1:%.*]] = lshr <16 x i32> <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>, [[S]]		; CHECK-NEXT: [[A1:%.*]] = lshr <16 x i32> <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>, [[S]]
; CHECK-NEXT: [[A2:%.*]] = and <16 x i32> [[A1]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>		; CHECK-NEXT: [[A2:%.*]] = and <16 x i32> [[A1]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>
; CHECK-NEXT: [[A3:%.*]] = mul nuw <16 x i32> [[A2]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>		; CHECK-NEXT: [[A3:%.*]] = mul nuw <16 x i32> [[A2]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
; CHECK-NEXT: [[A4:%.*]] = add <16 x i32> [[A3]], [[S]]		; CHECK-NEXT: [[A4:%.*]] = add <16 x i32> [[A3]], [[S]]
; CHECK-NEXT: [[A5:%.*]] = xor <16 x i32> [[A4]], [[A3]]		; CHECK-NEXT: [[A5:%.*]] = xor <16 x i32> [[A4]], [[A3]]
; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[A5]])		; CHECK-NEXT: [[R:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[A5]])
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
[17 lines not shown]
;		;
%x = xor <16 x i16> %a, %b		%x = xor <16 x i16> %a, %b
%r = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> %x)		%r = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> %x)
ret i16 %r		ret i16 %r
}		}

define i16 @reduceshuffle_onein_v16i16(<16 x i16> %a) {		define i16 @reduceshuffle_onein_v16i16(<16 x i16> %a) {
; CHECK-LABEL: @reduceshuffle_onein_v16i16(		; CHECK-LABEL: @reduceshuffle_onein_v16i16(
; CHECK-NEXT: [[X:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		; CHECK-NEXT: [[X:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[R:%.*]] = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> [[X]])
; CHECK-NEXT: ret i16 [[R]]		; CHECK-NEXT: ret i16 [[R]]
;		;
%x = shufflevector <16 x i16> %a, <16 x i16> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		%x = shufflevector <16 x i16> %a, <16 x i16> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
%r = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> %x)		%r = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> %x)
ret i16 %r		ret i16 %r
}		}

define i16 @reduceshuffle_onein_ext_v16i16(<16 x i16> %a) {		define i16 @reduceshuffle_onein_ext_v16i16(<16 x i16> %a) {
; CHECK-LABEL: @reduceshuffle_onein_ext_v16i16(		; CHECK-LABEL: @reduceshuffle_onein_ext_v16i16(
; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[X:%.*]] = xor <16 x i16> [[S]], <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>		; CHECK-NEXT: [[X:%.*]] = xor <16 x i16> [[S]], <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
; CHECK-NEXT: [[R:%.*]] = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> [[X]])
; CHECK-NEXT: ret i16 [[R]]		; CHECK-NEXT: ret i16 [[R]]
;		;
%s = shufflevector <16 x i16> %a, <16 x i16> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		%s = shufflevector <16 x i16> %a, <16 x i16> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
%x = xor <16 x i16> %s, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>		%x = xor <16 x i16> %s, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
%r = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> %x)		%r = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> %x)
ret i16 %r		ret i16 %r
}		}

define i16 @reduceshuffle_twoin_concat_v16i16(<8 x i16> %a, <8 x i16> %b) {		define i16 @reduceshuffle_twoin_concat_v16i16(<8 x i16> %a, <8 x i16> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_concat_v16i16(		; CHECK-LABEL: @reduceshuffle_twoin_concat_v16i16(
; CHECK-NEXT: [[S:%.*]] = shufflevector <8 x i16> [[A:%.*]], <8 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		; CHECK-NEXT: [[S:%.*]] = shufflevector <8 x i16> [[A:%.*]], <8 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEXT: [[X:%.*]] = xor <16 x i16> [[S]], <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>		; CHECK-NEXT: [[X:%.*]] = xor <16 x i16> [[S]], <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
; CHECK-NEXT: [[R:%.*]] = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> [[X]])
; CHECK-NEXT: ret i16 [[R]]		; CHECK-NEXT: ret i16 [[R]]
;		;
%s = shufflevector <8 x i16> %a, <8 x i16> %b, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		%s = shufflevector <8 x i16> %a, <8 x i16> %b, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
%x = xor <16 x i16> %s, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>		%x = xor <16 x i16> %s, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
%r = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> %x)		%r = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> %x)
ret i16 %r		ret i16 %r
}		}

define i16 @reduceshuffle_twoin_lowelt_v16i16(<16 x i16> %a, <16 x i16> %b) {		define i16 @reduceshuffle_twoin_lowelt_v16i16(<16 x i16> %a, <16 x i16> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_lowelt_v16i16(		; CHECK-LABEL: @reduceshuffle_twoin_lowelt_v16i16(
; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>		; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
; CHECK-NEXT: [[X:%.*]] = xor <16 x i16> [[S]], <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>		; CHECK-NEXT: [[X:%.*]] = xor <16 x i16> [[S]], <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
; CHECK-NEXT: [[R:%.*]] = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> [[X]])		; CHECK-NEXT: [[R:%.*]] = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> [[X]])
; CHECK-NEXT: ret i16 [[R]]		; CHECK-NEXT: ret i16 [[R]]
;		;
%s = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>		%s = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>
%x = xor <16 x i16> %s, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>		%x = xor <16 x i16> %s, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
%r = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> %x)		%r = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> %x)
ret i16 %r		ret i16 %r
[22 lines not shown]
;		;
%s = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 8>		%s = shufflevector <16 x i16> %a, <16 x i16> %b, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 8>
%x = xor <16 x i16> %s, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>		%x = xor <16 x i16> %s, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
%r = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> %x)		%r = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> %x)
ret i16 %r		ret i16 %r
}		}

define i16 @reduceshuffle_twoin_ext_v16i16(<16 x i16> %a, <16 x i16> %b) {		define i16 @reduceshuffle_twoin_ext_v16i16(<16 x i16> %a, <16 x i16> %b) {
; CHECK-LABEL: @reduceshuffle_twoin_ext_v16i16(		; CHECK-LABEL: @reduceshuffle_twoin_ext_v16i16(
; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 16, i32 17, i32 5, i32 18, i32 19, i32 6, i32 20, i32 21, i32 7, i32 22, i32 23>		; CHECK-NEXT: [[S:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
; CHECK-NEXT: [[A1:%.*]] = lshr <16 x i16> [[S]], <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>		; CHECK-NEXT: [[A1:%.*]] = lshr <16 x i16> [[S]], <i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7, i16 7>
; CHECK-NEXT: [[A2:%.*]] = and <16 x i16> [[A1]], <i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257>		; CHECK-NEXT: [[A2:%.*]] = and <16 x i16> [[A1]], <i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257, i16 257>
; CHECK-NEXT: [[A3:%.*]] = mul nuw <16 x i16> [[A2]], <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>		; CHECK-NEXT: [[A3:%.*]] = mul nuw <16 x i16> [[A2]], <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
; CHECK-NEXT: [[A4:%.*]] = add <16 x i16> [[A3]], [[S]]		; CHECK-NEXT: [[A4:%.*]] = add <16 x i16> [[A3]], [[S]]
; CHECK-NEXT: [[A5:%.*]] = xor <16 x i16> [[A4]], [[A3]]		; CHECK-NEXT: [[A5:%.*]] = xor <16 x i16> [[A4]], [[A3]]
; CHECK-NEXT: [[R:%.*]] = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> [[A5]])		; CHECK-NEXT: [[R:%.*]] = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> [[A5]])
; CHECK-NEXT: ret i16 [[R]]		; CHECK-NEXT: ret i16 [[R]]
;		;
[14 lines not shown]