This is an archive of the discontinued LLVM Phabricator instance.

[ARM] MVE interleaving load and stores.
ClosedPublic

Authored by dmgreen on Oct 24 2019, 9:30 AM.

Details

Reviewers

t.p.northover
SjoerdMeijer
samparker
simon_tatham
ostannard

Commits

rG882f23caeae5: [ARM] MVE interleaving load and stores.

Summary

Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering for MVE. This works the same way as Neon: a pre-isel pass recognises the load/shuffle combinations and converts them into intrinsics, calling only getMaxSupportedInterleaveFactor, lowerInterleavedLoad and lowerInterleavedStore.
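As a rough illustration of the load/shuffle combination being recognised, the sketch below builds the strided shuffle masks for a given interleave factor and applies them to a wide vector. The helper names stridedMask and applyMask are hypothetical and exist only for this example; they are not part of LLVM.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): builds the shuffle mask that picks
// lane `Index` out of `Factor` interleaved lanes, producing `NumElts` indices.
// For Factor = 2 this gives <0, 2, 4, 6> for Index 0 and <1, 3, 5, 7> for Index 1.
std::vector<int> stridedMask(unsigned Index, unsigned Factor, unsigned NumElts) {
  assert(Factor >= 2 && Index < Factor && "lane index must be below the factor");
  std::vector<int> Mask;
  for (unsigned I = 0; I < NumElts; ++I)
    Mask.push_back(static_cast<int>(Index + I * Factor));
  return Mask;
}

// Applies a mask to a wide vector, the way a shufflevector with an undef
// second operand would select elements from its first operand.
std::vector<int> applyMask(const std::vector<int> &Wide,
                           const std::vector<int> &Mask) {
  std::vector<int> Out;
  for (int M : Mask)
    Out.push_back(Wide.at(static_cast<std::size_t>(M)));
  return Out;
}
```

A wide load followed by Factor such shuffles is the pattern that can be replaced by a single vldN-style operation returning the already de-interleaved sub-vectors.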

The main difference from Neon is that MVE does not have a VLD3 instruction. Otherwise most of the code works very similarly, with only minor differences in the form of the intrinsics to work around. VLD3 is disabled by making isLegalInterleavedAccessType return false for those cases.

We may need other adjustments in the future; for example, VLD4 takes up half the available registers, so it should perhaps cost more. This patch should get the basics in, though.
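The legality rules described in the summary can be modelled without any LLVM types. The following is a minimal sketch that follows the checks in the new version of isLegalInterleavedAccessType in this diff; SubtargetModel and the free function isLegalInterleavedAccess are hypothetical names, and the vector size is assumed to be simply the element size times the element count.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget feature queries used in the patch.
struct SubtargetModel {
  bool HasNEON;
  bool HasMVEIntegerOps;
};

// Simplified model of the legality check. ElSizeBits and NumElts describe one
// de-interleaved sub-vector; IsHalf marks f16 elements.
bool isLegalInterleavedAccess(const SubtargetModel &ST, unsigned Factor,
                              unsigned ElSizeBits, unsigned NumElts,
                              bool IsHalf) {
  assert(Factor >= 2 && "Invalid interleave factor");
  if (!ST.HasNEON && !ST.HasMVEIntegerOps)
    return false;
  // f16 vectors are rejected on Neon.
  if (ST.HasNEON && IsHalf)
    return false;
  // MVE has no VLD3/VST3.
  if (ST.HasMVEIntegerOps && Factor == 3)
    return false;
  if (NumElts < 2)
    return false;
  if (ElSizeBits != 8 && ElSizeBits != 16 && ElSizeBits != 32)
    return false;
  // Assumption of this sketch: no padding, so size = element size * count.
  unsigned VecSize = ElSizeBits * NumElts;
  // 64-bit vectors are only accepted on Neon.
  if (ST.HasNEON && VecSize == 64)
    return true;
  return VecSize % 128 == 0;
}
```

In this model a factor-3 access is always rejected for an MVE subtarget, while a 64-bit sub-vector is accepted for Neon only.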

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. Oct 24 2019, 9:30 AM

Herald added a project: Restricted Project. Oct 24 2019, 9:30 AM

Herald added subscribers: hiraditya, kristof.beyls.

samparker added inline comments. Oct 28 2019, 4:26 AM

llvm/lib/Target/ARM/ARMISelLowering.cpp
16751	What is Factor here? Is it not the number of elements..?
16807	debug
16979	How about performing this with a loop?

dmgreen marked 3 inline comments as done. Oct 30 2019, 11:37 AM

dmgreen added inline comments.

llvm/lib/Target/ARM/ARMISelLowering.cpp
16751	Factor is the "n" in "vldn". So 2 or 4 for MVE, but also 3 for Neon. It comes from the elements of the shuffle, like the example in the comment at the start of this function. We really just need it to say that a VLD3 isn't legal on MVE.
16807	Ah!
16979	Yeah. Nice suggestion. It involves a pop because of the last item changing, but looks better.
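The loop-with-pop shape mentioned in the last reply can be sketched in isolation. This is only an illustration, assuming (as the diff suggests) that each MVE store call takes the base address, the sub-vectors, and a trailing stage index that differs per call; a call is modelled as a list of operand names, and emitStagedStores is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A call is modelled as its operand list; this stands in for building a real
// intrinsic call and is an assumption of this sketch.
using Call = std::vector<std::string>;

// Emits one call per stage. The shared operands are built once; only the
// trailing stage index is pushed before each call and popped afterwards.
std::vector<Call> emitStagedStores(const std::string &BaseAddr,
                                   const std::vector<std::string> &Shuffles,
                                   unsigned Factor) {
  assert((Factor == 2 || Factor == 4) &&
         "expected interleave factor of 2 or 4 for MVE");
  std::vector<Call> Calls;
  Call Ops;
  Ops.push_back(BaseAddr);
  for (const std::string &S : Shuffles)
    Ops.push_back(S);
  for (unsigned F = 0; F < Factor; ++F) {
    Ops.push_back(std::to_string(F)); // stage operand, changes on every call
    Calls.push_back(Ops);
    Ops.pop_back();                   // restore the shared operand list
  }
  return Calls;
}
```

Reusing one operand list keeps the shared operands in a single place, at the cost of the push/pop pair around each call.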

Thanks. I've updated with the suggestions.

Also:

Added a load of extra tests.
Updated the cost model a little, notably for smaller-than-legal types that look like interleaves but actually use a single load with a vrev, a vmovn or something similar.
Err, formatted =)

samparker added inline comments. Nov 4 2019, 6:50 AM

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
786	If this is handling the smaller vector types, shouldn't this be <= 64 instead? Is the 2 below 'Factor'? If so, it would be good to use the variable name or add a clear explanation of what it is.

dmgreen marked an inline comment as done. Nov 5 2019, 6:57 AM

dmgreen added inline comments.

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
786	Oh, yeah, it's SrcVecTy, so yes, <= 64 will be fine (128 will already be handled by the if above). The 2 below is not Factor: it is the cost of VLDR; VREV or VLDR; VMOVN, so 2 instructions. I'll add a comment.
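The cost logic discussed here can be summarised in a small standalone model. This is a sketch under stated assumptions, not the actual implementation: it covers the MVE case only, ignores the UseMaskForCond and UseMaskForGaps parameters, takes the MVE cost factor as a plain argument (standing in for ST->getMVEVectorCostFactor()), and uses a hypothetical sentinel DeferToGeneric where the real code falls back to BaseT.

```cpp
#include <cassert>

// Hypothetical sentinel meaning "defer to the generic cost model".
constexpr int DeferToGeneric = -1;

// Number of 128-bit accesses needed for one sub-vector, following
// getNumInterleavedAccesses in the diff.
unsigned numInterleavedAccesses(unsigned SubVecBits) {
  return (SubVecBits + 127) / 128;
}

// Simplified MVE-only cost model for an interleaved access of NumElts
// elements of EltBits bits each.
int mveInterleavedCost(unsigned Factor, unsigned NumElts, unsigned EltBits,
                       bool IsInteger, int MVECostFactor) {
  assert(Factor >= 2 && "Invalid interleave factor");
  if (Factor > 4 || EltBits == 64)
    return DeferToGeneric;
  unsigned SubElts = NumElts / Factor;
  unsigned SubVecBits = SubElts * EltBits;
  bool LegalElt = EltBits == 8 || EltBits == 16 || EltBits == 32;
  bool Legal =
      Factor != 3 && SubElts >= 2 && LegalElt && SubVecBits % 128 == 0;
  // Legal case: Factor instructions per 128-bit chunk.
  if (NumElts % Factor == 0 && Legal)
    return static_cast<int>(Factor * numInterleavedAccesses(SubVecBits)) *
           MVECostFactor;
  // Smaller-than-legal integer case: one load plus one vrev or vmovn,
  // hence the constant 2 (which is not the interleave factor).
  if (Factor == 2 && SubElts > 2 && IsInteger && SubVecBits <= 64)
    return 2 * MVECostFactor;
  return DeferToGeneric;
}
```

For example, with an assumed cost factor of 2, a factor-2 access to <8 x i32> costs 4 in this model via the legal path, and a factor-2 access to <8 x i16> also costs 4 via the load-plus-vrev/vmovn path.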

Updated "vmovn" cost check.

Nice one. LGTM

This revision is now accepted and ready to land. Nov 6 2019, 2:03 AM

Closed by commit rG882f23caeae5: [ARM] MVE interleaving load and stores. (authored by dmgreen). Nov 19 2019, 11:01 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Target/ARM/ARMISelLowering.h	2 lines
llvm/lib/Target/ARM/ARMISelLowering.cpp	121 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp	25 lines
llvm/test/CodeGen/Thumb2/	555 lines
llvm/test/CodeGen/Thumb2/	1580 lines
llvm/test/CodeGen/Thumb2/	580 lines
llvm/test/CodeGen/Thumb2/	1594 lines
llvm/test/Transforms/InterleavedAccess/ARM/interleaved-accesses.ll	428 lines
llvm/test/Transforms/LoopVectorize/ARM/mve-interleaved-cost.ll	108 lines

Diff 230109

llvm/lib/Target/ARM/ARMISelLowering.h

Show First 20 Lines • Show All 598 Lines • ▼ Show 20 Lines	public:
bool shouldExpandShift(SelectionDAG &DAG, SDNode *N) const override;		bool shouldExpandShift(SelectionDAG &DAG, SDNode *N) const override;

CCAssignFn *CCAssignFnForCall(CallingConv::ID CC, bool isVarArg) const;		CCAssignFn *CCAssignFnForCall(CallingConv::ID CC, bool isVarArg) const;
CCAssignFn *CCAssignFnForReturn(CallingConv::ID CC, bool isVarArg) const;		CCAssignFn *CCAssignFnForReturn(CallingConv::ID CC, bool isVarArg) const;

/// Returns true if \p VecTy is a legal interleaved access type. This		/// Returns true if \p VecTy is a legal interleaved access type. This
/// function checks the vector element type and the overall width of the		/// function checks the vector element type and the overall width of the
/// vector.		/// vector.
bool isLegalInterleavedAccessType(VectorType *VecTy,		bool isLegalInterleavedAccessType(unsigned Factor, VectorType *VecTy,
const DataLayout &DL) const;		const DataLayout &DL) const;

bool alignLoopsWithOptSize() const override;		bool alignLoopsWithOptSize() const override;

/// Returns the number of interleaved accesses that will be generated when		/// Returns the number of interleaved accesses that will be generated when
/// lowering accesses of the given type.		/// lowering accesses of the given type.
unsigned getNumInterleavedAccesses(VectorType *VecTy,		unsigned getNumInterleavedAccesses(VectorType *VecTy,
const DataLayout &DL) const;		const DataLayout &DL) const;

llvm/lib/Target/ARM/ARMISelLowering.cpp

/// will generate when lowering accesses of the given type.		/// will generate when lowering accesses of the given type.
unsigned		unsigned
ARMTargetLowering::getNumInterleavedAccesses(VectorType *VecTy,		ARMTargetLowering::getNumInterleavedAccesses(VectorType *VecTy,
const DataLayout &DL) const {		const DataLayout &DL) const {
return (DL.getTypeSizeInBits(VecTy) + 127) / 128;		return (DL.getTypeSizeInBits(VecTy) + 127) / 128;
}		}

bool ARMTargetLowering::isLegalInterleavedAccessType(		bool ARMTargetLowering::isLegalInterleavedAccessType(
VectorType *VecTy, const DataLayout &DL) const {		unsigned Factor, VectorType *VecTy, const DataLayout &DL) const {

unsigned VecSize = DL.getTypeSizeInBits(VecTy);		unsigned VecSize = DL.getTypeSizeInBits(VecTy);
unsigned ElSize = DL.getTypeSizeInBits(VecTy->getElementType());		unsigned ElSize = DL.getTypeSizeInBits(VecTy->getElementType());

		if (!Subtarget->hasNEON() && !Subtarget->hasMVEIntegerOps())
		return false;

// Ensure the vector doesn't have f16 elements. Even though we could do an		// Ensure the vector doesn't have f16 elements. Even though we could do an
// i16 vldN, we can't hold the f16 vectors and will end up converting via		// i16 vldN, we can't hold the f16 vectors and will end up converting via
// f32.		// f32.
if (VecTy->getElementType()->isHalfTy())		if (Subtarget->hasNEON() && VecTy->getElementType()->isHalfTy())
		return false;
		if (Subtarget->hasMVEIntegerOps() && Factor == 3)
return false;		return false;

// Ensure the number of vector elements is greater than 1.		// Ensure the number of vector elements is greater than 1.
if (VecTy->getNumElements() < 2)		if (VecTy->getNumElements() < 2)
return false;		return false;

// Ensure the element type is legal.		// Ensure the element type is legal.
if (ElSize != 8 && ElSize != 16 && ElSize != 32)		if (ElSize != 8 && ElSize != 16 && ElSize != 32)
return false;		return false;

// Ensure the total vector size is 64 or a multiple of 128. Types larger than		// Ensure the total vector size is 64 or a multiple of 128. Types larger than
// 128 will be split into multiple interleaved accesses.		// 128 will be split into multiple interleaved accesses.
return VecSize == 64 || VecSize % 128 == 0;		if (Subtarget->hasNEON() && VecSize == 64)
		return true;
		return VecSize % 128 == 0;
}		}

unsigned ARMTargetLowering::getMaxSupportedInterleaveFactor() const {		unsigned ARMTargetLowering::getMaxSupportedInterleaveFactor() const {
if (Subtarget->hasNEON())		if (Subtarget->hasNEON())
return 4;		return 4;
		if (Subtarget->hasMVEIntegerOps())
		return 4;
return TargetLoweringBase::getMaxSupportedInterleaveFactor();		return TargetLoweringBase::getMaxSupportedInterleaveFactor();
}		}

/// Lower an interleaved load into a vldN intrinsic.		/// Lower an interleaved load into a vldN intrinsic.
///		///
/// E.g. Lower an interleaved load (Factor = 2):		/// E.g. Lower an interleaved load (Factor = 2):
/// %wide.vec = load <8 x i32>, <8 x i32>* %ptr, align 4		/// %wide.vec = load <8 x i32>, <8 x i32>* %ptr, align 4
/// %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6> ; Extract even elements		/// %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6> ; Extract even elements
Show All 15 Lines	bool ARMTargetLowering::lowerInterleavedLoad(
VectorType *VecTy = Shuffles[0]->getType();		VectorType *VecTy = Shuffles[0]->getType();
Type *EltTy = VecTy->getVectorElementType();		Type *EltTy = VecTy->getVectorElementType();

const DataLayout &DL = LI->getModule()->getDataLayout();		const DataLayout &DL = LI->getModule()->getDataLayout();

// Skip if we do not have NEON and skip illegal vector types. We can		// Skip if we do not have NEON and skip illegal vector types. We can
// "legalize" wide vector types into multiple interleaved accesses as long as		// "legalize" wide vector types into multiple interleaved accesses as long as
// the vector types are divisible by 128.		// the vector types are divisible by 128.
if (!Subtarget->hasNEON() || !isLegalInterleavedAccessType(VecTy, DL))		if (!isLegalInterleavedAccessType(Factor, VecTy, DL))
		samparker: What is Factor here? Is it not the number of elements..?
		dmgreen: Factor is the "n" in "vldn". So 2 or 4 for MVE, but also 3 for Neon. It comes from the elements of the shuffle, like the example in the comment at the start of this function. We really just need it to say that a VLD3 isn't legal on MVE.
return false;		return false;

unsigned NumLoads = getNumInterleavedAccesses(VecTy, DL);		unsigned NumLoads = getNumInterleavedAccesses(VecTy, DL);

// A pointer vector can not be the return type of the ldN intrinsics. Need to		// A pointer vector can not be the return type of the ldN intrinsics. Need to
// load integer vectors first and then convert to pointer vectors.		// load integer vectors first and then convert to pointer vectors.
if (EltTy->isPointerTy())		if (EltTy->isPointerTy())
VecTy =		VecTy =
Show All 15 Lines	if (NumLoads > 1) {
// element type.		// element type.
BaseAddr = Builder.CreateBitCast(		BaseAddr = Builder.CreateBitCast(
BaseAddr, VecTy->getVectorElementType()->getPointerTo(		BaseAddr, VecTy->getVectorElementType()->getPointerTo(
LI->getPointerAddressSpace()));		LI->getPointerAddressSpace()));
}		}

assert(isTypeLegal(EVT::getEVT(VecTy)) && "Illegal vldN vector type!");		assert(isTypeLegal(EVT::getEVT(VecTy)) && "Illegal vldN vector type!");

		auto createLoadIntrinsic = [&](Value *BaseAddr) {
		if (Subtarget->hasNEON()) {
Type *Int8Ptr = Builder.getInt8PtrTy(LI->getPointerAddressSpace());		Type *Int8Ptr = Builder.getInt8PtrTy(LI->getPointerAddressSpace());
Type *Tys[] = {VecTy, Int8Ptr};		Type *Tys[] = {VecTy, Int8Ptr};
static const Intrinsic::ID LoadInts[3] = {Intrinsic::arm_neon_vld2,		static const Intrinsic::ID LoadInts[3] = {Intrinsic::arm_neon_vld2,
Intrinsic::arm_neon_vld3,		Intrinsic::arm_neon_vld3,
Intrinsic::arm_neon_vld4};		Intrinsic::arm_neon_vld4};
Function *VldnFunc =		Function *VldnFunc =
Intrinsic::getDeclaration(LI->getModule(), LoadInts[Factor - 2], Tys);		Intrinsic::getDeclaration(LI->getModule(), LoadInts[Factor - 2], Tys);

		SmallVector<Value *, 2> Ops;
		Ops.push_back(Builder.CreateBitCast(BaseAddr, Int8Ptr));
		Ops.push_back(Builder.getInt32(LI->getAlignment()));

		return Builder.CreateCall(VldnFunc, Ops, "vldN");
		} else {
		assert((Factor == 2 || Factor == 4) &&
		"expected interleave factor of 2 or 4 for MVE");
		Intrinsic::ID LoadInts =
		Factor == 2 ? Intrinsic::arm_mve_vld2q : Intrinsic::arm_mve_vld4q;
		Type *VecEltTy = VecTy->getVectorElementType()->getPointerTo(
		LI->getPointerAddressSpace());
		Type *Tys[] = {VecTy, VecEltTy};
		Function *VldnFunc =
		Intrinsic::getDeclaration(LI->getModule(), LoadInts, Tys);
		samparker: debug
		dmgreen: Ah!

		SmallVector<Value *, 2> Ops;
		Ops.push_back(Builder.CreateBitCast(BaseAddr, VecEltTy));
		return Builder.CreateCall(VldnFunc, Ops, "vldN");
		}
		};

// Holds sub-vectors extracted from the load intrinsic return values. The		// Holds sub-vectors extracted from the load intrinsic return values. The
// sub-vectors are associated with the shufflevector instructions they will		// sub-vectors are associated with the shufflevector instructions they will
// replace.		// replace.
DenseMap<ShuffleVectorInst *, SmallVector<Value *, 4>> SubVecs;		DenseMap<ShuffleVectorInst *, SmallVector<Value *, 4>> SubVecs;

for (unsigned LoadCount = 0; LoadCount < NumLoads; ++LoadCount) {		for (unsigned LoadCount = 0; LoadCount < NumLoads; ++LoadCount) {
// If we're generating more than one load, compute the base address of		// If we're generating more than one load, compute the base address of
// subsequent loads as an offset from the previous.		// subsequent loads as an offset from the previous.
if (LoadCount > 0)		if (LoadCount > 0)
BaseAddr =		BaseAddr =
Builder.CreateConstGEP1_32(VecTy->getVectorElementType(), BaseAddr,		Builder.CreateConstGEP1_32(VecTy->getVectorElementType(), BaseAddr,
VecTy->getVectorNumElements() * Factor);		VecTy->getVectorNumElements() * Factor);

SmallVector<Value *, 2> Ops;		CallInst *VldN = createLoadIntrinsic(BaseAddr);
Ops.push_back(Builder.CreateBitCast(BaseAddr, Int8Ptr));
Ops.push_back(Builder.getInt32(LI->getAlignment()));

CallInst *VldN = Builder.CreateCall(VldnFunc, Ops, "vldN");

// Replace uses of each shufflevector with the corresponding vector loaded		// Replace uses of each shufflevector with the corresponding vector loaded
// by ldN.		// by ldN.
for (unsigned i = 0; i < Shuffles.size(); i++) {		for (unsigned i = 0; i < Shuffles.size(); i++) {
ShuffleVectorInst *SV = Shuffles[i];		ShuffleVectorInst *SV = Shuffles[i];
unsigned Index = Indices[i];		unsigned Index = Indices[i];

Value *SubVec = Builder.CreateExtractValue(VldN, Index);		Value *SubVec = Builder.CreateExtractValue(VldN, Index);
▲ Show 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	bool ARMTargetLowering::lowerInterleavedStore(StoreInst *SI,
Type *EltTy = VecTy->getVectorElementType();		Type *EltTy = VecTy->getVectorElementType();
VectorType *SubVecTy = VectorType::get(EltTy, LaneLen);		VectorType *SubVecTy = VectorType::get(EltTy, LaneLen);

const DataLayout &DL = SI->getModule()->getDataLayout();		const DataLayout &DL = SI->getModule()->getDataLayout();

// Skip if we do not have NEON and skip illegal vector types. We can		// Skip if we do not have NEON and skip illegal vector types. We can
// "legalize" wide vector types into multiple interleaved accesses as long as		// "legalize" wide vector types into multiple interleaved accesses as long as
// the vector types are divisible by 128.		// the vector types are divisible by 128.
if (!Subtarget->hasNEON() || !isLegalInterleavedAccessType(SubVecTy, DL))		if (!isLegalInterleavedAccessType(Factor, SubVecTy, DL))
return false;		return false;

unsigned NumStores = getNumInterleavedAccesses(SubVecTy, DL);		unsigned NumStores = getNumInterleavedAccesses(SubVecTy, DL);

Value *Op0 = SVI->getOperand(0);		Value *Op0 = SVI->getOperand(0);
Value *Op1 = SVI->getOperand(1);		Value *Op1 = SVI->getOperand(1);
IRBuilder<> Builder(SI);		IRBuilder<> Builder(SI);

Show All 27 Lines	BaseAddr = Builder.CreateBitCast(
BaseAddr, SubVecTy->getVectorElementType()->getPointerTo(		BaseAddr, SubVecTy->getVectorElementType()->getPointerTo(
SI->getPointerAddressSpace()));		SI->getPointerAddressSpace()));
}		}

assert(isTypeLegal(EVT::getEVT(SubVecTy)) && "Illegal vstN vector type!");		assert(isTypeLegal(EVT::getEVT(SubVecTy)) && "Illegal vstN vector type!");

auto Mask = SVI->getShuffleMask();		auto Mask = SVI->getShuffleMask();

Type *Int8Ptr = Builder.getInt8PtrTy(SI->getPointerAddressSpace());		auto createStoreIntrinsic = [&](Value *BaseAddr,
Type *Tys[] = {Int8Ptr, SubVecTy};		SmallVectorImpl<Value *> &Shuffles) {
		if (Subtarget->hasNEON()) {
static const Intrinsic::ID StoreInts[3] = {Intrinsic::arm_neon_vst2,		static const Intrinsic::ID StoreInts[3] = {Intrinsic::arm_neon_vst2,
Intrinsic::arm_neon_vst3,		Intrinsic::arm_neon_vst3,
Intrinsic::arm_neon_vst4};		Intrinsic::arm_neon_vst4};
		Type *Int8Ptr = Builder.getInt8PtrTy(SI->getPointerAddressSpace());
		Type *Tys[] = {Int8Ptr, SubVecTy};

		Function *VstNFunc = Intrinsic::getDeclaration(
		SI->getModule(), StoreInts[Factor - 2], Tys);

		SmallVector<Value *, 6> Ops;
		Ops.push_back(Builder.CreateBitCast(BaseAddr, Int8Ptr));
		for (auto S : Shuffles)
		Ops.push_back(S);
		Ops.push_back(Builder.getInt32(SI->getAlignment()));
		Builder.CreateCall(VstNFunc, Ops);
		} else {
		assert((Factor == 2 || Factor == 4) &&
		"expected interleave factor of 2 or 4 for MVE");
		Intrinsic::ID StoreInts =
		Factor == 2 ? Intrinsic::arm_mve_vst2q : Intrinsic::arm_mve_vst4q;
		Type *EltPtrTy = SubVecTy->getVectorElementType()->getPointerTo(
		SI->getPointerAddressSpace());
		Type *Tys[] = {EltPtrTy, SubVecTy};
		Function *VstNFunc =
		Intrinsic::getDeclaration(SI->getModule(), StoreInts, Tys);

		samparker: How about performing this with a loop?
		dmgreen: Yeah. Nice suggestion. It involves a pop because of the last item changing, but looks better.
		SmallVector<Value *, 6> Ops;
		Ops.push_back(Builder.CreateBitCast(BaseAddr, EltPtrTy));
		for (auto S : Shuffles)
		Ops.push_back(S);
		for (unsigned F = 0; F < Factor; F++) {
		Ops.push_back(Builder.getInt32(F));
		Builder.CreateCall(VstNFunc, Ops);
		Ops.pop_back();
		}
		}
		};

for (unsigned StoreCount = 0; StoreCount < NumStores; ++StoreCount) {		for (unsigned StoreCount = 0; StoreCount < NumStores; ++StoreCount) {
// If we generating more than one store, we compute the base address of		// If we generating more than one store, we compute the base address of
// subsequent stores as an offset from the previous.		// subsequent stores as an offset from the previous.
if (StoreCount > 0)		if (StoreCount > 0)
BaseAddr = Builder.CreateConstGEP1_32(SubVecTy->getVectorElementType(),		BaseAddr = Builder.CreateConstGEP1_32(SubVecTy->getVectorElementType(),
BaseAddr, LaneLen * Factor);		BaseAddr, LaneLen * Factor);

SmallVector<Value *, 6> Ops;		SmallVector<Value *, 4> Shuffles;
Ops.push_back(Builder.CreateBitCast(BaseAddr, Int8Ptr));

Function *VstNFunc =
Intrinsic::getDeclaration(SI->getModule(), StoreInts[Factor - 2], Tys);

// Split the shufflevector operands into sub vectors for the new vstN call.		// Split the shufflevector operands into sub vectors for the new vstN call.
for (unsigned i = 0; i < Factor; i++) {		for (unsigned i = 0; i < Factor; i++) {
unsigned IdxI = StoreCount * LaneLen * Factor + i;		unsigned IdxI = StoreCount * LaneLen * Factor + i;
if (Mask[IdxI] >= 0) {		if (Mask[IdxI] >= 0) {
Ops.push_back(Builder.CreateShuffleVector(		Shuffles.push_back(Builder.CreateShuffleVector(
Op0, Op1, createSequentialMask(Builder, Mask[IdxI], LaneLen, 0)));		Op0, Op1, createSequentialMask(Builder, Mask[IdxI], LaneLen, 0)));
} else {		} else {
unsigned StartMask = 0;		unsigned StartMask = 0;
for (unsigned j = 1; j < LaneLen; j++) {		for (unsigned j = 1; j < LaneLen; j++) {
unsigned IdxJ = StoreCount * LaneLen * Factor + j;		unsigned IdxJ = StoreCount * LaneLen * Factor + j;
if (Mask[IdxJ * Factor + IdxI] >= 0) {		if (Mask[IdxJ * Factor + IdxI] >= 0) {
StartMask = Mask[IdxJ * Factor + IdxI] - IdxJ;		StartMask = Mask[IdxJ * Factor + IdxI] - IdxJ;
break;		break;
}		}
}		}
// Note: If all elements in a chunk are undefs, StartMask=0!		// Note: If all elements in a chunk are undefs, StartMask=0!
// Note: Filling undef gaps with random elements is ok, since		// Note: Filling undef gaps with random elements is ok, since
// those elements were being written anyway (with undefs).		// those elements were being written anyway (with undefs).
// In the case of all undefs we're defaulting to using elems from 0		// In the case of all undefs we're defaulting to using elems from 0
// Note: StartMask cannot be negative, it's checked in		// Note: StartMask cannot be negative, it's checked in
// isReInterleaveMask		// isReInterleaveMask
Ops.push_back(Builder.CreateShuffleVector(		Shuffles.push_back(Builder.CreateShuffleVector(
Op0, Op1, createSequentialMask(Builder, StartMask, LaneLen, 0)));		Op0, Op1, createSequentialMask(Builder, StartMask, LaneLen, 0)));
}		}
}		}

Ops.push_back(Builder.getInt32(SI->getAlignment()));		createStoreIntrinsic(BaseAddr, Shuffles);
Builder.CreateCall(VstNFunc, Ops);
}		}
return true;		return true;
}		}

enum HABaseType {		enum HABaseType {
HA_UNKNOWN = 0,		HA_UNKNOWN = 0,
HA_FLOAT,		HA_FLOAT,
HA_DOUBLE,		HA_DOUBLE,

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

Show First 20 Lines • Show All 749 Lines • ▼ Show 20 Lines	if (ST->hasNEON() && Src->isVectorTy() &&
return LT.first * 4;		return LT.first * 4;
}		}
int BaseCost = ST->hasMVEIntegerOps() && Src->isVectorTy()		int BaseCost = ST->hasMVEIntegerOps() && Src->isVectorTy()
? ST->getMVEVectorCostFactor()		? ST->getMVEVectorCostFactor()
: 1;		: 1;
return BaseCost * LT.first;		return BaseCost * LT.first;
}		}

int ARMTTIImpl::getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,		int ARMTTIImpl::getInterleavedMemoryOpCost(
unsigned Factor,		unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
ArrayRef<unsigned> Indices,		unsigned Alignment, unsigned AddressSpace, bool UseMaskForCond,
unsigned Alignment,
unsigned AddressSpace,
bool UseMaskForCond,
bool UseMaskForGaps) {		bool UseMaskForGaps) {
assert(Factor >= 2 && "Invalid interleave factor");		assert(Factor >= 2 && "Invalid interleave factor");
assert(isa<VectorType>(VecTy) && "Expect a vector type");		assert(isa<VectorType>(VecTy) && "Expect a vector type");

// vldN/vstN doesn't support vector types of i64/f64 element.		// vldN/vstN doesn't support vector types of i64/f64 element.
bool EltIs64Bits = DL.getTypeSizeInBits(VecTy->getScalarType()) == 64;		bool EltIs64Bits = DL.getTypeSizeInBits(VecTy->getScalarType()) == 64;

if (Factor <= TLI->getMaxSupportedInterleaveFactor() && !EltIs64Bits &&		if (Factor <= TLI->getMaxSupportedInterleaveFactor() && !EltIs64Bits &&
!UseMaskForCond && !UseMaskForGaps) {		!UseMaskForCond && !UseMaskForGaps) {
unsigned NumElts = VecTy->getVectorNumElements();		unsigned NumElts = VecTy->getVectorNumElements();
auto *SubVecTy = VectorType::get(VecTy->getScalarType(), NumElts / Factor);		auto *SubVecTy = VectorType::get(VecTy->getScalarType(), NumElts / Factor);

// vldN/vstN only support legal vector types of size 64 or 128 in bits.		// vldN/vstN only support legal vector types of size 64 or 128 in bits.
// Accesses having vector types that are a multiple of 128 bits can be		// Accesses having vector types that are a multiple of 128 bits can be
// matched to more than one vldN/vstN instruction.		// matched to more than one vldN/vstN instruction.
		int BaseCost = ST->hasMVEIntegerOps() ? ST->getMVEVectorCostFactor() : 1;
if (NumElts % Factor == 0 &&		if (NumElts % Factor == 0 &&
TLI->isLegalInterleavedAccessType(SubVecTy, DL))		TLI->isLegalInterleavedAccessType(Factor, SubVecTy, DL))
return Factor * TLI->getNumInterleavedAccesses(SubVecTy, DL);		return Factor * BaseCost * TLI->getNumInterleavedAccesses(SubVecTy, DL);

		// Some smaller than legal interleaved patterns are cheap as we can make
		// use of the vmovn or vrev patterns to interleave a standard load. This is
		// true for v4i8, v8i8 and v4i16 at least (but not for v4f16 as it is
		// promoted differently). The cost of 2 here is then a load and vrev or
		// vmovn.
		if (ST->hasMVEIntegerOps() && Factor == 2 && NumElts / Factor > 2 &&
		samparker: if this is handling the smaller vector types, shouldn't this be <= 64 instead? Is the 2 below 'Factor'? If so it would be good to use the variable name or a clear explanation of what it is.
		dmgreen: Oh, yeah, it's SrcVecTy, so yes <= 64 will be fine (128 will already be handled in the above if). The 2 below is VLDR, VREV or VLDR; VMOVN. 2 Instructions though. I'll add a comment.
		VecTy->isIntOrIntVectorTy() && DL.getTypeSizeInBits(SubVecTy) <= 64)
		return 2 * BaseCost;
}		}

return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,		return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
Alignment, AddressSpace,		Alignment, AddressSpace,
UseMaskForCond, UseMaskForGaps);		UseMaskForCond, UseMaskForGaps);
}		}

bool ARMTTIImpl::isLoweredToCall(const Function *F) {		bool ARMTTIImpl::isLoweredToCall(const Function *F) {

llvm/test/CodeGen/Thumb2/mve-vld2.ll

Show All 22 Lines	entry:
%a = add <2 x i32> %s1, %s2		%a = add <2 x i32> %s1, %s2
store <2 x i32> %a, <2 x i32> *%dst		store <2 x i32> %a, <2 x i32> *%dst
ret void		ret void
}		}

define void @vld2_v4i32(<8 x i32> *%src, <4 x i32> *%dst) {		define void @vld2_v4i32(<8 x i32> *%src, <4 x i32> *%dst) {
; CHECK-LABEL: vld2_v4i32:		; CHECK-LABEL: vld2_v4i32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: vld20.32 {q0, q1}, [r0]
; CHECK-NEXT: vldrw.u32 q0, [r0, #16]		; CHECK-NEXT: vld21.32 {q0, q1}, [r0]
; CHECK-NEXT: vmov.f32 s8, s5		; CHECK-NEXT: vadd.i32 q0, q0, q1
; CHECK-NEXT: vmov.f32 s9, s7
; CHECK-NEXT: vmov.f32 s5, s6
; CHECK-NEXT: vmov.f32 s10, s1
; CHECK-NEXT: vmov.f32 s6, s0
; CHECK-NEXT: vmov.f32 s11, s3
; CHECK-NEXT: vmov.f32 s7, s2
; CHECK-NEXT: vadd.i32 q0, q1, q2
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <8 x i32>, <8 x i32>* %src, align 4		%l1 = load <8 x i32>, <8 x i32>* %src, align 4
%s1 = shufflevector <8 x i32> %l1, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>		%s1 = shufflevector <8 x i32> %l1, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%s2 = shufflevector <8 x i32> %l1, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>		%s2 = shufflevector <8 x i32> %l1, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
%a = add <4 x i32> %s1, %s2		%a = add <4 x i32> %s1, %s2
store <4 x i32> %a, <4 x i32> *%dst		store <4 x i32> %a, <4 x i32> *%dst
ret void		ret void
}		}

define void @vld2_v8i32(<16 x i32> *%src, <8 x i32> *%dst) {		define void @vld2_v8i32(<16 x i32> *%src, <8 x i32> *%dst) {
; CHECK-LABEL: vld2_v8i32:		; CHECK-LABEL: vld2_v8i32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrw.u32 q0, [r0, #32]		; CHECK-NEXT: add.w r2, r0, #32
; CHECK-NEXT: vldrw.u32 q1, [r0, #48]		; CHECK-NEXT: vld20.32 {q0, q1}, [r0]
; CHECK-NEXT: vldrw.u32 q2, [r0]		; CHECK-NEXT: vld20.32 {q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s12, s1		; CHECK-NEXT: vld21.32 {q0, q1}, [r0]
; CHECK-NEXT: vmov.f32 s13, s3		; CHECK-NEXT: vld21.32 {q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s1, s2		; CHECK-NEXT: vadd.i32 q0, q0, q1
; CHECK-NEXT: vmov.f32 s14, s5
; CHECK-NEXT: vmov.f32 s2, s4
; CHECK-NEXT: vmov.f32 s15, s7
; CHECK-NEXT: vmov.f32 s3, s6
; CHECK-NEXT: vldrw.u32 q1, [r0, #16]
; CHECK-NEXT: vadd.i32 q0, q0, q3
; CHECK-NEXT: vmov.f32 s12, s9
; CHECK-NEXT: vmov.f32 s13, s11
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vmov.f32 s9, s10
; CHECK-NEXT: vmov.f32 s14, s5
; CHECK-NEXT: vmov.f32 s10, s4
; CHECK-NEXT: vmov.f32 s15, s7
; CHECK-NEXT: vmov.f32 s11, s6
; CHECK-NEXT: vadd.i32 q1, q2, q3		; CHECK-NEXT: vadd.i32 q1, q2, q3
; CHECK-NEXT: vstrw.32 q1, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
		; CHECK-NEXT: vstrw.32 q1, [r1, #16]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <16 x i32>, <16 x i32>* %src, align 4		%l1 = load <16 x i32>, <16 x i32>* %src, align 4
%s1 = shufflevector <16 x i32> %l1, <16 x i32> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>		%s1 = shufflevector <16 x i32> %l1, <16 x i32> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
%s2 = shufflevector <16 x i32> %l1, <16 x i32> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		%s2 = shufflevector <16 x i32> %l1, <16 x i32> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
%a = add <8 x i32> %s1, %s2		%a = add <8 x i32> %s1, %s2
store <8 x i32> %a, <8 x i32> *%dst		store <8 x i32> %a, <8 x i32> *%dst
ret void		ret void
}		}

define void @vld2_v16i32(<32 x i32> *%src, <16 x i32> *%dst) {		define void @vld2_v16i32(<32 x i32> *%src, <16 x i32> *%dst) {
; CHECK-LABEL: vld2_v16i32:		; CHECK-LABEL: vld2_v16i32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: vpush {d8, d9, d10, d11}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: vld20.32 {q0, q1}, [r0]
; CHECK-NEXT: vldrw.u32 q0, [r0, #16]		; CHECK-NEXT: add.w r12, r0, #96
; CHECK-NEXT: vldrw.u32 q4, [r0, #96]		; CHECK-NEXT: add.w r3, r0, #32
; CHECK-NEXT: vmov.f32 s8, s5		; CHECK-NEXT: add.w r2, r0, #64
; CHECK-NEXT: vmov.f32 s9, s7		; CHECK-NEXT: vld21.32 {q0, q1}, [r0]
; CHECK-NEXT: vmov.f32 s5, s6		; CHECK-NEXT: vadd.i32 q0, q0, q1
; CHECK-NEXT: vmov.f32 s10, s1		; CHECK-NEXT: vld20.32 {q1, q2}, [r2]
; CHECK-NEXT: vmov.f32 s6, s0		; CHECK-NEXT: vld20.32 {q3, q4}, [r12]
; CHECK-NEXT: vmov.f32 s11, s3		; CHECK-NEXT: vld20.32 {q5, q6}, [r3]
; CHECK-NEXT: vmov.f32 s7, s2		; CHECK-NEXT: vld21.32 {q5, q6}, [r3]
; CHECK-NEXT: vadd.i32 q0, q1, q2		; CHECK-NEXT: vld21.32 {q1, q2}, [r2]
; CHECK-NEXT: vldrw.u32 q2, [r0, #32]		; CHECK-NEXT: vld21.32 {q3, q4}, [r12]
; CHECK-NEXT: vldrw.u32 q1, [r0, #48]		; CHECK-NEXT: @ kill: def $q1 killed $q1 killed $q1_q2
; CHECK-NEXT: vmov.f32 s12, s9		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vmov.f32 s13, s11		; CHECK-NEXT: vadd.i32 q5, q5, q6
; CHECK-NEXT: vmov.f32 s9, s10		; CHECK-NEXT: vadd.i32 q1, q1, q2
; CHECK-NEXT: vmov.f32 s14, s5		; CHECK-NEXT: vadd.i32 q3, q3, q4
; CHECK-NEXT: vmov.f32 s10, s4		; CHECK-NEXT: vstrw.32 q1, [r1, #32]
; CHECK-NEXT: vmov.f32 s15, s7
; CHECK-NEXT: vmov.f32 s11, s6
; CHECK-NEXT: vadd.i32 q1, q2, q3
; CHECK-NEXT: vldrw.u32 q2, [r0, #64]
; CHECK-NEXT: vldrw.u32 q3, [r0, #80]
; CHECK-NEXT: vmov.f32 s20, s9
; CHECK-NEXT: vmov.f32 s21, s11
; CHECK-NEXT: vmov.f32 s9, s10
; CHECK-NEXT: vmov.f32 s22, s13
; CHECK-NEXT: vmov.f32 s10, s12
; CHECK-NEXT: vmov.f32 s23, s15
; CHECK-NEXT: vmov.f32 s11, s14
; CHECK-NEXT: vldrw.u32 q3, [r0, #112]
; CHECK-NEXT: vadd.i32 q2, q2, q5
; CHECK-NEXT: vmov.f32 s20, s17
; CHECK-NEXT: vmov.f32 s21, s19
; CHECK-NEXT: vstrw.32 q2, [r1, #32]
; CHECK-NEXT: vmov.f32 s17, s18
; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vmov.f32 s22, s13
; CHECK-NEXT: vstrw.32 q1, [r1, #16]
; CHECK-NEXT: vmov.f32 s18, s12
; CHECK-NEXT: vmov.f32 s23, s15
; CHECK-NEXT: vmov.f32 s19, s14
; CHECK-NEXT: vadd.i32 q3, q4, q5
; CHECK-NEXT: vstrw.32 q3, [r1, #48]		; CHECK-NEXT: vstrw.32 q3, [r1, #48]
; CHECK-NEXT: vpop {d8, d9, d10, d11}		; CHECK-NEXT: vstrw.32 q5, [r1, #16]
		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <32 x i32>, <32 x i32>* %src, align 4		%l1 = load <32 x i32>, <32 x i32>* %src, align 4
%s1 = shufflevector <32 x i32> %l1, <32 x i32> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>		%s1 = shufflevector <32 x i32> %l1, <32 x i32> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>
%s2 = shufflevector <32 x i32> %l1, <32 x i32> undef, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>		%s2 = shufflevector <32 x i32> %l1, <32 x i32> undef, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>
%a = add <16 x i32> %s1, %s2		%a = add <16 x i32> %s1, %s2
store <16 x i32> %a, <16 x i32> *%dst		store <16 x i32> %a, <16 x i32> *%dst
ret void		ret void
[... 39 lines not shown ...]
%a = add <4 x i16> %s1, %s2		%a = add <4 x i16> %s1, %s2
store <4 x i16> %a, <4 x i16> *%dst		store <4 x i16> %a, <4 x i16> *%dst
ret void		ret void
}		}

define void @vld2_v8i16(<16 x i16> *%src, <8 x i16> *%dst) {		define void @vld2_v8i16(<16 x i16> *%src, <8 x i16> *%dst) {
; CHECK-LABEL: vld2_v8i16:		; CHECK-LABEL: vld2_v8i16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: vld20.16 {q0, q1}, [r0]
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]		; CHECK-NEXT: vld21.16 {q0, q1}, [r0]
; CHECK-NEXT: vmov.u16 r2, q1[1]		; CHECK-NEXT: vadd.i16 q0, q0, q1
; CHECK-NEXT: vmov.u16 r0, q2[1]
; CHECK-NEXT: vmov.16 q0[0], r2
; CHECK-NEXT: vmov.u16 r2, q1[3]
; CHECK-NEXT: vmov.16 q0[1], r2
; CHECK-NEXT: vmov.u16 r2, q1[5]
; CHECK-NEXT: vmov.16 q0[2], r2
; CHECK-NEXT: vmov.u16 r2, q1[7]
; CHECK-NEXT: vmov.16 q0[3], r2
; CHECK-NEXT: vmov.16 q0[4], r0
; CHECK-NEXT: vmov.u16 r0, q2[3]
; CHECK-NEXT: vmov.16 q0[5], r0
; CHECK-NEXT: vmov.u16 r0, q2[5]
; CHECK-NEXT: vmov.16 q0[6], r0
; CHECK-NEXT: vmov.u16 r0, q2[7]
; CHECK-NEXT: vmov.16 q0[7], r0
; CHECK-NEXT: vmov.u16 r0, q1[0]
; CHECK-NEXT: vmov.16 q3[0], r0
; CHECK-NEXT: vmov.u16 r0, q1[2]
; CHECK-NEXT: vmov.16 q3[1], r0
; CHECK-NEXT: vmov.u16 r0, q1[4]
; CHECK-NEXT: vmov.16 q3[2], r0
; CHECK-NEXT: vmov.u16 r0, q1[6]
; CHECK-NEXT: vmov.16 q3[3], r0
; CHECK-NEXT: vmov.u16 r0, q2[0]
; CHECK-NEXT: vmov.16 q3[4], r0
; CHECK-NEXT: vmov.u16 r0, q2[2]
; CHECK-NEXT: vmov.16 q3[5], r0
; CHECK-NEXT: vmov.u16 r0, q2[4]
; CHECK-NEXT: vmov.16 q3[6], r0
; CHECK-NEXT: vmov.u16 r0, q2[6]
; CHECK-NEXT: vmov.16 q3[7], r0
; CHECK-NEXT: vadd.i16 q0, q3, q0
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <16 x i16>, <16 x i16>* %src, align 4		%l1 = load <16 x i16>, <16 x i16>* %src, align 4
%s1 = shufflevector <16 x i16> %l1, <16 x i16> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>		%s1 = shufflevector <16 x i16> %l1, <16 x i16> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
%s2 = shufflevector <16 x i16> %l1, <16 x i16> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		%s2 = shufflevector <16 x i16> %l1, <16 x i16> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
%a = add <8 x i16> %s1, %s2		%a = add <8 x i16> %s1, %s2
store <8 x i16> %a, <8 x i16> *%dst		store <8 x i16> %a, <8 x i16> *%dst
ret void		ret void
}		}

define void @vld2_v16i16(<32 x i16> *%src, <16 x i16> *%dst) {		define void @vld2_v16i16(<32 x i16> *%src, <16 x i16> *%dst) {
; CHECK-LABEL: vld2_v16i16:		; CHECK-LABEL: vld2_v16i16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9}		; CHECK-NEXT: add.w r2, r0, #32
; CHECK-NEXT: vpush {d8, d9}		; CHECK-NEXT: vld20.16 {q0, q1}, [r0]
; CHECK-NEXT: vldrw.u32 q1, [r0, #32]		; CHECK-NEXT: vld20.16 {q2, q3}, [r2]
; CHECK-NEXT: vldrw.u32 q2, [r0, #48]		; CHECK-NEXT: vld21.16 {q0, q1}, [r0]
; CHECK-NEXT: vmov.u16 r2, q1[1]		; CHECK-NEXT: vld21.16 {q2, q3}, [r2]
; CHECK-NEXT: vmov.16 q0[0], r2		; CHECK-NEXT: vadd.i16 q0, q0, q1
; CHECK-NEXT: vmov.u16 r2, q1[3]		; CHECK-NEXT: vadd.i16 q1, q2, q3
; CHECK-NEXT: vmov.16 q0[1], r2		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vmov.u16 r2, q1[5]		; CHECK-NEXT: vstrw.32 q1, [r1, #16]
; CHECK-NEXT: vmov.16 q0[2], r2
; CHECK-NEXT: vmov.u16 r2, q1[7]
; CHECK-NEXT: vmov.16 q0[3], r2
; CHECK-NEXT: vmov.u16 r2, q2[1]
; CHECK-NEXT: vmov.16 q0[4], r2
; CHECK-NEXT: vmov.u16 r2, q2[3]
; CHECK-NEXT: vmov.16 q0[5], r2
; CHECK-NEXT: vmov.u16 r2, q2[5]
; CHECK-NEXT: vmov.16 q0[6], r2
; CHECK-NEXT: vmov.u16 r2, q2[7]
; CHECK-NEXT: vmov.16 q0[7], r2
; CHECK-NEXT: vmov.u16 r2, q1[0]
; CHECK-NEXT: vmov.16 q3[0], r2
; CHECK-NEXT: vmov.u16 r2, q1[2]
; CHECK-NEXT: vmov.16 q3[1], r2
; CHECK-NEXT: vmov.u16 r2, q1[4]
; CHECK-NEXT: vmov.16 q3[2], r2
; CHECK-NEXT: vmov.u16 r2, q1[6]
; CHECK-NEXT: vmov.16 q3[3], r2
; CHECK-NEXT: vmov.u16 r2, q2[0]
; CHECK-NEXT: vmov.16 q3[4], r2
; CHECK-NEXT: vmov.u16 r2, q2[2]
; CHECK-NEXT: vmov.16 q3[5], r2
; CHECK-NEXT: vmov.u16 r2, q2[4]
; CHECK-NEXT: vmov.16 q3[6], r2
; CHECK-NEXT: vmov.u16 r2, q2[6]
; CHECK-NEXT: vmov.16 q3[7], r2
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]
; CHECK-NEXT: vadd.i16 q0, q3, q0
; CHECK-NEXT: vldrw.u32 q3, [r0]
; CHECK-NEXT: vmov.u16 r0, q2[0]
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vmov.u16 r2, q3[0]
; CHECK-NEXT: vmov.16 q1[0], r2
; CHECK-NEXT: vmov.u16 r2, q3[2]
; CHECK-NEXT: vmov.16 q1[1], r2
; CHECK-NEXT: vmov.u16 r2, q3[4]
; CHECK-NEXT: vmov.16 q1[2], r2
; CHECK-NEXT: vmov.u16 r2, q3[6]
; CHECK-NEXT: vmov.16 q1[3], r2
; CHECK-NEXT: vmov.16 q1[4], r0
; CHECK-NEXT: vmov.u16 r0, q2[2]
; CHECK-NEXT: vmov.16 q1[5], r0
; CHECK-NEXT: vmov.u16 r0, q2[4]
; CHECK-NEXT: vmov.16 q1[6], r0
; CHECK-NEXT: vmov.u16 r0, q3[1]
; CHECK-NEXT: vmov.16 q4[0], r0
; CHECK-NEXT: vmov.u16 r0, q3[3]
; CHECK-NEXT: vmov.16 q4[1], r0
; CHECK-NEXT: vmov.u16 r0, q3[5]
; CHECK-NEXT: vmov.16 q4[2], r0
; CHECK-NEXT: vmov.u16 r0, q3[7]
; CHECK-NEXT: vmov.16 q4[3], r0
; CHECK-NEXT: vmov.u16 r0, q2[1]
; CHECK-NEXT: vmov.16 q4[4], r0
; CHECK-NEXT: vmov.u16 r0, q2[3]
; CHECK-NEXT: vmov.16 q4[5], r0
; CHECK-NEXT: vmov.u16 r0, q2[5]
; CHECK-NEXT: vmov.16 q4[6], r0
; CHECK-NEXT: vmov.u16 r0, q2[7]
; CHECK-NEXT: vmov.16 q4[7], r0
; CHECK-NEXT: vmov.u16 r0, q2[6]
; CHECK-NEXT: vmov.16 q1[7], r0
; CHECK-NEXT: vadd.i16 q1, q1, q4
; CHECK-NEXT: vstrw.32 q1, [r1]
; CHECK-NEXT: vpop {d8, d9}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <32 x i16>, <32 x i16>* %src, align 4		%l1 = load <32 x i16>, <32 x i16>* %src, align 4
%s1 = shufflevector <32 x i16> %l1, <32 x i16> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>		%s1 = shufflevector <32 x i16> %l1, <32 x i16> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>
%s2 = shufflevector <32 x i16> %l1, <32 x i16> undef, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>		%s2 = shufflevector <32 x i16> %l1, <32 x i16> undef, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>
%a = add <16 x i16> %s1, %s2		%a = add <16 x i16> %s1, %s2
store <16 x i16> %a, <16 x i16> *%dst		store <16 x i16> %a, <16 x i16> *%dst
ret void		ret void
[... 56 lines not shown ...]
%a = add <8 x i8> %s1, %s2		%a = add <8 x i8> %s1, %s2
store <8 x i8> %a, <8 x i8> *%dst		store <8 x i8> %a, <8 x i8> *%dst
ret void		ret void
}		}

define void @vld2_v16i8(<32 x i8> *%src, <16 x i8> *%dst) {		define void @vld2_v16i8(<32 x i8> *%src, <16 x i8> *%dst) {
; CHECK-LABEL: vld2_v16i8:		; CHECK-LABEL: vld2_v16i8:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: vld20.8 {q0, q1}, [r0]
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]		; CHECK-NEXT: vld21.8 {q0, q1}, [r0]
; CHECK-NEXT: vmov.u8 r2, q1[1]		; CHECK-NEXT: vadd.i8 q0, q0, q1
; CHECK-NEXT: vmov.u8 r0, q2[1]
; CHECK-NEXT: vmov.8 q0[0], r2
; CHECK-NEXT: vmov.u8 r2, q1[3]
; CHECK-NEXT: vmov.8 q0[1], r2
; CHECK-NEXT: vmov.u8 r2, q1[5]
; CHECK-NEXT: vmov.8 q0[2], r2
; CHECK-NEXT: vmov.u8 r2, q1[7]
; CHECK-NEXT: vmov.8 q0[3], r2
; CHECK-NEXT: vmov.u8 r2, q1[9]
; CHECK-NEXT: vmov.8 q0[4], r2
; CHECK-NEXT: vmov.u8 r2, q1[11]
; CHECK-NEXT: vmov.8 q0[5], r2
; CHECK-NEXT: vmov.u8 r2, q1[13]
; CHECK-NEXT: vmov.8 q0[6], r2
; CHECK-NEXT: vmov.u8 r2, q1[15]
; CHECK-NEXT: vmov.8 q0[7], r2
; CHECK-NEXT: vmov.8 q0[8], r0
; CHECK-NEXT: vmov.u8 r0, q2[3]
; CHECK-NEXT: vmov.8 q0[9], r0
; CHECK-NEXT: vmov.u8 r0, q2[5]
; CHECK-NEXT: vmov.8 q0[10], r0
; CHECK-NEXT: vmov.u8 r0, q2[7]
; CHECK-NEXT: vmov.8 q0[11], r0
; CHECK-NEXT: vmov.u8 r0, q2[9]
; CHECK-NEXT: vmov.8 q0[12], r0
; CHECK-NEXT: vmov.u8 r0, q2[11]
; CHECK-NEXT: vmov.8 q0[13], r0
; CHECK-NEXT: vmov.u8 r0, q2[13]
; CHECK-NEXT: vmov.8 q0[14], r0
; CHECK-NEXT: vmov.u8 r0, q2[15]
; CHECK-NEXT: vmov.8 q0[15], r0
; CHECK-NEXT: vmov.u8 r0, q1[0]
; CHECK-NEXT: vmov.8 q3[0], r0
; CHECK-NEXT: vmov.u8 r0, q1[2]
; CHECK-NEXT: vmov.8 q3[1], r0
; CHECK-NEXT: vmov.u8 r0, q1[4]
; CHECK-NEXT: vmov.8 q3[2], r0
; CHECK-NEXT: vmov.u8 r0, q1[6]
; CHECK-NEXT: vmov.8 q3[3], r0
; CHECK-NEXT: vmov.u8 r0, q1[8]
; CHECK-NEXT: vmov.8 q3[4], r0
; CHECK-NEXT: vmov.u8 r0, q1[10]
; CHECK-NEXT: vmov.8 q3[5], r0
; CHECK-NEXT: vmov.u8 r0, q1[12]
; CHECK-NEXT: vmov.8 q3[6], r0
; CHECK-NEXT: vmov.u8 r0, q1[14]
; CHECK-NEXT: vmov.8 q3[7], r0
; CHECK-NEXT: vmov.u8 r0, q2[0]
; CHECK-NEXT: vmov.8 q3[8], r0
; CHECK-NEXT: vmov.u8 r0, q2[2]
; CHECK-NEXT: vmov.8 q3[9], r0
; CHECK-NEXT: vmov.u8 r0, q2[4]
; CHECK-NEXT: vmov.8 q3[10], r0
; CHECK-NEXT: vmov.u8 r0, q2[6]
; CHECK-NEXT: vmov.8 q3[11], r0
; CHECK-NEXT: vmov.u8 r0, q2[8]
; CHECK-NEXT: vmov.8 q3[12], r0
; CHECK-NEXT: vmov.u8 r0, q2[10]
; CHECK-NEXT: vmov.8 q3[13], r0
; CHECK-NEXT: vmov.u8 r0, q2[12]
; CHECK-NEXT: vmov.8 q3[14], r0
; CHECK-NEXT: vmov.u8 r0, q2[14]
; CHECK-NEXT: vmov.8 q3[15], r0
; CHECK-NEXT: vadd.i8 q0, q3, q0
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <32 x i8>, <32 x i8>* %src, align 4		%l1 = load <32 x i8>, <32 x i8>* %src, align 4
%s1 = shufflevector <32 x i8> %l1, <32 x i8> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>		%s1 = shufflevector <32 x i8> %l1, <32 x i8> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>
%s2 = shufflevector <32 x i8> %l1, <32 x i8> undef, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>		%s2 = shufflevector <32 x i8> %l1, <32 x i8> undef, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>
%a = add <16 x i8> %s1, %s2		%a = add <16 x i8> %s1, %s2
store <16 x i8> %a, <16 x i8> *%dst		store <16 x i8> %a, <16 x i8> *%dst
[... 129 lines not shown ...]
%a = fadd <2 x float> %s1, %s2		%a = fadd <2 x float> %s1, %s2
store <2 x float> %a, <2 x float> *%dst		store <2 x float> %a, <2 x float> *%dst
ret void		ret void
}		}

define void @vld2_v4f32(<8 x float> *%src, <4 x float> *%dst) {		define void @vld2_v4f32(<8 x float> *%src, <4 x float> *%dst) {
; CHECK-LABEL: vld2_v4f32:		; CHECK-LABEL: vld2_v4f32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: vld20.32 {q0, q1}, [r0]
; CHECK-NEXT: vldrw.u32 q0, [r0, #16]		; CHECK-NEXT: vld21.32 {q0, q1}, [r0]
; CHECK-NEXT: vmov.f32 s8, s5		; CHECK-NEXT: vadd.f32 q0, q0, q1
; CHECK-NEXT: vmov.f32 s9, s7
; CHECK-NEXT: vmov.f32 s5, s6
; CHECK-NEXT: vmov.f32 s10, s1
; CHECK-NEXT: vmov.f32 s6, s0
; CHECK-NEXT: vmov.f32 s11, s3
; CHECK-NEXT: vmov.f32 s7, s2
; CHECK-NEXT: vadd.f32 q0, q1, q2
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <8 x float>, <8 x float>* %src, align 4		%l1 = load <8 x float>, <8 x float>* %src, align 4
%s1 = shufflevector <8 x float> %l1, <8 x float> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>		%s1 = shufflevector <8 x float> %l1, <8 x float> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%s2 = shufflevector <8 x float> %l1, <8 x float> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>		%s2 = shufflevector <8 x float> %l1, <8 x float> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
%a = fadd <4 x float> %s1, %s2		%a = fadd <4 x float> %s1, %s2
store <4 x float> %a, <4 x float> *%dst		store <4 x float> %a, <4 x float> *%dst
ret void		ret void
}		}

define void @vld2_v8f32(<16 x float> *%src, <8 x float> *%dst) {		define void @vld2_v8f32(<16 x float> *%src, <8 x float> *%dst) {
; CHECK-LABEL: vld2_v8f32:		; CHECK-LABEL: vld2_v8f32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrw.u32 q0, [r0, #32]		; CHECK-NEXT: add.w r2, r0, #32
; CHECK-NEXT: vldrw.u32 q1, [r0, #48]		; CHECK-NEXT: vld20.32 {q0, q1}, [r0]
; CHECK-NEXT: vldrw.u32 q2, [r0]		; CHECK-NEXT: vld20.32 {q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s12, s1		; CHECK-NEXT: vld21.32 {q0, q1}, [r0]
; CHECK-NEXT: vmov.f32 s13, s3		; CHECK-NEXT: vld21.32 {q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s1, s2		; CHECK-NEXT: vadd.f32 q0, q0, q1
; CHECK-NEXT: vmov.f32 s14, s5
; CHECK-NEXT: vmov.f32 s2, s4
; CHECK-NEXT: vmov.f32 s15, s7
; CHECK-NEXT: vmov.f32 s3, s6
; CHECK-NEXT: vldrw.u32 q1, [r0, #16]
; CHECK-NEXT: vadd.f32 q0, q0, q3
; CHECK-NEXT: vmov.f32 s12, s9
; CHECK-NEXT: vmov.f32 s13, s11
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vmov.f32 s9, s10
; CHECK-NEXT: vmov.f32 s14, s5
; CHECK-NEXT: vmov.f32 s10, s4
; CHECK-NEXT: vmov.f32 s15, s7
; CHECK-NEXT: vmov.f32 s11, s6
; CHECK-NEXT: vadd.f32 q1, q2, q3		; CHECK-NEXT: vadd.f32 q1, q2, q3
; CHECK-NEXT: vstrw.32 q1, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
		; CHECK-NEXT: vstrw.32 q1, [r1, #16]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <16 x float>, <16 x float>* %src, align 4		%l1 = load <16 x float>, <16 x float>* %src, align 4
%s1 = shufflevector <16 x float> %l1, <16 x float> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>		%s1 = shufflevector <16 x float> %l1, <16 x float> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
%s2 = shufflevector <16 x float> %l1, <16 x float> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		%s2 = shufflevector <16 x float> %l1, <16 x float> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
%a = fadd <8 x float> %s1, %s2		%a = fadd <8 x float> %s1, %s2
store <8 x float> %a, <8 x float> *%dst		store <8 x float> %a, <8 x float> *%dst
ret void		ret void
}		}

define void @vld2_v16f32(<32 x float> *%src, <16 x float> *%dst) {		define void @vld2_v16f32(<32 x float> *%src, <16 x float> *%dst) {
; CHECK-LABEL: vld2_v16f32:		; CHECK-LABEL: vld2_v16f32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: vpush {d8, d9, d10, d11}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: vld20.32 {q0, q1}, [r0]
; CHECK-NEXT: vldrw.u32 q0, [r0, #16]		; CHECK-NEXT: add.w r12, r0, #96
; CHECK-NEXT: vldrw.u32 q4, [r0, #96]		; CHECK-NEXT: add.w r3, r0, #32
; CHECK-NEXT: vmov.f32 s8, s5		; CHECK-NEXT: add.w r2, r0, #64
; CHECK-NEXT: vmov.f32 s9, s7		; CHECK-NEXT: vld21.32 {q0, q1}, [r0]
; CHECK-NEXT: vmov.f32 s5, s6		; CHECK-NEXT: vadd.f32 q0, q0, q1
; CHECK-NEXT: vmov.f32 s10, s1		; CHECK-NEXT: vld20.32 {q1, q2}, [r2]
; CHECK-NEXT: vmov.f32 s6, s0		; CHECK-NEXT: vld20.32 {q3, q4}, [r12]
; CHECK-NEXT: vmov.f32 s11, s3		; CHECK-NEXT: vld20.32 {q5, q6}, [r3]
; CHECK-NEXT: vmov.f32 s7, s2		; CHECK-NEXT: vld21.32 {q5, q6}, [r3]
; CHECK-NEXT: vadd.f32 q0, q1, q2		; CHECK-NEXT: vld21.32 {q1, q2}, [r2]
; CHECK-NEXT: vldrw.u32 q2, [r0, #32]		; CHECK-NEXT: vld21.32 {q3, q4}, [r12]
; CHECK-NEXT: vldrw.u32 q1, [r0, #48]		; CHECK-NEXT: @ kill: def $q1 killed $q1 killed $q1_q2
; CHECK-NEXT: vmov.f32 s12, s9		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vmov.f32 s13, s11		; CHECK-NEXT: vadd.f32 q5, q5, q6
; CHECK-NEXT: vmov.f32 s9, s10		; CHECK-NEXT: vadd.f32 q1, q1, q2
; CHECK-NEXT: vmov.f32 s14, s5		; CHECK-NEXT: vadd.f32 q3, q3, q4
; CHECK-NEXT: vmov.f32 s10, s4		; CHECK-NEXT: vstrw.32 q1, [r1, #32]
; CHECK-NEXT: vmov.f32 s15, s7
; CHECK-NEXT: vmov.f32 s11, s6
; CHECK-NEXT: vadd.f32 q1, q2, q3
; CHECK-NEXT: vldrw.u32 q2, [r0, #64]
; CHECK-NEXT: vldrw.u32 q3, [r0, #80]
; CHECK-NEXT: vmov.f32 s20, s9
; CHECK-NEXT: vmov.f32 s21, s11
; CHECK-NEXT: vmov.f32 s9, s10
; CHECK-NEXT: vmov.f32 s22, s13
; CHECK-NEXT: vmov.f32 s10, s12
; CHECK-NEXT: vmov.f32 s23, s15
; CHECK-NEXT: vmov.f32 s11, s14
; CHECK-NEXT: vldrw.u32 q3, [r0, #112]
; CHECK-NEXT: vadd.f32 q2, q2, q5
; CHECK-NEXT: vmov.f32 s20, s17
; CHECK-NEXT: vmov.f32 s21, s19
; CHECK-NEXT: vstrw.32 q2, [r1, #32]
; CHECK-NEXT: vmov.f32 s17, s18
; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vmov.f32 s22, s13
; CHECK-NEXT: vstrw.32 q1, [r1, #16]
; CHECK-NEXT: vmov.f32 s18, s12
; CHECK-NEXT: vmov.f32 s23, s15
; CHECK-NEXT: vmov.f32 s19, s14
; CHECK-NEXT: vadd.f32 q3, q4, q5
; CHECK-NEXT: vstrw.32 q3, [r1, #48]		; CHECK-NEXT: vstrw.32 q3, [r1, #48]
; CHECK-NEXT: vpop {d8, d9, d10, d11}		; CHECK-NEXT: vstrw.32 q5, [r1, #16]
		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <32 x float>, <32 x float>* %src, align 4		%l1 = load <32 x float>, <32 x float>* %src, align 4
%s1 = shufflevector <32 x float> %l1, <32 x float> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>		%s1 = shufflevector <32 x float> %l1, <32 x float> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>
%s2 = shufflevector <32 x float> %l1, <32 x float> undef, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>		%s2 = shufflevector <32 x float> %l1, <32 x float> undef, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>
%a = fadd <16 x float> %s1, %s2		%a = fadd <16 x float> %s1, %s2
store <16 x float> %a, <16 x float> *%dst		store <16 x float> %a, <16 x float> *%dst
ret void		ret void
[... 66 lines not shown ...]
%a = fadd <4 x half> %s1, %s2		%a = fadd <4 x half> %s1, %s2
store <4 x half> %a, <4 x half> *%dst		store <4 x half> %a, <4 x half> *%dst
ret void		ret void
}		}

define void @vld2_v8f16(<16 x half> *%src, <8 x half> *%dst) {		define void @vld2_v8f16(<16 x half> *%src, <8 x half> *%dst) {
; CHECK-LABEL: vld2_v8f16:		; CHECK-LABEL: vld2_v8f16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8}		; CHECK-NEXT: vld20.16 {q0, q1}, [r0]
; CHECK-NEXT: vpush {d8}		; CHECK-NEXT: vld21.16 {q0, q1}, [r0]
; CHECK-NEXT: vldrw.u32 q2, [r0]		; CHECK-NEXT: vadd.f16 q0, q0, q1
; CHECK-NEXT: vldrw.u32 q1, [r0, #16]
; CHECK-NEXT: vmov r2, s8
; CHECK-NEXT: vmovx.f16 s12, s8
; CHECK-NEXT: vmov.16 q0[0], r2
; CHECK-NEXT: vmov r3, s9
; CHECK-NEXT: vmov.16 q0[1], r3
; CHECK-NEXT: vmov r2, s10
; CHECK-NEXT: vmov.16 q0[2], r2
; CHECK-NEXT: vmov r2, s11
; CHECK-NEXT: vmov.16 q0[3], r2
; CHECK-NEXT: vmov r0, s4
; CHECK-NEXT: vmov.16 q0[4], r0
; CHECK-NEXT: vmov r0, s5
; CHECK-NEXT: vmov.16 q0[5], r0
; CHECK-NEXT: vmov r0, s6
; CHECK-NEXT: vmov.16 q0[6], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s9
; CHECK-NEXT: vmovx.f16 s16, s10
; CHECK-NEXT: vmov r2, s12
; CHECK-NEXT: vmov.16 q3[0], r0
; CHECK-NEXT: vmov.16 q3[1], r2
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmovx.f16 s8, s11
; CHECK-NEXT: vmov.16 q3[2], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vmovx.f16 s8, s4
; CHECK-NEXT: vmov.16 q3[3], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vmovx.f16 s8, s5
; CHECK-NEXT: vmov.16 q3[4], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vmovx.f16 s8, s6
; CHECK-NEXT: vmov.16 q3[5], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vmovx.f16 s8, s7
; CHECK-NEXT: vmov.16 q3[6], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vmov.16 q3[7], r0
; CHECK-NEXT: vmov r0, s7
; CHECK-NEXT: vmov.16 q0[7], r0
; CHECK-NEXT: vadd.f16 q0, q0, q3
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vpop {d8}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <16 x half>, <16 x half>* %src, align 4		%l1 = load <16 x half>, <16 x half>* %src, align 4
%s1 = shufflevector <16 x half> %l1, <16 x half> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>		%s1 = shufflevector <16 x half> %l1, <16 x half> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
%s2 = shufflevector <16 x half> %l1, <16 x half> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		%s2 = shufflevector <16 x half> %l1, <16 x half> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
%a = fadd <8 x half> %s1, %s2		%a = fadd <8 x half> %s1, %s2
store <8 x half> %a, <8 x half> *%dst		store <8 x half> %a, <8 x half> *%dst
ret void		ret void
}		}

define void @vld2_v16f16(<32 x half> *%src, <16 x half> *%dst) {		define void @vld2_v16f16(<32 x half> *%src, <16 x half> *%dst) {
; CHECK-LABEL: vld2_v16f16:		; CHECK-LABEL: vld2_v16f16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8}		; CHECK-NEXT: add.w r2, r0, #32
; CHECK-NEXT: vpush {d8}		; CHECK-NEXT: vld20.16 {q0, q1}, [r2]
; CHECK-NEXT: vldrw.u32 q2, [r0, #32]		; CHECK-NEXT: vld21.16 {q0, q1}, [r2]
; CHECK-NEXT: vldrw.u32 q1, [r0, #48]		; CHECK-NEXT: vadd.f16 q0, q0, q1
; CHECK-NEXT: vmov r2, s8		; CHECK-NEXT: vld20.16 {q1, q2}, [r0]
; CHECK-NEXT: vmovx.f16 s12, s8		; CHECK-NEXT: vld21.16 {q1, q2}, [r0]
; CHECK-NEXT: vmov.16 q0[0], r2		; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vmov r3, s9		; CHECK-NEXT: vadd.f16 q0, q1, q2
; CHECK-NEXT: vmov.16 q0[1], r3
; CHECK-NEXT: vmov r2, s10
; CHECK-NEXT: vmov.16 q0[2], r2
; CHECK-NEXT: vmov r2, s11
; CHECK-NEXT: vmov.16 q0[3], r2
; CHECK-NEXT: vmov r2, s4
; CHECK-NEXT: vmov.16 q0[4], r2
; CHECK-NEXT: vmov r2, s5
; CHECK-NEXT: vmov.16 q0[5], r2
; CHECK-NEXT: vmov r2, s6
; CHECK-NEXT: vmov.16 q0[6], r2
; CHECK-NEXT: vmov r2, s12
; CHECK-NEXT: vmovx.f16 s12, s9
; CHECK-NEXT: vmovx.f16 s16, s10
; CHECK-NEXT: vmov r3, s12
; CHECK-NEXT: vmov.16 q3[0], r2
; CHECK-NEXT: vmov.16 q3[1], r3
; CHECK-NEXT: vmov r2, s16
; CHECK-NEXT: vmovx.f16 s8, s11
; CHECK-NEXT: vmov.16 q3[2], r2
; CHECK-NEXT: vmov r2, s8
; CHECK-NEXT: vmovx.f16 s8, s4
; CHECK-NEXT: vmov.16 q3[3], r2
; CHECK-NEXT: vmov r2, s8
; CHECK-NEXT: vmovx.f16 s8, s5
; CHECK-NEXT: vmov.16 q3[4], r2
; CHECK-NEXT: vmov r2, s8
; CHECK-NEXT: vmovx.f16 s8, s6
; CHECK-NEXT: vmov.16 q3[5], r2
; CHECK-NEXT: vmov r2, s8
; CHECK-NEXT: vmovx.f16 s8, s7
; CHECK-NEXT: vmov.16 q3[6], r2
; CHECK-NEXT: vmov r2, s8
; CHECK-NEXT: vldrw.u32 q2, [r0]
; CHECK-NEXT: vmov.16 q3[7], r2
; CHECK-NEXT: vmov r2, s7
; CHECK-NEXT: vmov.16 q0[7], r2
; CHECK-NEXT: vadd.f16 q1, q0, q3
; CHECK-NEXT: vldrw.u32 q0, [r0, #16]
; CHECK-NEXT: vstrw.32 q1, [r1, #16]
; CHECK-NEXT: vmovx.f16 s4, s8
; CHECK-NEXT: vmov r0, s4
; CHECK-NEXT: vmovx.f16 s4, s9
; CHECK-NEXT: vmov r2, s4
; CHECK-NEXT: vmov.16 q1[0], r0
; CHECK-NEXT: vmovx.f16 s12, s10
; CHECK-NEXT: vmov.16 q1[1], r2
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s11
; CHECK-NEXT: vmov.16 q1[2], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s0
; CHECK-NEXT: vmov.16 q1[3], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s1
; CHECK-NEXT: vmov.16 q1[4], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s2
; CHECK-NEXT: vmov.16 q1[5], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s3
; CHECK-NEXT: vmov.16 q1[6], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmov.16 q1[7], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vmov.16 q3[0], r0
; CHECK-NEXT: vmov r2, s9
; CHECK-NEXT: vmov.16 q3[1], r2
; CHECK-NEXT: vmov r0, s10
; CHECK-NEXT: vmov.16 q3[2], r0
; CHECK-NEXT: vmov r0, s11
; CHECK-NEXT: vmov.16 q3[3], r0
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: vmov.16 q3[4], r0
; CHECK-NEXT: vmov r0, s1
; CHECK-NEXT: vmov.16 q3[5], r0
; CHECK-NEXT: vmov r0, s2
; CHECK-NEXT: vmov.16 q3[6], r0
; CHECK-NEXT: vmov r0, s3
; CHECK-NEXT: vmov.16 q3[7], r0
; CHECK-NEXT: vadd.f16 q0, q3, q1
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vpop {d8}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <32 x half>, <32 x half>* %src, align 4		%l1 = load <32 x half>, <32 x half>* %src, align 4
%s1 = shufflevector <32 x half> %l1, <32 x half> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>		%s1 = shufflevector <32 x half> %l1, <32 x half> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>
%s2 = shufflevector <32 x half> %l1, <32 x half> undef, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>		%s2 = shufflevector <32 x half> %l1, <32 x half> undef, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23, i32 25, i32 27, i32 29, i32 31>
%a = fadd <16 x half> %s1, %s2		%a = fadd <16 x half> %s1, %s2
store <16 x half> %a, <16 x half> *%dst		store <16 x half> %a, <16 x half> *%dst
ret void		ret void
[... 44 lines not shown ...]

llvm/test/CodeGen/Thumb2/mve-vld4.ll

[... 44 lines not shown ...]
%a3 = add <2 x i32> %a1, %a2		%a3 = add <2 x i32> %a1, %a2
store <2 x i32> %a3, <2 x i32> *%dst		store <2 x i32> %a3, <2 x i32> *%dst
ret void		ret void
}		}

define void @vld4_v4i32(<16 x i32> *%src, <4 x i32> *%dst) {		define void @vld4_v4i32(<16 x i32> *%src, <4 x i32> *%dst) {
; CHECK-LABEL: vld4_v4i32:		; CHECK-LABEL: vld4_v4i32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11}		; CHECK-NEXT: .vsave {d8, d9}
; CHECK-NEXT: vpush {d8, d9, d10, d11}		; CHECK-NEXT: vpush {d8, d9}
; CHECK-NEXT: vldrw.u32 q0, [r0]		; CHECK-NEXT: vld40.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vldrw.u32 q3, [r0, #32]		; CHECK-NEXT: vld41.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vldrw.u32 q1, [r0, #48]		; CHECK-NEXT: vld42.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]		; CHECK-NEXT: vld43.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s18, s15		; CHECK-NEXT: @ kill: def $q0 killed $q0 killed $q0_q1_q2_q3
; CHECK-NEXT: vmov.f64 d10, d1		; CHECK-NEXT: vadd.i32 q4, q2, q3
; CHECK-NEXT: vmov.f32 s19, s7		; CHECK-NEXT: vadd.i32 q0, q0, q1
; CHECK-NEXT: vmov.f32 s21, s10
; CHECK-NEXT: vmov.f32 s16, s3
; CHECK-NEXT: vmov.f32 s15, s6
; CHECK-NEXT: vmov.f32 s22, s14
; CHECK-NEXT: vmov.f32 s17, s11
; CHECK-NEXT: vmov.f32 s23, s6
; CHECK-NEXT: vadd.i32 q4, q5, q4
; CHECK-NEXT: vmov.f32 s22, s13
; CHECK-NEXT: vmov.f32 s23, s5
; CHECK-NEXT: vmov.f32 s20, s1
; CHECK-NEXT: vmov.f32 s2, s12
; CHECK-NEXT: vmov.f32 s3, s4
; CHECK-NEXT: vmov.f32 s21, s9
; CHECK-NEXT: vmov.f32 s1, s8
; CHECK-NEXT: vadd.i32 q0, q0, q5
; CHECK-NEXT: vadd.i32 q0, q0, q4		; CHECK-NEXT: vadd.i32 q0, q0, q4
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vpop {d8, d9, d10, d11}		; CHECK-NEXT: vpop {d8, d9}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <16 x i32>, <16 x i32>* %src, align 4		%l1 = load <16 x i32>, <16 x i32>* %src, align 4
%s1 = shufflevector <16 x i32> %l1, <16 x i32> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>		%s1 = shufflevector <16 x i32> %l1, <16 x i32> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>
%s2 = shufflevector <16 x i32> %l1, <16 x i32> undef, <4 x i32> <i32 1, i32 5, i32 9, i32 13>		%s2 = shufflevector <16 x i32> %l1, <16 x i32> undef, <4 x i32> <i32 1, i32 5, i32 9, i32 13>
%s3 = shufflevector <16 x i32> %l1, <16 x i32> undef, <4 x i32> <i32 2, i32 6, i32 10, i32 14>		%s3 = shufflevector <16 x i32> %l1, <16 x i32> undef, <4 x i32> <i32 2, i32 6, i32 10, i32 14>
%s4 = shufflevector <16 x i32> %l1, <16 x i32> undef, <4 x i32> <i32 3, i32 7, i32 11, i32 15>		%s4 = shufflevector <16 x i32> %l1, <16 x i32> undef, <4 x i32> <i32 3, i32 7, i32 11, i32 15>
%a1 = add <4 x i32> %s1, %s2		%a1 = add <4 x i32> %s1, %s2
%a2 = add <4 x i32> %s3, %s4		%a2 = add <4 x i32> %s3, %s4
%a3 = add <4 x i32> %a1, %a2		%a3 = add <4 x i32> %a1, %a2
store <4 x i32> %a3, <4 x i32> *%dst		store <4 x i32> %a3, <4 x i32> *%dst
ret void		ret void
}		}

define void @vld4_v8i32(<32 x i32> *%src, <8 x i32> *%dst) {		define void @vld4_v8i32(<32 x i32> *%src, <8 x i32> *%dst) {
; CHECK-LABEL: vld4_v8i32:		; CHECK-LABEL: vld4_v8i32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13}		; CHECK-NEXT: .save {r4, r5}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13}		; CHECK-NEXT: push {r4, r5}
; CHECK-NEXT: vldrw.u32 q0, [r0, #64]		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vldrw.u32 q3, [r0, #96]		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vldrw.u32 q1, [r0, #112]		; CHECK-NEXT: .pad #88
; CHECK-NEXT: vldrw.u32 q2, [r0, #80]		; CHECK-NEXT: sub sp, #88
; CHECK-NEXT: vmov.f32 s18, s15		; CHECK-NEXT: add.w r2, r0, #64
; CHECK-NEXT: vmov.f64 d10, d1		; CHECK-NEXT: vld40.32 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s19, s7		; CHECK-NEXT: vld41.32 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s21, s10		; CHECK-NEXT: vstmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vmov.f32 s16, s3		; CHECK-NEXT: vld40.32 {q4, q5, q6, q7}, [r0]
; CHECK-NEXT: vmov.f32 s15, s6		; CHECK-NEXT: vld41.32 {q4, q5, q6, q7}, [r0]
; CHECK-NEXT: vmov.f32 s22, s14		; CHECK-NEXT: vld42.32 {q4, q5, q6, q7}, [r0]
; CHECK-NEXT: vmov.f32 s17, s11		; CHECK-NEXT: vld43.32 {q4, q5, q6, q7}, [r0]
; CHECK-NEXT: vmov.f32 s23, s6		; CHECK-NEXT: vstrw.32 q5, [sp, #64] @ 16-byte Spill
		; CHECK-NEXT: vmov q1, q4
		; CHECK-NEXT: vldrw.u32 q0, [sp, #64] @ 16-byte Reload
		; CHECK-NEXT: vadd.i32 q4, q6, q7
		; CHECK-NEXT: vadd.i32 q5, q1, q0
		; CHECK-NEXT: vldmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vadd.i32 q4, q5, q4		; CHECK-NEXT: vadd.i32 q4, q5, q4
; CHECK-NEXT: vmov.f32 s22, s13		; CHECK-NEXT: vld42.32 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s23, s5		; CHECK-NEXT: vld43.32 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s20, s1		; CHECK-NEXT: vstrw.32 q4, [r1]
; CHECK-NEXT: vmov.f32 s2, s12		; CHECK-NEXT: @ kill: def $q0 killed $q0 killed $q0_q1_q2_q3
; CHECK-NEXT: vldrw.u32 q3, [r0, #16]		; CHECK-NEXT: vadd.i32 q5, q2, q3
; CHECK-NEXT: vmov.f32 s3, s4		; CHECK-NEXT: vadd.i32 q0, q0, q1
; CHECK-NEXT: vldrw.u32 q1, [r0]
; CHECK-NEXT: vmov.f32 s21, s9
; CHECK-NEXT: vmov.f32 s1, s8
; CHECK-NEXT: vldrw.u32 q2, [r0, #48]
; CHECK-NEXT: vadd.i32 q0, q0, q5		; CHECK-NEXT: vadd.i32 q0, q0, q5
; CHECK-NEXT: vmov.f64 d12, d3
; CHECK-NEXT: vadd.i32 q0, q0, q4
; CHECK-NEXT: vldrw.u32 q4, [r0, #32]
; CHECK-NEXT: vstrw.32 q0, [r1, #16]		; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vmov.f32 s22, s19		; CHECK-NEXT: add sp, #88
; CHECK-NEXT: vmov.f32 s23, s11		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vmov.f32 s25, s14		; CHECK-NEXT: pop {r4, r5}
; CHECK-NEXT: vmov.f32 s20, s7
; CHECK-NEXT: vmov.f32 s19, s10
; CHECK-NEXT: vmov.f32 s26, s18
; CHECK-NEXT: vmov.f32 s21, s15
; CHECK-NEXT: vmov.f32 s27, s10
; CHECK-NEXT: vadd.i32 q5, q6, q5
; CHECK-NEXT: vmov.f32 s26, s17
; CHECK-NEXT: vmov.f32 s27, s9
; CHECK-NEXT: vmov.f32 s24, s5
; CHECK-NEXT: vmov.f32 s6, s16
; CHECK-NEXT: vmov.f32 s7, s8
; CHECK-NEXT: vmov.f32 s25, s13
; CHECK-NEXT: vmov.f32 s5, s12
; CHECK-NEXT: vadd.i32 q1, q1, q6
; CHECK-NEXT: vadd.i32 q1, q1, q5
; CHECK-NEXT: vstrw.32 q1, [r1]
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <32 x i32>, <32 x i32>* %src, align 4		%l1 = load <32 x i32>, <32 x i32>* %src, align 4
%s1 = shufflevector <32 x i32> %l1, <32 x i32> undef, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28>		%s1 = shufflevector <32 x i32> %l1, <32 x i32> undef, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28>
%s2 = shufflevector <32 x i32> %l1, <32 x i32> undef, <8 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29>		%s2 = shufflevector <32 x i32> %l1, <32 x i32> undef, <8 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29>
%s3 = shufflevector <32 x i32> %l1, <32 x i32> undef, <8 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>		%s3 = shufflevector <32 x i32> %l1, <32 x i32> undef, <8 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>
%s4 = shufflevector <32 x i32> %l1, <32 x i32> undef, <8 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>		%s4 = shufflevector <32 x i32> %l1, <32 x i32> undef, <8 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>
%a1 = add <8 x i32> %s1, %s2		%a1 = add <8 x i32> %s1, %s2
%a2 = add <8 x i32> %s3, %s4		%a2 = add <8 x i32> %s3, %s4
%a3 = add <8 x i32> %a1, %a2		%a3 = add <8 x i32> %a1, %a2
store <8 x i32> %a3, <8 x i32> *%dst		store <8 x i32> %a3, <8 x i32> *%dst
ret void		ret void
}		}

define void @vld4_v16i32(<64 x i32> *%src, <16 x i32> *%dst) {		define void @vld4_v16i32(<64 x i32> *%src, <16 x i32> *%dst) {
; CHECK-LABEL: vld4_v16i32:		; CHECK-LABEL: vld4_v16i32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
		; CHECK-NEXT: .save {r4, r5}
		; CHECK-NEXT: push {r4, r5}
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: .pad #16		; CHECK-NEXT: .pad #152
; CHECK-NEXT: sub sp, #16		; CHECK-NEXT: sub sp, #152
; CHECK-NEXT: vldrw.u32 q0, [r0]		; CHECK-NEXT: add.w r2, r0, #128
; CHECK-NEXT: vldrw.u32 q3, [r0, #32]		; CHECK-NEXT: add r3, sp, #64
; CHECK-NEXT: vldrw.u32 q1, [r0, #48]		; CHECK-NEXT: vld40.32 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]		; CHECK-NEXT: add r4, sp, #64
; CHECK-NEXT: vmov.f32 s18, s15		; CHECK-NEXT: vld41.32 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.f64 d10, d1		; CHECK-NEXT: vstmia r3, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vmov.f32 s19, s7		; CHECK-NEXT: add.w r3, r0, #64
; CHECK-NEXT: vmov.f32 s21, s10		; CHECK-NEXT: vld40.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s16, s3		; CHECK-NEXT: vld41.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s15, s6		; CHECK-NEXT: vld42.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s22, s14		; CHECK-NEXT: vld43.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s17, s11		; CHECK-NEXT: adds r0, #192
; CHECK-NEXT: vmov.f32 s23, s6		; CHECK-NEXT: vstrw.32 q1, [sp, #32] @ 16-byte Spill
; CHECK-NEXT: vadd.i32 q4, q5, q4		; CHECK-NEXT: vadd.i32 q4, q2, q3
; CHECK-NEXT: vmov.f32 s22, s13		; CHECK-NEXT: vmov q5, q0
; CHECK-NEXT: vmov.f32 s23, s5		; CHECK-NEXT: vldrw.u32 q0, [sp, #32] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s20, s1		; CHECK-NEXT: vstrw.32 q4, [sp, #48] @ 16-byte Spill
; CHECK-NEXT: vmov.f32 s2, s12		; CHECK-NEXT: vadd.i32 q4, q5, q0
; CHECK-NEXT: vldrw.u32 q3, [r0, #80]		; CHECK-NEXT: vldmia r4, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vmov.f32 s3, s4		; CHECK-NEXT: vstrw.32 q4, [sp, #32] @ 16-byte Spill
; CHECK-NEXT: vldrw.u32 q1, [r0, #64]		; CHECK-NEXT: add r4, sp, #64
; CHECK-NEXT: vmov.f32 s21, s9		; CHECK-NEXT: vld42.32 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s1, s8		; CHECK-NEXT: vstmia r4, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vldrw.u32 q2, [r0, #112]		; CHECK-NEXT: vld40.32 {q4, q5, q6, q7}, [r3]
; CHECK-NEXT: vadd.i32 q0, q0, q5		; CHECK-NEXT: vld41.32 {q4, q5, q6, q7}, [r3]
; CHECK-NEXT: vadd.i32 q0, q0, q4		; CHECK-NEXT: vld42.32 {q4, q5, q6, q7}, [r3]
; CHECK-NEXT: vldrw.u32 q4, [r0, #96]		; CHECK-NEXT: vld43.32 {q4, q5, q6, q7}, [r3]
; CHECK-NEXT: vstrw.32 q0, [sp] @ 16-byte Spill		; CHECK-NEXT: vldrw.u32 q0, [sp, #48] @ 16-byte Reload
; CHECK-NEXT: vmov.f64 d0, d3		; CHECK-NEXT: vldrw.u32 q1, [sp, #32] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s22, s19		; CHECK-NEXT: add r3, sp, #64
; CHECK-NEXT: vmov.f32 s19, s10		; CHECK-NEXT: vstrw.32 q6, [sp, #16] @ 16-byte Spill
; CHECK-NEXT: vmov.f32 s26, s17		; CHECK-NEXT: vadd.i32 q4, q4, q5
; CHECK-NEXT: vmov.f32 s23, s11		; CHECK-NEXT: vadd.i32 q0, q1, q0
; CHECK-NEXT: vmov.f32 s27, s9		; CHECK-NEXT: vstrw.32 q0, [sp, #48] @ 16-byte Spill
; CHECK-NEXT: vmov.f32 s20, s7		; CHECK-NEXT: vldrw.u32 q0, [sp, #16] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s24, s5
; CHECK-NEXT: vmov.f32 s1, s14
; CHECK-NEXT: vmov.f32 s6, s16
; CHECK-NEXT: vmov.f32 s2, s18
; CHECK-NEXT: vldrw.u32 q4, [r0, #144]
; CHECK-NEXT: vmov.f32 s7, s8
; CHECK-NEXT: vmov.f32 s3, s10
; CHECK-NEXT: vldrw.u32 q2, [r0, #128]
; CHECK-NEXT: vmov.f32 s21, s15
; CHECK-NEXT: vmov.f32 s25, s13
; CHECK-NEXT: vadd.i32 q5, q0, q5
; CHECK-NEXT: vmov.f32 s5, s12
; CHECK-NEXT: vldrw.u32 q3, [r0, #176]
; CHECK-NEXT: vadd.i32 q0, q1, q6
; CHECK-NEXT: vadd.i32 q1, q0, q5
; CHECK-NEXT: vldrw.u32 q5, [r0, #160]
; CHECK-NEXT: vmov.f64 d0, d5
; CHECK-NEXT: vmov.f32 s26, s23
; CHECK-NEXT: vmov.f32 s23, s14
; CHECK-NEXT: vmov.f32 s30, s21
; CHECK-NEXT: vmov.f32 s27, s15
; CHECK-NEXT: vmov.f32 s31, s13
; CHECK-NEXT: vmov.f32 s24, s11
; CHECK-NEXT: vmov.f32 s28, s9
; CHECK-NEXT: vmov.f32 s1, s18
; CHECK-NEXT: vmov.f32 s10, s20
; CHECK-NEXT: vmov.f32 s2, s22
; CHECK-NEXT: vmov.f32 s11, s12
; CHECK-NEXT: vmov.f32 s3, s14
; CHECK-NEXT: vldrw.u32 q3, [r0, #192]
; CHECK-NEXT: vmov.f32 s25, s19
; CHECK-NEXT: vmov.f32 s29, s17
; CHECK-NEXT: vadd.i32 q6, q0, q6
; CHECK-NEXT: vmov.f32 s9, s16
; CHECK-NEXT: vldrw.u32 q4, [r0, #240]
; CHECK-NEXT: vadd.i32 q0, q2, q7
; CHECK-NEXT: vmov.f64 d10, d7
; CHECK-NEXT: vadd.i32 q2, q0, q6
; CHECK-NEXT: vldrw.u32 q6, [r0, #224]
; CHECK-NEXT: vldrw.u32 q0, [r0, #208]
; CHECK-NEXT: vstrw.32 q2, [r1, #32]
; CHECK-NEXT: vstrw.32 q1, [r1, #16]
; CHECK-NEXT: vmov.f32 s30, s27
; CHECK-NEXT: vmov.f32 s31, s19
; CHECK-NEXT: vmov.f32 s21, s2
; CHECK-NEXT: vmov.f32 s28, s15
; CHECK-NEXT: vmov.f32 s27, s18
; CHECK-NEXT: vmov.f32 s22, s26
; CHECK-NEXT: vmov.f32 s29, s3
; CHECK-NEXT: vmov.f32 s23, s18
; CHECK-NEXT: vadd.i32 q7, q5, q7
; CHECK-NEXT: vmov.f32 s22, s25
; CHECK-NEXT: vmov.f32 s23, s17
; CHECK-NEXT: vmov.f32 s20, s13
; CHECK-NEXT: vmov.f32 s14, s24
; CHECK-NEXT: vmov.f32 s15, s16
; CHECK-NEXT: vmov.f32 s21, s1
; CHECK-NEXT: vmov.f32 s13, s0
; CHECK-NEXT: vadd.i32 q0, q3, q5
; CHECK-NEXT: vadd.i32 q0, q0, q7		; CHECK-NEXT: vadd.i32 q0, q0, q7
; CHECK-NEXT: vstrw.32 q0, [r1, #48]		; CHECK-NEXT: vadd.i32 q0, q4, q0
; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload		; CHECK-NEXT: vstrw.32 q0, [sp, #32] @ 16-byte Spill
		; CHECK-NEXT: vldmia r3, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
		; CHECK-NEXT: vld43.32 {q0, q1, q2, q3}, [r2]
		; CHECK-NEXT: add r2, sp, #64
		; CHECK-NEXT: vstmia r2, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
		; CHECK-NEXT: vld40.32 {q4, q5, q6, q7}, [r0]
		; CHECK-NEXT: vld41.32 {q4, q5, q6, q7}, [r0]
		; CHECK-NEXT: vld42.32 {q4, q5, q6, q7}, [r0]
		; CHECK-NEXT: vld43.32 {q4, q5, q6, q7}, [r0]
		; CHECK-NEXT: add r0, sp, #64
		; CHECK-NEXT: @ kill: def $q4 killed $q4 killed $q4_q5_q6_q7
		; CHECK-NEXT: vstrw.32 q7, [sp, #16] @ 16-byte Spill
		; CHECK-NEXT: vmov q2, q5
		; CHECK-NEXT: vldrw.u32 q0, [sp, #16] @ 16-byte Reload
		; CHECK-NEXT: vadd.i32 q4, q4, q2
		; CHECK-NEXT: vadd.i32 q5, q6, q0
		; CHECK-NEXT: vldmia r0, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
		; CHECK-NEXT: @ kill: def $q0 killed $q0 killed $q0_q1_q2_q3
		; CHECK-NEXT: vadd.i32 q4, q4, q5
		; CHECK-NEXT: vadd.i32 q5, q2, q3
		; CHECK-NEXT: vadd.i32 q0, q0, q1
		; CHECK-NEXT: vstrw.32 q4, [r1, #48]
		; CHECK-NEXT: vadd.i32 q0, q0, q5
		; CHECK-NEXT: vstrw.32 q0, [r1, #32]
		; CHECK-NEXT: vldrw.u32 q0, [sp, #32] @ 16-byte Reload
		; CHECK-NEXT: vstrw.32 q0, [r1, #16]
		; CHECK-NEXT: vldrw.u32 q0, [sp, #48] @ 16-byte Reload
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: add sp, #16		; CHECK-NEXT: add sp, #152
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
		; CHECK-NEXT: pop {r4, r5}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <64 x i32>, <64 x i32>* %src, align 4		%l1 = load <64 x i32>, <64 x i32>* %src, align 4
%s1 = shufflevector <64 x i32> %l1, <64 x i32> undef, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 32, i32 36, i32 40, i32 44, i32 48, i32 52, i32 56, i32 60>		%s1 = shufflevector <64 x i32> %l1, <64 x i32> undef, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 32, i32 36, i32 40, i32 44, i32 48, i32 52, i32 56, i32 60>
%s2 = shufflevector <64 x i32> %l1, <64 x i32> undef, <16 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 33, i32 37, i32 41, i32 45, i32 49, i32 53, i32 57, i32 61>		%s2 = shufflevector <64 x i32> %l1, <64 x i32> undef, <16 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 33, i32 37, i32 41, i32 45, i32 49, i32 53, i32 57, i32 61>
%s3 = shufflevector <64 x i32> %l1, <64 x i32> undef, <16 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30, i32 34, i32 38, i32 42, i32 46, i32 50, i32 54, i32 58, i32 62>		%s3 = shufflevector <64 x i32> %l1, <64 x i32> undef, <16 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30, i32 34, i32 38, i32 42, i32 46, i32 50, i32 54, i32 58, i32 62>
%s4 = shufflevector <64 x i32> %l1, <64 x i32> undef, <16 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31, i32 35, i32 39, i32 43, i32 47, i32 51, i32 55, i32 59, i32 63>		%s4 = shufflevector <64 x i32> %l1, <64 x i32> undef, <16 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31, i32 35, i32 39, i32 43, i32 47, i32 51, i32 55, i32 59, i32 63>
%a1 = add <16 x i32> %s1, %s2		%a1 = add <16 x i32> %s1, %s2
; [95 lines not shown]		; [95 lines not shown]
%a3 = add <4 x i16> %a1, %a2		%a3 = add <4 x i16> %a1, %a2
store <4 x i16> %a3, <4 x i16> *%dst		store <4 x i16> %a3, <4 x i16> *%dst
ret void		ret void
}		}

define void @vld4_v8i16(<32 x i16> *%src, <8 x i16> *%dst) {		define void @vld4_v8i16(<32 x i16> *%src, <8 x i16> *%dst) {
; CHECK-LABEL: vld4_v8i16:		; CHECK-LABEL: vld4_v8i16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: .vsave {d8, d9}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpush {d8, d9}
; CHECK-NEXT: vldrw.u32 q0, [r0]		; CHECK-NEXT: vld40.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vldrw.u32 q1, [r0, #16]		; CHECK-NEXT: vld41.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vldrw.u32 q3, [r0, #48]		; CHECK-NEXT: vld42.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.u16 r2, q0[3]		; CHECK-NEXT: vld43.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.16 q2[0], r2		; CHECK-NEXT: @ kill: def $q0 killed $q0 killed $q0_q1_q2_q3
; CHECK-NEXT: vmov.u16 r2, q0[7]		; CHECK-NEXT: vadd.i16 q4, q2, q3
; CHECK-NEXT: vmov.16 q2[1], r2		; CHECK-NEXT: vadd.i16 q0, q0, q1
; CHECK-NEXT: vmov.u16 r2, q1[3]
; CHECK-NEXT: vmov.16 q2[2], r2
; CHECK-NEXT: vmov.u16 r2, q1[7]
; CHECK-NEXT: vmov.16 q2[3], r2
; CHECK-NEXT: vmov.u16 r2, q2[0]
; CHECK-NEXT: vmov.16 q4[0], r2
; CHECK-NEXT: vmov.u16 r2, q2[1]
; CHECK-NEXT: vmov.16 q4[1], r2
; CHECK-NEXT: vmov.u16 r2, q2[2]
; CHECK-NEXT: vmov.16 q4[2], r2
; CHECK-NEXT: vmov.u16 r2, q2[3]
; CHECK-NEXT: vldrw.u32 q2, [r0, #32]
; CHECK-NEXT: vmov.16 q4[3], r2
; CHECK-NEXT: vmov.u16 r0, q3[3]
; CHECK-NEXT: vmov.u16 r2, q2[3]
; CHECK-NEXT: vmov.16 q5[4], r2
; CHECK-NEXT: vmov.u16 r2, q2[7]
; CHECK-NEXT: vmov.16 q5[5], r2
; CHECK-NEXT: vmov.16 q5[6], r0
; CHECK-NEXT: vmov.u16 r0, q3[7]
; CHECK-NEXT: vmov.16 q5[7], r0
; CHECK-NEXT: vmov.u16 r0, q5[4]
; CHECK-NEXT: vmov.16 q4[4], r0
; CHECK-NEXT: vmov.u16 r0, q5[5]
; CHECK-NEXT: vmov.16 q4[5], r0
; CHECK-NEXT: vmov.u16 r0, q5[6]
; CHECK-NEXT: vmov.16 q4[6], r0
; CHECK-NEXT: vmov.u16 r0, q5[7]
; CHECK-NEXT: vmov.16 q4[7], r0
; CHECK-NEXT: vmov.u16 r0, q0[2]
; CHECK-NEXT: vmov.16 q6[0], r0
; CHECK-NEXT: vmov.u16 r0, q0[6]
; CHECK-NEXT: vmov.16 q6[1], r0
; CHECK-NEXT: vmov.u16 r0, q1[2]
; CHECK-NEXT: vmov.16 q6[2], r0
; CHECK-NEXT: vmov.u16 r0, q1[6]
; CHECK-NEXT: vmov.16 q6[3], r0
; CHECK-NEXT: vmov.u16 r0, q6[0]
; CHECK-NEXT: vmov.16 q5[0], r0
; CHECK-NEXT: vmov.u16 r0, q6[1]
; CHECK-NEXT: vmov.16 q5[1], r0
; CHECK-NEXT: vmov.u16 r0, q6[2]
; CHECK-NEXT: vmov.16 q5[2], r0
; CHECK-NEXT: vmov.u16 r0, q6[3]
; CHECK-NEXT: vmov.16 q5[3], r0
; CHECK-NEXT: vmov.u16 r0, q2[2]
; CHECK-NEXT: vmov.16 q6[4], r0
; CHECK-NEXT: vmov.u16 r0, q2[6]
; CHECK-NEXT: vmov.16 q6[5], r0
; CHECK-NEXT: vmov.u16 r0, q3[2]
; CHECK-NEXT: vmov.16 q6[6], r0
; CHECK-NEXT: vmov.u16 r0, q3[6]
; CHECK-NEXT: vmov.16 q6[7], r0
; CHECK-NEXT: vmov.u16 r0, q6[4]
; CHECK-NEXT: vmov.16 q5[4], r0
; CHECK-NEXT: vmov.u16 r0, q6[5]
; CHECK-NEXT: vmov.16 q5[5], r0
; CHECK-NEXT: vmov.u16 r0, q6[6]
; CHECK-NEXT: vmov.16 q5[6], r0
; CHECK-NEXT: vmov.u16 r0, q6[7]
; CHECK-NEXT: vmov.16 q5[7], r0
; CHECK-NEXT: vmov.u16 r0, q0[0]
; CHECK-NEXT: vmov.16 q6[0], r0
; CHECK-NEXT: vmov.u16 r0, q0[4]
; CHECK-NEXT: vmov.16 q6[1], r0
; CHECK-NEXT: vmov.u16 r0, q1[0]
; CHECK-NEXT: vmov.16 q6[2], r0
; CHECK-NEXT: vmov.u16 r0, q1[4]
; CHECK-NEXT: vmov.16 q6[3], r0
; CHECK-NEXT: vadd.i16 q4, q5, q4
; CHECK-NEXT: vmov.u16 r0, q6[0]
; CHECK-NEXT: vmov.16 q5[0], r0
; CHECK-NEXT: vmov.u16 r0, q6[1]
; CHECK-NEXT: vmov.16 q5[1], r0
; CHECK-NEXT: vmov.u16 r0, q6[2]
; CHECK-NEXT: vmov.16 q5[2], r0
; CHECK-NEXT: vmov.u16 r0, q6[3]
; CHECK-NEXT: vmov.16 q5[3], r0
; CHECK-NEXT: vmov.u16 r0, q2[0]
; CHECK-NEXT: vmov.16 q6[4], r0
; CHECK-NEXT: vmov.u16 r0, q2[4]
; CHECK-NEXT: vmov.16 q6[5], r0
; CHECK-NEXT: vmov.u16 r0, q3[0]
; CHECK-NEXT: vmov.16 q6[6], r0
; CHECK-NEXT: vmov.u16 r0, q3[4]
; CHECK-NEXT: vmov.16 q6[7], r0
; CHECK-NEXT: vmov.u16 r0, q6[4]
; CHECK-NEXT: vmov.16 q5[4], r0
; CHECK-NEXT: vmov.u16 r0, q6[5]
; CHECK-NEXT: vmov.16 q5[5], r0
; CHECK-NEXT: vmov.u16 r0, q6[6]
; CHECK-NEXT: vmov.16 q5[6], r0
; CHECK-NEXT: vmov.u16 r0, q0[1]
; CHECK-NEXT: vmov.16 q7[0], r0
; CHECK-NEXT: vmov.u16 r0, q0[5]
; CHECK-NEXT: vmov.16 q7[1], r0
; CHECK-NEXT: vmov.u16 r0, q1[1]
; CHECK-NEXT: vmov.16 q7[2], r0
; CHECK-NEXT: vmov.u16 r0, q1[5]
; CHECK-NEXT: vmov.16 q7[3], r0
; CHECK-NEXT: vmov.u16 r0, q7[0]
; CHECK-NEXT: vmov.16 q0[0], r0
; CHECK-NEXT: vmov.u16 r0, q7[1]
; CHECK-NEXT: vmov.16 q0[1], r0
; CHECK-NEXT: vmov.u16 r0, q7[2]
; CHECK-NEXT: vmov.16 q0[2], r0
; CHECK-NEXT: vmov.u16 r0, q7[3]
; CHECK-NEXT: vmov.16 q0[3], r0
; CHECK-NEXT: vmov.u16 r0, q2[1]
; CHECK-NEXT: vmov.16 q1[4], r0
; CHECK-NEXT: vmov.u16 r0, q2[5]
; CHECK-NEXT: vmov.16 q1[5], r0
; CHECK-NEXT: vmov.u16 r0, q3[1]
; CHECK-NEXT: vmov.16 q1[6], r0
; CHECK-NEXT: vmov.u16 r0, q3[5]
; CHECK-NEXT: vmov.16 q1[7], r0
; CHECK-NEXT: vmov.u16 r0, q1[4]
; CHECK-NEXT: vmov.16 q0[4], r0
; CHECK-NEXT: vmov.u16 r0, q1[5]
; CHECK-NEXT: vmov.16 q0[5], r0
; CHECK-NEXT: vmov.u16 r0, q1[6]
; CHECK-NEXT: vmov.16 q0[6], r0
; CHECK-NEXT: vmov.u16 r0, q1[7]
; CHECK-NEXT: vmov.16 q0[7], r0
; CHECK-NEXT: vmov.u16 r0, q6[7]
; CHECK-NEXT: vmov.16 q5[7], r0
; CHECK-NEXT: vadd.i16 q0, q5, q0
; CHECK-NEXT: vadd.i16 q0, q0, q4		; CHECK-NEXT: vadd.i16 q0, q0, q4
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpop {d8, d9}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <32 x i16>, <32 x i16>* %src, align 4		%l1 = load <32 x i16>, <32 x i16>* %src, align 4
%s1 = shufflevector <32 x i16> %l1, <32 x i16> undef, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28>		%s1 = shufflevector <32 x i16> %l1, <32 x i16> undef, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28>
%s2 = shufflevector <32 x i16> %l1, <32 x i16> undef, <8 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29>		%s2 = shufflevector <32 x i16> %l1, <32 x i16> undef, <8 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29>
%s3 = shufflevector <32 x i16> %l1, <32 x i16> undef, <8 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>		%s3 = shufflevector <32 x i16> %l1, <32 x i16> undef, <8 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>
%s4 = shufflevector <32 x i16> %l1, <32 x i16> undef, <8 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>		%s4 = shufflevector <32 x i16> %l1, <32 x i16> undef, <8 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>
%a1 = add <8 x i16> %s1, %s2		%a1 = add <8 x i16> %s1, %s2
%a2 = add <8 x i16> %s3, %s4		%a2 = add <8 x i16> %s3, %s4
%a3 = add <8 x i16> %a1, %a2		%a3 = add <8 x i16> %a1, %a2
store <8 x i16> %a3, <8 x i16> *%dst		store <8 x i16> %a3, <8 x i16> *%dst
ret void		ret void
}		}

define void @vld4_v16i16(<64 x i16> *%src, <16 x i16> *%dst) {		define void @vld4_v16i16(<64 x i16> *%src, <16 x i16> *%dst) {
; CHECK-LABEL: vld4_v16i16:		; CHECK-LABEL: vld4_v16i16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
		; CHECK-NEXT: .save {r4, r5}
		; CHECK-NEXT: push {r4, r5}
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: .pad #16		; CHECK-NEXT: .pad #88
; CHECK-NEXT: sub sp, #16		; CHECK-NEXT: sub sp, #88
; CHECK-NEXT: vldrw.u32 q0, [r0, #64]		; CHECK-NEXT: add.w r2, r0, #64
; CHECK-NEXT: vldrw.u32 q1, [r0, #80]		; CHECK-NEXT: vld40.16 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vldrw.u32 q3, [r0, #112]		; CHECK-NEXT: vld41.16 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.u16 r2, q0[3]		; CHECK-NEXT: vstmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vmov.16 q2[0], r2		; CHECK-NEXT: vld40.16 {q4, q5, q6, q7}, [r0]
; CHECK-NEXT: vmov.u16 r2, q0[7]		; CHECK-NEXT: vld41.16 {q4, q5, q6, q7}, [r0]
; CHECK-NEXT: vmov.16 q2[1], r2		; CHECK-NEXT: vld42.16 {q4, q5, q6, q7}, [r0]
; CHECK-NEXT: vmov.u16 r2, q1[3]		; CHECK-NEXT: vld43.16 {q4, q5, q6, q7}, [r0]
; CHECK-NEXT: vmov.16 q2[2], r2		; CHECK-NEXT: vstrw.32 q5, [sp, #64] @ 16-byte Spill
; CHECK-NEXT: vmov.u16 r2, q1[7]		; CHECK-NEXT: vmov q1, q4
; CHECK-NEXT: vmov.16 q2[3], r2		; CHECK-NEXT: vldrw.u32 q0, [sp, #64] @ 16-byte Reload
; CHECK-NEXT: vmov.u16 r2, q2[0]		; CHECK-NEXT: vadd.i16 q4, q6, q7
; CHECK-NEXT: vmov.16 q4[0], r2		; CHECK-NEXT: vadd.i16 q5, q1, q0
; CHECK-NEXT: vmov.u16 r2, q2[1]		; CHECK-NEXT: vldmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vmov.16 q4[1], r2
; CHECK-NEXT: vmov.u16 r2, q2[2]
; CHECK-NEXT: vmov.16 q4[2], r2
; CHECK-NEXT: vmov.u16 r2, q2[3]
; CHECK-NEXT: vldrw.u32 q2, [r0, #96]
; CHECK-NEXT: vmov.16 q4[3], r2
; CHECK-NEXT: vmov.u16 r2, q2[3]
; CHECK-NEXT: vmov.16 q5[4], r2
; CHECK-NEXT: vmov.u16 r2, q2[7]
; CHECK-NEXT: vmov.16 q5[5], r2
; CHECK-NEXT: vmov.u16 r2, q3[3]
; CHECK-NEXT: vmov.16 q5[6], r2
; CHECK-NEXT: vmov.u16 r2, q3[7]
; CHECK-NEXT: vmov.16 q5[7], r2
; CHECK-NEXT: vmov.u16 r2, q5[4]
; CHECK-NEXT: vmov.16 q4[4], r2
; CHECK-NEXT: vmov.u16 r2, q5[5]
; CHECK-NEXT: vmov.16 q4[5], r2
; CHECK-NEXT: vmov.u16 r2, q5[6]
; CHECK-NEXT: vmov.16 q4[6], r2
; CHECK-NEXT: vmov.u16 r2, q5[7]
; CHECK-NEXT: vmov.16 q4[7], r2
; CHECK-NEXT: vmov.u16 r2, q0[2]
; CHECK-NEXT: vmov.16 q6[0], r2
; CHECK-NEXT: vmov.u16 r2, q0[6]
; CHECK-NEXT: vmov.16 q6[1], r2
; CHECK-NEXT: vmov.u16 r2, q1[2]
; CHECK-NEXT: vmov.16 q6[2], r2
; CHECK-NEXT: vmov.u16 r2, q1[6]
; CHECK-NEXT: vmov.16 q6[3], r2
; CHECK-NEXT: vmov.u16 r2, q6[0]
; CHECK-NEXT: vmov.16 q5[0], r2
; CHECK-NEXT: vmov.u16 r2, q6[1]
; CHECK-NEXT: vmov.16 q5[1], r2
; CHECK-NEXT: vmov.u16 r2, q6[2]
; CHECK-NEXT: vmov.16 q5[2], r2
; CHECK-NEXT: vmov.u16 r2, q6[3]
; CHECK-NEXT: vmov.16 q5[3], r2
; CHECK-NEXT: vmov.u16 r2, q2[2]
; CHECK-NEXT: vmov.16 q6[4], r2
; CHECK-NEXT: vmov.u16 r2, q2[6]
; CHECK-NEXT: vmov.16 q6[5], r2
; CHECK-NEXT: vmov.u16 r2, q3[2]
; CHECK-NEXT: vmov.16 q6[6], r2
; CHECK-NEXT: vmov.u16 r2, q3[6]
; CHECK-NEXT: vmov.16 q6[7], r2
; CHECK-NEXT: vmov.u16 r2, q6[4]
; CHECK-NEXT: vmov.16 q5[4], r2
; CHECK-NEXT: vmov.u16 r2, q6[5]
; CHECK-NEXT: vmov.16 q5[5], r2
; CHECK-NEXT: vmov.u16 r2, q6[6]
; CHECK-NEXT: vmov.16 q5[6], r2
; CHECK-NEXT: vmov.u16 r2, q6[7]
; CHECK-NEXT: vmov.16 q5[7], r2
; CHECK-NEXT: vmov.u16 r2, q0[0]
; CHECK-NEXT: vmov.16 q6[0], r2
; CHECK-NEXT: vmov.u16 r2, q0[4]
; CHECK-NEXT: vmov.16 q6[1], r2
; CHECK-NEXT: vmov.u16 r2, q1[0]
; CHECK-NEXT: vmov.16 q6[2], r2
; CHECK-NEXT: vmov.u16 r2, q1[4]
; CHECK-NEXT: vmov.16 q6[3], r2
; CHECK-NEXT: vadd.i16 q4, q5, q4		; CHECK-NEXT: vadd.i16 q4, q5, q4
; CHECK-NEXT: vmov.u16 r2, q6[0]		; CHECK-NEXT: vld42.16 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.16 q5[0], r2		; CHECK-NEXT: vld43.16 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.u16 r2, q6[1]		; CHECK-NEXT: vstrw.32 q4, [r1]
; CHECK-NEXT: vmov.16 q5[1], r2		; CHECK-NEXT: @ kill: def $q0 killed $q0 killed $q0_q1_q2_q3
; CHECK-NEXT: vmov.u16 r2, q6[2]		; CHECK-NEXT: vadd.i16 q5, q2, q3
; CHECK-NEXT: vmov.16 q5[2], r2		; CHECK-NEXT: vadd.i16 q0, q0, q1
; CHECK-NEXT: vmov.u16 r2, q6[3]
; CHECK-NEXT: vmov.16 q5[3], r2
; CHECK-NEXT: vmov.u16 r2, q2[0]
; CHECK-NEXT: vmov.16 q6[4], r2
; CHECK-NEXT: vmov.u16 r2, q2[4]
; CHECK-NEXT: vmov.16 q6[5], r2
; CHECK-NEXT: vmov.u16 r2, q3[0]
; CHECK-NEXT: vmov.16 q6[6], r2
; CHECK-NEXT: vmov.u16 r2, q3[4]
; CHECK-NEXT: vmov.16 q6[7], r2
; CHECK-NEXT: vmov.u16 r2, q6[4]
; CHECK-NEXT: vmov.16 q5[4], r2
; CHECK-NEXT: vmov.u16 r2, q6[5]
; CHECK-NEXT: vmov.16 q5[5], r2
; CHECK-NEXT: vmov.u16 r2, q6[6]
; CHECK-NEXT: vmov.16 q5[6], r2
; CHECK-NEXT: vmov.u16 r2, q0[1]
; CHECK-NEXT: vmov.16 q7[0], r2
; CHECK-NEXT: vmov.u16 r2, q0[5]
; CHECK-NEXT: vmov.16 q7[1], r2
; CHECK-NEXT: vmov.u16 r2, q1[1]
; CHECK-NEXT: vmov.16 q7[2], r2
; CHECK-NEXT: vmov.u16 r2, q1[5]
; CHECK-NEXT: vmov.16 q7[3], r2
; CHECK-NEXT: vmov.u16 r2, q7[0]
; CHECK-NEXT: vmov.16 q0[0], r2
; CHECK-NEXT: vmov.u16 r2, q7[1]
; CHECK-NEXT: vmov.16 q0[1], r2
; CHECK-NEXT: vmov.u16 r2, q7[2]
; CHECK-NEXT: vmov.16 q0[2], r2
; CHECK-NEXT: vmov.u16 r2, q7[3]
; CHECK-NEXT: vmov.16 q0[3], r2
; CHECK-NEXT: vmov.u16 r2, q2[1]
; CHECK-NEXT: vmov.16 q1[4], r2
; CHECK-NEXT: vmov.u16 r2, q2[5]
; CHECK-NEXT: vmov.16 q1[5], r2
; CHECK-NEXT: vmov.u16 r2, q3[1]
; CHECK-NEXT: vmov.16 q1[6], r2
; CHECK-NEXT: vmov.u16 r2, q3[5]
; CHECK-NEXT: vmov.16 q1[7], r2
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]
; CHECK-NEXT: vmov.u16 r2, q1[4]
; CHECK-NEXT: vmov.16 q0[4], r2
; CHECK-NEXT: vmov.u16 r2, q1[5]
; CHECK-NEXT: vmov.16 q0[5], r2
; CHECK-NEXT: vmov.u16 r2, q1[6]
; CHECK-NEXT: vmov.16 q0[6], r2
; CHECK-NEXT: vmov.u16 r2, q1[7]
; CHECK-NEXT: vldrw.u32 q1, [r0]
; CHECK-NEXT: vmov.16 q0[7], r2
; CHECK-NEXT: vmov.u16 r2, q6[7]
; CHECK-NEXT: vmov.16 q5[7], r2
; CHECK-NEXT: vmov.u16 r2, q1[2]
; CHECK-NEXT: vmov.16 q3[0], r2
; CHECK-NEXT: vmov.u16 r2, q1[6]
; CHECK-NEXT: vmov.16 q3[1], r2
; CHECK-NEXT: vmov.u16 r2, q2[2]
; CHECK-NEXT: vmov.16 q3[2], r2
; CHECK-NEXT: vmov.u16 r2, q2[6]
; CHECK-NEXT: vmov.16 q3[3], r2
; CHECK-NEXT: vadd.i16 q0, q5, q0
; CHECK-NEXT: vmov.u16 r2, q3[0]
; CHECK-NEXT: vadd.i16 q0, q0, q4
; CHECK-NEXT: vmov.16 q5[0], r2
; CHECK-NEXT: vmov.u16 r2, q3[1]
; CHECK-NEXT: vmov.16 q5[1], r2
; CHECK-NEXT: vmov.u16 r2, q3[2]
; CHECK-NEXT: vmov.16 q5[2], r2
; CHECK-NEXT: vmov.u16 r2, q3[3]
; CHECK-NEXT: vldrw.u32 q3, [r0, #32]
; CHECK-NEXT: vldrw.u32 q4, [r0, #48]
; CHECK-NEXT: vmov.16 q5[3], r2
; CHECK-NEXT: vstrw.32 q0, [sp] @ 16-byte Spill
; CHECK-NEXT: vmov.u16 r2, q3[2]
; CHECK-NEXT: vmov.u16 r0, q4[2]
; CHECK-NEXT: vmov.16 q6[4], r2
; CHECK-NEXT: vmov.u16 r2, q3[6]
; CHECK-NEXT: vmov.16 q6[5], r2
; CHECK-NEXT: vmov.16 q6[6], r0
; CHECK-NEXT: vmov.u16 r0, q4[6]
; CHECK-NEXT: vmov.16 q6[7], r0
; CHECK-NEXT: vmov.u16 r0, q6[4]
; CHECK-NEXT: vmov.16 q5[4], r0
; CHECK-NEXT: vmov.u16 r0, q6[5]
; CHECK-NEXT: vmov.16 q5[5], r0
; CHECK-NEXT: vmov.u16 r0, q6[6]
; CHECK-NEXT: vmov.16 q5[6], r0
; CHECK-NEXT: vmov.u16 r0, q1[3]
; CHECK-NEXT: vmov.16 q0[0], r0
; CHECK-NEXT: vmov.u16 r0, q1[7]
; CHECK-NEXT: vmov.16 q0[1], r0
; CHECK-NEXT: vmov.u16 r0, q2[3]
; CHECK-NEXT: vmov.16 q0[2], r0
; CHECK-NEXT: vmov.u16 r0, q2[7]
; CHECK-NEXT: vmov.16 q0[3], r0
; CHECK-NEXT: vmov.u16 r0, q0[0]
; CHECK-NEXT: vmov.16 q7[0], r0
; CHECK-NEXT: vmov.u16 r0, q0[1]
; CHECK-NEXT: vmov.16 q7[1], r0
; CHECK-NEXT: vmov.u16 r0, q0[2]
; CHECK-NEXT: vmov.16 q7[2], r0
; CHECK-NEXT: vmov.u16 r0, q0[3]
; CHECK-NEXT: vmov.16 q7[3], r0
; CHECK-NEXT: vmov.u16 r0, q3[3]
; CHECK-NEXT: vmov.16 q0[4], r0
; CHECK-NEXT: vmov.u16 r0, q3[7]
; CHECK-NEXT: vmov.16 q0[5], r0
; CHECK-NEXT: vmov.u16 r0, q4[3]
; CHECK-NEXT: vmov.16 q0[6], r0
; CHECK-NEXT: vmov.u16 r0, q4[7]
; CHECK-NEXT: vmov.16 q0[7], r0
; CHECK-NEXT: vmov.u16 r0, q0[4]
; CHECK-NEXT: vmov.16 q7[4], r0
; CHECK-NEXT: vmov.u16 r0, q0[5]
; CHECK-NEXT: vmov.16 q7[5], r0
; CHECK-NEXT: vmov.u16 r0, q0[6]
; CHECK-NEXT: vmov.16 q7[6], r0
; CHECK-NEXT: vmov.u16 r0, q0[7]
; CHECK-NEXT: vmov.16 q7[7], r0
; CHECK-NEXT: vmov.u16 r0, q6[7]
; CHECK-NEXT: vmov.16 q5[7], r0
; CHECK-NEXT: vmov.u16 r0, q1[1]
; CHECK-NEXT: vmov.16 q0[0], r0
; CHECK-NEXT: vmov.u16 r0, q1[5]
; CHECK-NEXT: vmov.16 q0[1], r0
; CHECK-NEXT: vmov.u16 r0, q2[1]
; CHECK-NEXT: vmov.16 q0[2], r0
; CHECK-NEXT: vmov.u16 r0, q2[5]
; CHECK-NEXT: vmov.16 q0[3], r0
; CHECK-NEXT: vadd.i16 q5, q5, q7
; CHECK-NEXT: vmov.u16 r0, q0[0]
; CHECK-NEXT: vmov.16 q6[0], r0
; CHECK-NEXT: vmov.u16 r0, q0[1]
; CHECK-NEXT: vmov.16 q6[1], r0
; CHECK-NEXT: vmov.u16 r0, q0[2]
; CHECK-NEXT: vmov.16 q6[2], r0
; CHECK-NEXT: vmov.u16 r0, q0[3]
; CHECK-NEXT: vmov.16 q6[3], r0
; CHECK-NEXT: vmov.u16 r0, q3[1]
; CHECK-NEXT: vmov.16 q0[4], r0
; CHECK-NEXT: vmov.u16 r0, q3[5]
; CHECK-NEXT: vmov.16 q0[5], r0
; CHECK-NEXT: vmov.u16 r0, q4[1]
; CHECK-NEXT: vmov.16 q0[6], r0
; CHECK-NEXT: vmov.u16 r0, q4[5]
; CHECK-NEXT: vmov.16 q0[7], r0
; CHECK-NEXT: vmov.u16 r0, q0[4]
; CHECK-NEXT: vmov.16 q6[4], r0
; CHECK-NEXT: vmov.u16 r0, q0[5]
; CHECK-NEXT: vmov.16 q6[5], r0
; CHECK-NEXT: vmov.u16 r0, q0[6]
; CHECK-NEXT: vmov.16 q6[6], r0
; CHECK-NEXT: vmov.u16 r0, q0[7]
; CHECK-NEXT: vmov.16 q6[7], r0
; CHECK-NEXT: vmov.u16 r0, q1[0]
; CHECK-NEXT: vmov.16 q0[0], r0
; CHECK-NEXT: vmov.u16 r0, q1[4]
; CHECK-NEXT: vmov.16 q0[1], r0
; CHECK-NEXT: vmov.u16 r0, q2[0]
; CHECK-NEXT: vmov.16 q0[2], r0
; CHECK-NEXT: vmov.u16 r0, q2[4]
; CHECK-NEXT: vmov.16 q0[3], r0
; CHECK-NEXT: vmov.u16 r0, q0[0]
; CHECK-NEXT: vmov.16 q1[0], r0
; CHECK-NEXT: vmov.u16 r0, q0[1]
; CHECK-NEXT: vmov.16 q1[1], r0
; CHECK-NEXT: vmov.u16 r0, q0[2]
; CHECK-NEXT: vmov.16 q1[2], r0
; CHECK-NEXT: vmov.u16 r0, q0[3]
; CHECK-NEXT: vmov.16 q1[3], r0
; CHECK-NEXT: vmov.u16 r0, q3[0]
; CHECK-NEXT: vmov.16 q0[4], r0
; CHECK-NEXT: vmov.u16 r0, q3[4]
; CHECK-NEXT: vmov.16 q0[5], r0
; CHECK-NEXT: vmov.u16 r0, q4[0]
; CHECK-NEXT: vmov.16 q0[6], r0
; CHECK-NEXT: vmov.u16 r0, q4[4]
; CHECK-NEXT: vmov.16 q0[7], r0
; CHECK-NEXT: vmov.u16 r0, q0[4]
; CHECK-NEXT: vmov.16 q1[4], r0
; CHECK-NEXT: vmov.u16 r0, q0[5]
; CHECK-NEXT: vmov.16 q1[5], r0
; CHECK-NEXT: vmov.u16 r0, q0[6]
; CHECK-NEXT: vmov.16 q1[6], r0
; CHECK-NEXT: vmov.u16 r0, q0[7]
; CHECK-NEXT: vmov.16 q1[7], r0
; CHECK-NEXT: vadd.i16 q0, q1, q6
; CHECK-NEXT: vldrw.u32 q1, [sp] @ 16-byte Reload
; CHECK-NEXT: vadd.i16 q0, q0, q5		; CHECK-NEXT: vadd.i16 q0, q0, q5
; CHECK-NEXT: vstrw.32 q1, [r1, #16]		; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: add sp, #88
; CHECK-NEXT: add sp, #16
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
		; CHECK-NEXT: pop {r4, r5}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <64 x i16>, <64 x i16>* %src, align 4		%l1 = load <64 x i16>, <64 x i16>* %src, align 4
%s1 = shufflevector <64 x i16> %l1, <64 x i16> undef, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 32, i32 36, i32 40, i32 44, i32 48, i32 52, i32 56, i32 60>		%s1 = shufflevector <64 x i16> %l1, <64 x i16> undef, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 32, i32 36, i32 40, i32 44, i32 48, i32 52, i32 56, i32 60>
%s2 = shufflevector <64 x i16> %l1, <64 x i16> undef, <16 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 33, i32 37, i32 41, i32 45, i32 49, i32 53, i32 57, i32 61>		%s2 = shufflevector <64 x i16> %l1, <64 x i16> undef, <16 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 33, i32 37, i32 41, i32 45, i32 49, i32 53, i32 57, i32 61>
%s3 = shufflevector <64 x i16> %l1, <64 x i16> undef, <16 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30, i32 34, i32 38, i32 42, i32 46, i32 50, i32 54, i32 58, i32 62>		%s3 = shufflevector <64 x i16> %l1, <64 x i16> undef, <16 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30, i32 34, i32 38, i32 42, i32 46, i32 50, i32 54, i32 58, i32 62>
%s4 = shufflevector <64 x i16> %l1, <64 x i16> undef, <16 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31, i32 35, i32 39, i32 43, i32 47, i32 51, i32 55, i32 59, i32 63>		%s4 = shufflevector <64 x i16> %l1, <64 x i16> undef, <16 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31, i32 35, i32 39, i32 43, i32 47, i32 51, i32 55, i32 59, i32 63>
%a1 = add <16 x i16> %s1, %s2		%a1 = add <16 x i16> %s1, %s2
; [159 lines not shown]		; [159 lines not shown]
%a3 = add <8 x i8> %a1, %a2		%a3 = add <8 x i8> %a1, %a2
store <8 x i8> %a3, <8 x i8> *%dst		store <8 x i8> %a3, <8 x i8> *%dst
ret void		ret void
}		}

define void @vld4_v16i8(<64 x i8> *%src, <16 x i8> *%dst) {		define void @vld4_v16i8(<64 x i8> *%src, <16 x i8> *%dst) {
; CHECK-LABEL: vld4_v16i8:		; CHECK-LABEL: vld4_v16i8:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: .vsave {d8, d9}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpush {d8, d9}
; CHECK-NEXT: vldrw.u32 q0, [r0]		; CHECK-NEXT: vld40.8 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vldrw.u32 q1, [r0, #16]		; CHECK-NEXT: vld41.8 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vldrw.u32 q3, [r0, #48]		; CHECK-NEXT: vld42.8 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.u8 r2, q0[3]		; CHECK-NEXT: vld43.8 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.8 q2[0], r2		; CHECK-NEXT: @ kill: def $q0 killed $q0 killed $q0_q1_q2_q3
; CHECK-NEXT: vmov.u8 r2, q0[7]		; CHECK-NEXT: vadd.i8 q4, q2, q3
; CHECK-NEXT: vmov.8 q2[1], r2		; CHECK-NEXT: vadd.i8 q0, q0, q1
; CHECK-NEXT: vmov.u8 r2, q0[11]
; CHECK-NEXT: vmov.8 q2[2], r2
; CHECK-NEXT: vmov.u8 r2, q0[15]
; CHECK-NEXT: vmov.8 q2[3], r2
; CHECK-NEXT: vmov.u8 r2, q1[3]
; CHECK-NEXT: vmov.8 q2[4], r2
; CHECK-NEXT: vmov.u8 r2, q1[7]
; CHECK-NEXT: vmov.8 q2[5], r2
; CHECK-NEXT: vmov.u8 r2, q1[11]
; CHECK-NEXT: vmov.8 q2[6], r2
; CHECK-NEXT: vmov.u8 r2, q1[15]
; CHECK-NEXT: vmov.8 q2[7], r2
; CHECK-NEXT: vmov.u8 r2, q2[0]
; CHECK-NEXT: vmov.8 q4[0], r2
; CHECK-NEXT: vmov.u8 r2, q2[1]
; CHECK-NEXT: vmov.8 q4[1], r2
; CHECK-NEXT: vmov.u8 r2, q2[2]
; CHECK-NEXT: vmov.8 q4[2], r2
; CHECK-NEXT: vmov.u8 r2, q2[3]
; CHECK-NEXT: vmov.8 q4[3], r2
; CHECK-NEXT: vmov.u8 r2, q2[4]
; CHECK-NEXT: vmov.8 q4[4], r2
; CHECK-NEXT: vmov.u8 r2, q2[5]
; CHECK-NEXT: vmov.8 q4[5], r2
; CHECK-NEXT: vmov.u8 r2, q2[6]
; CHECK-NEXT: vmov.8 q4[6], r2
; CHECK-NEXT: vmov.u8 r2, q2[7]
; CHECK-NEXT: vldrw.u32 q2, [r0, #32]
; CHECK-NEXT: vmov.8 q4[7], r2
; CHECK-NEXT: vmov.u8 r0, q3[3]
; CHECK-NEXT: vmov.u8 r2, q2[3]
; CHECK-NEXT: vmov.8 q5[8], r2
; CHECK-NEXT: vmov.u8 r2, q2[7]
; CHECK-NEXT: vmov.8 q5[9], r2
; CHECK-NEXT: vmov.u8 r2, q2[11]
; CHECK-NEXT: vmov.8 q5[10], r2
; CHECK-NEXT: vmov.u8 r2, q2[15]
; CHECK-NEXT: vmov.8 q5[11], r2
; CHECK-NEXT: vmov.8 q5[12], r0
; CHECK-NEXT: vmov.u8 r0, q3[7]
; CHECK-NEXT: vmov.8 q5[13], r0
; CHECK-NEXT: vmov.u8 r0, q3[11]
; CHECK-NEXT: vmov.8 q5[14], r0
; CHECK-NEXT: vmov.u8 r0, q3[15]
; CHECK-NEXT: vmov.8 q5[15], r0
; CHECK-NEXT: vmov.u8 r0, q5[8]
; CHECK-NEXT: vmov.8 q4[8], r0
; CHECK-NEXT: vmov.u8 r0, q5[9]
; CHECK-NEXT: vmov.8 q4[9], r0
; CHECK-NEXT: vmov.u8 r0, q5[10]
; CHECK-NEXT: vmov.8 q4[10], r0
; CHECK-NEXT: vmov.u8 r0, q5[11]
; CHECK-NEXT: vmov.8 q4[11], r0
; CHECK-NEXT: vmov.u8 r0, q5[12]
; CHECK-NEXT: vmov.8 q4[12], r0
; CHECK-NEXT: vmov.u8 r0, q5[13]
; CHECK-NEXT: vmov.8 q4[13], r0
; CHECK-NEXT: vmov.u8 r0, q5[14]
; CHECK-NEXT: vmov.8 q4[14], r0
; CHECK-NEXT: vmov.u8 r0, q5[15]
; CHECK-NEXT: vmov.8 q4[15], r0
; CHECK-NEXT: vmov.u8 r0, q0[2]
; CHECK-NEXT: vmov.8 q6[0], r0
; CHECK-NEXT: vmov.u8 r0, q0[6]
; CHECK-NEXT: vmov.8 q6[1], r0
; CHECK-NEXT: vmov.u8 r0, q0[10]
; CHECK-NEXT: vmov.8 q6[2], r0
; CHECK-NEXT: vmov.u8 r0, q0[14]
; CHECK-NEXT: vmov.8 q6[3], r0
; CHECK-NEXT: vmov.u8 r0, q1[2]
; CHECK-NEXT: vmov.8 q6[4], r0
; CHECK-NEXT: vmov.u8 r0, q1[6]
; CHECK-NEXT: vmov.8 q6[5], r0
; CHECK-NEXT: vmov.u8 r0, q1[10]
; CHECK-NEXT: vmov.8 q6[6], r0
; CHECK-NEXT: vmov.u8 r0, q1[14]
; CHECK-NEXT: vmov.8 q6[7], r0
; CHECK-NEXT: vmov.u8 r0, q6[0]
; CHECK-NEXT: vmov.8 q5[0], r0
; CHECK-NEXT: vmov.u8 r0, q6[1]
; CHECK-NEXT: vmov.8 q5[1], r0
; CHECK-NEXT: vmov.u8 r0, q6[2]
; CHECK-NEXT: vmov.8 q5[2], r0
; CHECK-NEXT: vmov.u8 r0, q6[3]
; CHECK-NEXT: vmov.8 q5[3], r0
; CHECK-NEXT: vmov.u8 r0, q6[4]
; CHECK-NEXT: vmov.8 q5[4], r0
; CHECK-NEXT: vmov.u8 r0, q6[5]
; CHECK-NEXT: vmov.8 q5[5], r0
; CHECK-NEXT: vmov.u8 r0, q6[6]
; CHECK-NEXT: vmov.8 q5[6], r0
; CHECK-NEXT: vmov.u8 r0, q6[7]
; CHECK-NEXT: vmov.8 q5[7], r0
; CHECK-NEXT: vmov.u8 r0, q2[2]
; CHECK-NEXT: vmov.8 q6[8], r0
; CHECK-NEXT: vmov.u8 r0, q2[6]
; CHECK-NEXT: vmov.8 q6[9], r0
; CHECK-NEXT: vmov.u8 r0, q2[10]
; CHECK-NEXT: vmov.8 q6[10], r0
; CHECK-NEXT: vmov.u8 r0, q2[14]
; CHECK-NEXT: vmov.8 q6[11], r0
; CHECK-NEXT: vmov.u8 r0, q3[2]
; CHECK-NEXT: vmov.8 q6[12], r0
; CHECK-NEXT: vmov.u8 r0, q3[6]
; CHECK-NEXT: vmov.8 q6[13], r0
; CHECK-NEXT: vmov.u8 r0, q3[10]
; CHECK-NEXT: vmov.8 q6[14], r0
; CHECK-NEXT: vmov.u8 r0, q3[14]
; CHECK-NEXT: vmov.8 q6[15], r0
; CHECK-NEXT: vmov.u8 r0, q6[8]
; CHECK-NEXT: vmov.8 q5[8], r0
; CHECK-NEXT: vmov.u8 r0, q6[9]
; CHECK-NEXT: vmov.8 q5[9], r0
; CHECK-NEXT: vmov.u8 r0, q6[10]
; CHECK-NEXT: vmov.8 q5[10], r0
; CHECK-NEXT: vmov.u8 r0, q6[11]
; CHECK-NEXT: vmov.8 q5[11], r0
; CHECK-NEXT: vmov.u8 r0, q6[12]
; CHECK-NEXT: vmov.8 q5[12], r0
; CHECK-NEXT: vmov.u8 r0, q6[13]
; CHECK-NEXT: vmov.8 q5[13], r0
; CHECK-NEXT: vmov.u8 r0, q6[14]
; CHECK-NEXT: vmov.8 q5[14], r0
; CHECK-NEXT: vmov.u8 r0, q6[15]
; CHECK-NEXT: vmov.8 q5[15], r0
; CHECK-NEXT: vmov.u8 r0, q0[0]
; CHECK-NEXT: vmov.8 q6[0], r0
; CHECK-NEXT: vmov.u8 r0, q0[4]
; CHECK-NEXT: vmov.8 q6[1], r0
; CHECK-NEXT: vmov.u8 r0, q0[8]
; CHECK-NEXT: vmov.8 q6[2], r0
; CHECK-NEXT: vmov.u8 r0, q0[12]
; CHECK-NEXT: vmov.8 q6[3], r0
; CHECK-NEXT: vmov.u8 r0, q1[0]
; CHECK-NEXT: vmov.8 q6[4], r0
; CHECK-NEXT: vmov.u8 r0, q1[4]
; CHECK-NEXT: vmov.8 q6[5], r0
; CHECK-NEXT: vmov.u8 r0, q1[8]
; CHECK-NEXT: vmov.8 q6[6], r0
; CHECK-NEXT: vmov.u8 r0, q1[12]
; CHECK-NEXT: vmov.8 q6[7], r0
; CHECK-NEXT: vadd.i8 q4, q5, q4
; CHECK-NEXT: vmov.u8 r0, q6[0]
; CHECK-NEXT: vmov.8 q5[0], r0
; CHECK-NEXT: vmov.u8 r0, q6[1]
; CHECK-NEXT: vmov.8 q5[1], r0
; CHECK-NEXT: vmov.u8 r0, q6[2]
; CHECK-NEXT: vmov.8 q5[2], r0
; CHECK-NEXT: vmov.u8 r0, q6[3]
; CHECK-NEXT: vmov.8 q5[3], r0
; CHECK-NEXT: vmov.u8 r0, q6[4]
; CHECK-NEXT: vmov.8 q5[4], r0
; CHECK-NEXT: vmov.u8 r0, q6[5]
; CHECK-NEXT: vmov.8 q5[5], r0
; CHECK-NEXT: vmov.u8 r0, q6[6]
; CHECK-NEXT: vmov.8 q5[6], r0
; CHECK-NEXT: vmov.u8 r0, q6[7]
; CHECK-NEXT: vmov.8 q5[7], r0
; CHECK-NEXT: vmov.u8 r0, q2[0]
; CHECK-NEXT: vmov.8 q6[8], r0
; CHECK-NEXT: vmov.u8 r0, q2[4]
; CHECK-NEXT: vmov.8 q6[9], r0
; CHECK-NEXT: vmov.u8 r0, q2[8]
; CHECK-NEXT: vmov.8 q6[10], r0
; CHECK-NEXT: vmov.u8 r0, q2[12]
; CHECK-NEXT: vmov.8 q6[11], r0
; CHECK-NEXT: vmov.u8 r0, q3[0]
; CHECK-NEXT: vmov.8 q6[12], r0
; CHECK-NEXT: vmov.u8 r0, q3[4]
; CHECK-NEXT: vmov.8 q6[13], r0
; CHECK-NEXT: vmov.u8 r0, q3[8]
; CHECK-NEXT: vmov.8 q6[14], r0
; CHECK-NEXT: vmov.u8 r0, q3[12]
; CHECK-NEXT: vmov.8 q6[15], r0
; CHECK-NEXT: vmov.u8 r0, q6[8]
; CHECK-NEXT: vmov.8 q5[8], r0
; CHECK-NEXT: vmov.u8 r0, q6[9]
; CHECK-NEXT: vmov.8 q5[9], r0
; CHECK-NEXT: vmov.u8 r0, q6[10]
; CHECK-NEXT: vmov.8 q5[10], r0
; CHECK-NEXT: vmov.u8 r0, q6[11]
; CHECK-NEXT: vmov.8 q5[11], r0
; CHECK-NEXT: vmov.u8 r0, q6[12]
; CHECK-NEXT: vmov.8 q5[12], r0
; CHECK-NEXT: vmov.u8 r0, q6[13]
; CHECK-NEXT: vmov.8 q5[13], r0
; CHECK-NEXT: vmov.u8 r0, q6[14]
; CHECK-NEXT: vmov.8 q5[14], r0
; CHECK-NEXT: vmov.u8 r0, q0[1]
; CHECK-NEXT: vmov.8 q7[0], r0
; CHECK-NEXT: vmov.u8 r0, q0[5]
; CHECK-NEXT: vmov.8 q7[1], r0
; CHECK-NEXT: vmov.u8 r0, q0[9]
; CHECK-NEXT: vmov.8 q7[2], r0
; CHECK-NEXT: vmov.u8 r0, q0[13]
; CHECK-NEXT: vmov.8 q7[3], r0
; CHECK-NEXT: vmov.u8 r0, q1[1]
; CHECK-NEXT: vmov.8 q7[4], r0
; CHECK-NEXT: vmov.u8 r0, q1[5]
; CHECK-NEXT: vmov.8 q7[5], r0
; CHECK-NEXT: vmov.u8 r0, q1[9]
; CHECK-NEXT: vmov.8 q7[6], r0
; CHECK-NEXT: vmov.u8 r0, q1[13]
; CHECK-NEXT: vmov.8 q7[7], r0
; CHECK-NEXT: vmov.u8 r0, q7[0]
; CHECK-NEXT: vmov.8 q0[0], r0
; CHECK-NEXT: vmov.u8 r0, q7[1]
; CHECK-NEXT: vmov.8 q0[1], r0
; CHECK-NEXT: vmov.u8 r0, q7[2]
; CHECK-NEXT: vmov.8 q0[2], r0
; CHECK-NEXT: vmov.u8 r0, q7[3]
; CHECK-NEXT: vmov.8 q0[3], r0
; CHECK-NEXT: vmov.u8 r0, q7[4]
; CHECK-NEXT: vmov.8 q0[4], r0
; CHECK-NEXT: vmov.u8 r0, q7[5]
; CHECK-NEXT: vmov.8 q0[5], r0
; CHECK-NEXT: vmov.u8 r0, q7[6]
; CHECK-NEXT: vmov.8 q0[6], r0
; CHECK-NEXT: vmov.u8 r0, q7[7]
; CHECK-NEXT: vmov.8 q0[7], r0
; CHECK-NEXT: vmov.u8 r0, q2[1]
; CHECK-NEXT: vmov.8 q1[8], r0
; CHECK-NEXT: vmov.u8 r0, q2[5]
; CHECK-NEXT: vmov.8 q1[9], r0
; CHECK-NEXT: vmov.u8 r0, q2[9]
; CHECK-NEXT: vmov.8 q1[10], r0
; CHECK-NEXT: vmov.u8 r0, q2[13]
; CHECK-NEXT: vmov.8 q1[11], r0
; CHECK-NEXT: vmov.u8 r0, q3[1]
; CHECK-NEXT: vmov.8 q1[12], r0
; CHECK-NEXT: vmov.u8 r0, q3[5]
; CHECK-NEXT: vmov.8 q1[13], r0
; CHECK-NEXT: vmov.u8 r0, q3[9]
; CHECK-NEXT: vmov.8 q1[14], r0
; CHECK-NEXT: vmov.u8 r0, q3[13]
; CHECK-NEXT: vmov.8 q1[15], r0
; CHECK-NEXT: vmov.u8 r0, q1[8]
; CHECK-NEXT: vmov.8 q0[8], r0
; CHECK-NEXT: vmov.u8 r0, q1[9]
; CHECK-NEXT: vmov.8 q0[9], r0
; CHECK-NEXT: vmov.u8 r0, q1[10]
; CHECK-NEXT: vmov.8 q0[10], r0
; CHECK-NEXT: vmov.u8 r0, q1[11]
; CHECK-NEXT: vmov.8 q0[11], r0
; CHECK-NEXT: vmov.u8 r0, q1[12]
; CHECK-NEXT: vmov.8 q0[12], r0
; CHECK-NEXT: vmov.u8 r0, q1[13]
; CHECK-NEXT: vmov.8 q0[13], r0
; CHECK-NEXT: vmov.u8 r0, q1[14]
; CHECK-NEXT: vmov.8 q0[14], r0
; CHECK-NEXT: vmov.u8 r0, q1[15]
; CHECK-NEXT: vmov.8 q0[15], r0
; CHECK-NEXT: vmov.u8 r0, q6[15]
; CHECK-NEXT: vmov.8 q5[15], r0
; CHECK-NEXT: vadd.i8 q0, q5, q0
; CHECK-NEXT: vadd.i8 q0, q0, q4		; CHECK-NEXT: vadd.i8 q0, q0, q4
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpop {d8, d9}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <64 x i8>, <64 x i8>* %src, align 4		%l1 = load <64 x i8>, <64 x i8>* %src, align 4
%s1 = shufflevector <64 x i8> %l1, <64 x i8> undef, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 32, i32 36, i32 40, i32 44, i32 48, i32 52, i32 56, i32 60>		%s1 = shufflevector <64 x i8> %l1, <64 x i8> undef, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 32, i32 36, i32 40, i32 44, i32 48, i32 52, i32 56, i32 60>
%s2 = shufflevector <64 x i8> %l1, <64 x i8> undef, <16 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 33, i32 37, i32 41, i32 45, i32 49, i32 53, i32 57, i32 61>		%s2 = shufflevector <64 x i8> %l1, <64 x i8> undef, <16 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 33, i32 37, i32 41, i32 45, i32 49, i32 53, i32 57, i32 61>
%s3 = shufflevector <64 x i8> %l1, <64 x i8> undef, <16 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30, i32 34, i32 38, i32 42, i32 46, i32 50, i32 54, i32 58, i32 62>		%s3 = shufflevector <64 x i8> %l1, <64 x i8> undef, <16 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30, i32 34, i32 38, i32 42, i32 46, i32 50, i32 54, i32 58, i32 62>
%s4 = shufflevector <64 x i8> %l1, <64 x i8> undef, <16 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31, i32 35, i32 39, i32 43, i32 47, i32 51, i32 55, i32 59, i32 63>		%s4 = shufflevector <64 x i8> %l1, <64 x i8> undef, <16 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31, i32 35, i32 39, i32 43, i32 47, i32 51, i32 55, i32 59, i32 63>
%a1 = add <16 x i8> %s1, %s2		%a1 = add <16 x i8> %s1, %s2
... 239 lines not shown ...
%a3 = fadd <2 x float> %a1, %a2		%a3 = fadd <2 x float> %a1, %a2
store <2 x float> %a3, <2 x float> *%dst		store <2 x float> %a3, <2 x float> *%dst
ret void		ret void
}		}

define void @vld4_v4f32(<16 x float> *%src, <4 x float> *%dst) {		define void @vld4_v4f32(<16 x float> *%src, <4 x float> *%dst) {
; CHECK-LABEL: vld4_v4f32:		; CHECK-LABEL: vld4_v4f32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11}		; CHECK-NEXT: .vsave {d8, d9}
; CHECK-NEXT: vpush {d8, d9, d10, d11}		; CHECK-NEXT: vpush {d8, d9}
; CHECK-NEXT: vldrw.u32 q0, [r0]		; CHECK-NEXT: vld40.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vldrw.u32 q3, [r0, #32]		; CHECK-NEXT: vld41.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vldrw.u32 q1, [r0, #48]		; CHECK-NEXT: vld42.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]		; CHECK-NEXT: vld43.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s18, s15		; CHECK-NEXT: @ kill: def $q0 killed $q0 killed $q0_q1_q2_q3
; CHECK-NEXT: vmov.f64 d10, d1		; CHECK-NEXT: vadd.f32 q4, q2, q3
; CHECK-NEXT: vmov.f32 s19, s7		; CHECK-NEXT: vadd.f32 q0, q0, q1
; CHECK-NEXT: vmov.f32 s21, s10
; CHECK-NEXT: vmov.f32 s16, s3
; CHECK-NEXT: vmov.f32 s15, s6
; CHECK-NEXT: vmov.f32 s22, s14
; CHECK-NEXT: vmov.f32 s17, s11
; CHECK-NEXT: vmov.f32 s23, s6
; CHECK-NEXT: vadd.f32 q4, q5, q4
; CHECK-NEXT: vmov.f32 s22, s13
; CHECK-NEXT: vmov.f32 s23, s5
; CHECK-NEXT: vmov.f32 s20, s1
; CHECK-NEXT: vmov.f32 s2, s12
; CHECK-NEXT: vmov.f32 s3, s4
; CHECK-NEXT: vmov.f32 s21, s9
; CHECK-NEXT: vmov.f32 s1, s8
; CHECK-NEXT: vadd.f32 q0, q0, q5
; CHECK-NEXT: vadd.f32 q0, q0, q4		; CHECK-NEXT: vadd.f32 q0, q0, q4
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vpop {d8, d9, d10, d11}		; CHECK-NEXT: vpop {d8, d9}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <16 x float>, <16 x float>* %src, align 4		%l1 = load <16 x float>, <16 x float>* %src, align 4
%s1 = shufflevector <16 x float> %l1, <16 x float> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>		%s1 = shufflevector <16 x float> %l1, <16 x float> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>
%s2 = shufflevector <16 x float> %l1, <16 x float> undef, <4 x i32> <i32 1, i32 5, i32 9, i32 13>		%s2 = shufflevector <16 x float> %l1, <16 x float> undef, <4 x i32> <i32 1, i32 5, i32 9, i32 13>
%s3 = shufflevector <16 x float> %l1, <16 x float> undef, <4 x i32> <i32 2, i32 6, i32 10, i32 14>		%s3 = shufflevector <16 x float> %l1, <16 x float> undef, <4 x i32> <i32 2, i32 6, i32 10, i32 14>
%s4 = shufflevector <16 x float> %l1, <16 x float> undef, <4 x i32> <i32 3, i32 7, i32 11, i32 15>		%s4 = shufflevector <16 x float> %l1, <16 x float> undef, <4 x i32> <i32 3, i32 7, i32 11, i32 15>
%a1 = fadd <4 x float> %s1, %s2		%a1 = fadd <4 x float> %s1, %s2
%a2 = fadd <4 x float> %s3, %s4		%a2 = fadd <4 x float> %s3, %s4
%a3 = fadd <4 x float> %a1, %a2		%a3 = fadd <4 x float> %a1, %a2
store <4 x float> %a3, <4 x float> *%dst		store <4 x float> %a3, <4 x float> *%dst
ret void		ret void
}		}
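
Reviewer note: the shuffle masks in these vld4 tests all follow the same stride-4 pattern (lane i of the wide load goes to result vector i mod 4). The following is a minimal Python sketch of that pattern, not code from the patch; the helper names deinterleave_masks and apply_shuffle are hypothetical and exist only for illustration. It models what the IR in vld4_v4f32 computes, using integers in place of floats.

```python
def deinterleave_masks(factor, lanes):
    """Return one shuffle mask per de-interleaved vector.

    Mask i selects elements i, i + factor, i + 2*factor, ... of the wide
    vector, which is the index pattern the shufflevector masks in the
    tests use (factor = 4 here).
    """
    return [[i + factor * lane for lane in range(lanes)] for i in range(factor)]


def apply_shuffle(wide, mask):
    """Model a single-source shufflevector: pick wide[m] for each index m."""
    return [wide[m] for m in mask]


# Model of vld4_v4f32: a 16-element load split into four 4-element vectors,
# which are then added together lane by lane.
masks = deinterleave_masks(4, 4)
wide = list(range(16))  # stand-in data, assumed for illustration
s1, s2, s3, s4 = (apply_shuffle(wide, m) for m in masks)
result = [a + b + c + d for a, b, c, d in zip(s1, s2, s3, s4)]
```

With factor 4 and 4 lanes the masks are <0, 4, 8, 12>, <1, 5, 9, 13>, <2, 6, 10, 14> and <3, 7, 11, 15>, matching %s1 to %s4 in the test above. This is the load/shuffle combination that the new lowering recognises and turns into the vld40/vld41/vld42/vld43 sequence seen in the right-hand CHECK lines.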

define void @vld4_v8f32(<32 x float> *%src, <8 x float> *%dst) {		define void @vld4_v8f32(<32 x float> *%src, <8 x float> *%dst) {
; CHECK-LABEL: vld4_v8f32:		; CHECK-LABEL: vld4_v8f32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13}		; CHECK-NEXT: .save {r4, r5}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13}		; CHECK-NEXT: push {r4, r5}
; CHECK-NEXT: vldrw.u32 q0, [r0, #64]		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vldrw.u32 q3, [r0, #96]		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vldrw.u32 q1, [r0, #112]		; CHECK-NEXT: .pad #88
; CHECK-NEXT: vldrw.u32 q2, [r0, #80]		; CHECK-NEXT: sub sp, #88
; CHECK-NEXT: vmov.f32 s18, s15		; CHECK-NEXT: add.w r2, r0, #64
; CHECK-NEXT: vmov.f64 d10, d1		; CHECK-NEXT: vld40.32 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s19, s7		; CHECK-NEXT: vld41.32 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s21, s10		; CHECK-NEXT: vstmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vmov.f32 s16, s3		; CHECK-NEXT: vld40.32 {q4, q5, q6, q7}, [r0]
; CHECK-NEXT: vmov.f32 s15, s6		; CHECK-NEXT: vld41.32 {q4, q5, q6, q7}, [r0]
; CHECK-NEXT: vmov.f32 s22, s14		; CHECK-NEXT: vld42.32 {q4, q5, q6, q7}, [r0]
; CHECK-NEXT: vmov.f32 s17, s11		; CHECK-NEXT: vld43.32 {q4, q5, q6, q7}, [r0]
; CHECK-NEXT: vmov.f32 s23, s6		; CHECK-NEXT: vstrw.32 q5, [sp, #64] @ 16-byte Spill
		; CHECK-NEXT: vmov q1, q4
		; CHECK-NEXT: vldrw.u32 q0, [sp, #64] @ 16-byte Reload
		; CHECK-NEXT: vadd.f32 q4, q6, q7
		; CHECK-NEXT: vadd.f32 q5, q1, q0
		; CHECK-NEXT: vldmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vadd.f32 q4, q5, q4		; CHECK-NEXT: vadd.f32 q4, q5, q4
; CHECK-NEXT: vmov.f32 s22, s13		; CHECK-NEXT: vld42.32 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s23, s5		; CHECK-NEXT: vld43.32 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s20, s1		; CHECK-NEXT: vstrw.32 q4, [r1]
; CHECK-NEXT: vmov.f32 s2, s12		; CHECK-NEXT: @ kill: def $q0 killed $q0 killed $q0_q1_q2_q3
; CHECK-NEXT: vldrw.u32 q3, [r0, #16]		; CHECK-NEXT: vadd.f32 q5, q2, q3
; CHECK-NEXT: vmov.f32 s3, s4		; CHECK-NEXT: vadd.f32 q0, q0, q1
; CHECK-NEXT: vldrw.u32 q1, [r0]
; CHECK-NEXT: vmov.f32 s21, s9
; CHECK-NEXT: vmov.f32 s1, s8
; CHECK-NEXT: vldrw.u32 q2, [r0, #48]
; CHECK-NEXT: vadd.f32 q0, q0, q5		; CHECK-NEXT: vadd.f32 q0, q0, q5
; CHECK-NEXT: vmov.f64 d12, d3
; CHECK-NEXT: vadd.f32 q0, q0, q4
; CHECK-NEXT: vldrw.u32 q4, [r0, #32]
; CHECK-NEXT: vstrw.32 q0, [r1, #16]		; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vmov.f32 s22, s19		; CHECK-NEXT: add sp, #88
; CHECK-NEXT: vmov.f32 s23, s11		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vmov.f32 s25, s14		; CHECK-NEXT: pop {r4, r5}
; CHECK-NEXT: vmov.f32 s20, s7
; CHECK-NEXT: vmov.f32 s19, s10
; CHECK-NEXT: vmov.f32 s26, s18
; CHECK-NEXT: vmov.f32 s21, s15
; CHECK-NEXT: vmov.f32 s27, s10
; CHECK-NEXT: vadd.f32 q5, q6, q5
; CHECK-NEXT: vmov.f32 s26, s17
; CHECK-NEXT: vmov.f32 s27, s9
; CHECK-NEXT: vmov.f32 s24, s5
; CHECK-NEXT: vmov.f32 s6, s16
; CHECK-NEXT: vmov.f32 s7, s8
; CHECK-NEXT: vmov.f32 s25, s13
; CHECK-NEXT: vmov.f32 s5, s12
; CHECK-NEXT: vadd.f32 q1, q1, q6
; CHECK-NEXT: vadd.f32 q1, q1, q5
; CHECK-NEXT: vstrw.32 q1, [r1]
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <32 x float>, <32 x float>* %src, align 4		%l1 = load <32 x float>, <32 x float>* %src, align 4
%s1 = shufflevector <32 x float> %l1, <32 x float> undef, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28>		%s1 = shufflevector <32 x float> %l1, <32 x float> undef, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28>
%s2 = shufflevector <32 x float> %l1, <32 x float> undef, <8 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29>		%s2 = shufflevector <32 x float> %l1, <32 x float> undef, <8 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29>
%s3 = shufflevector <32 x float> %l1, <32 x float> undef, <8 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>		%s3 = shufflevector <32 x float> %l1, <32 x float> undef, <8 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>
%s4 = shufflevector <32 x float> %l1, <32 x float> undef, <8 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>		%s4 = shufflevector <32 x float> %l1, <32 x float> undef, <8 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>
%a1 = fadd <8 x float> %s1, %s2		%a1 = fadd <8 x float> %s1, %s2
%a2 = fadd <8 x float> %s3, %s4		%a2 = fadd <8 x float> %s3, %s4
%a3 = fadd <8 x float> %a1, %a2		%a3 = fadd <8 x float> %a1, %a2
store <8 x float> %a3, <8 x float> *%dst		store <8 x float> %a3, <8 x float> *%dst
ret void		ret void
}		}

define void @vld4_v16f32(<64 x float> *%src, <16 x float> *%dst) {		define void @vld4_v16f32(<64 x float> *%src, <16 x float> *%dst) {
; CHECK-LABEL: vld4_v16f32:		; CHECK-LABEL: vld4_v16f32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
		; CHECK-NEXT: .save {r4, r5}
		; CHECK-NEXT: push {r4, r5}
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: .pad #16		; CHECK-NEXT: .pad #152
; CHECK-NEXT: sub sp, #16		; CHECK-NEXT: sub sp, #152
; CHECK-NEXT: vldrw.u32 q0, [r0]		; CHECK-NEXT: add.w r2, r0, #128
; CHECK-NEXT: vldrw.u32 q3, [r0, #32]		; CHECK-NEXT: add r3, sp, #64
; CHECK-NEXT: vldrw.u32 q1, [r0, #48]		; CHECK-NEXT: vld40.32 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]		; CHECK-NEXT: add r4, sp, #64
; CHECK-NEXT: vmov.f32 s18, s15		; CHECK-NEXT: vld41.32 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.f64 d10, d1		; CHECK-NEXT: vstmia r3, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vmov.f32 s19, s7		; CHECK-NEXT: add.w r3, r0, #64
; CHECK-NEXT: vmov.f32 s21, s10		; CHECK-NEXT: vld40.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s16, s3		; CHECK-NEXT: vld41.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s15, s6		; CHECK-NEXT: vld42.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s22, s14		; CHECK-NEXT: vld43.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s17, s11		; CHECK-NEXT: adds r0, #192
; CHECK-NEXT: vmov.f32 s23, s6		; CHECK-NEXT: vstrw.32 q1, [sp, #32] @ 16-byte Spill
; CHECK-NEXT: vadd.f32 q4, q5, q4		; CHECK-NEXT: vadd.f32 q4, q2, q3
; CHECK-NEXT: vmov.f32 s22, s13		; CHECK-NEXT: vmov q5, q0
; CHECK-NEXT: vmov.f32 s23, s5		; CHECK-NEXT: vldrw.u32 q0, [sp, #32] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s20, s1		; CHECK-NEXT: vstrw.32 q4, [sp, #48] @ 16-byte Spill
; CHECK-NEXT: vmov.f32 s2, s12		; CHECK-NEXT: vadd.f32 q4, q5, q0
; CHECK-NEXT: vldrw.u32 q3, [r0, #80]		; CHECK-NEXT: vldmia r4, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vmov.f32 s3, s4		; CHECK-NEXT: vstrw.32 q4, [sp, #32] @ 16-byte Spill
; CHECK-NEXT: vldrw.u32 q1, [r0, #64]		; CHECK-NEXT: add r4, sp, #64
; CHECK-NEXT: vmov.f32 s21, s9		; CHECK-NEXT: vld42.32 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s1, s8		; CHECK-NEXT: vstmia r4, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vldrw.u32 q2, [r0, #112]		; CHECK-NEXT: vld40.32 {q4, q5, q6, q7}, [r3]
; CHECK-NEXT: vadd.f32 q0, q0, q5		; CHECK-NEXT: vld41.32 {q4, q5, q6, q7}, [r3]
; CHECK-NEXT: vadd.f32 q0, q0, q4		; CHECK-NEXT: vld42.32 {q4, q5, q6, q7}, [r3]
; CHECK-NEXT: vldrw.u32 q4, [r0, #96]		; CHECK-NEXT: vld43.32 {q4, q5, q6, q7}, [r3]
; CHECK-NEXT: vstrw.32 q0, [sp] @ 16-byte Spill		; CHECK-NEXT: vldrw.u32 q0, [sp, #48] @ 16-byte Reload
; CHECK-NEXT: vmov.f64 d0, d3		; CHECK-NEXT: vldrw.u32 q1, [sp, #32] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s22, s19		; CHECK-NEXT: add r3, sp, #64
; CHECK-NEXT: vmov.f32 s19, s10		; CHECK-NEXT: vstrw.32 q6, [sp, #16] @ 16-byte Spill
; CHECK-NEXT: vmov.f32 s26, s17		; CHECK-NEXT: vadd.f32 q4, q4, q5
; CHECK-NEXT: vmov.f32 s23, s11		; CHECK-NEXT: vadd.f32 q0, q1, q0
; CHECK-NEXT: vmov.f32 s27, s9		; CHECK-NEXT: vstrw.32 q0, [sp, #48] @ 16-byte Spill
; CHECK-NEXT: vmov.f32 s20, s7		; CHECK-NEXT: vldrw.u32 q0, [sp, #16] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s24, s5
; CHECK-NEXT: vmov.f32 s1, s14
; CHECK-NEXT: vmov.f32 s6, s16
; CHECK-NEXT: vmov.f32 s2, s18
; CHECK-NEXT: vldrw.u32 q4, [r0, #144]
; CHECK-NEXT: vmov.f32 s7, s8
; CHECK-NEXT: vmov.f32 s3, s10
; CHECK-NEXT: vldrw.u32 q2, [r0, #128]
; CHECK-NEXT: vmov.f32 s21, s15
; CHECK-NEXT: vmov.f32 s25, s13
; CHECK-NEXT: vadd.f32 q5, q0, q5
; CHECK-NEXT: vmov.f32 s5, s12
; CHECK-NEXT: vldrw.u32 q3, [r0, #176]
; CHECK-NEXT: vadd.f32 q0, q1, q6
; CHECK-NEXT: vadd.f32 q1, q0, q5
; CHECK-NEXT: vldrw.u32 q5, [r0, #160]
; CHECK-NEXT: vmov.f64 d0, d5
; CHECK-NEXT: vmov.f32 s26, s23
; CHECK-NEXT: vmov.f32 s23, s14
; CHECK-NEXT: vmov.f32 s30, s21
; CHECK-NEXT: vmov.f32 s27, s15
; CHECK-NEXT: vmov.f32 s31, s13
; CHECK-NEXT: vmov.f32 s24, s11
; CHECK-NEXT: vmov.f32 s28, s9
; CHECK-NEXT: vmov.f32 s1, s18
; CHECK-NEXT: vmov.f32 s10, s20
; CHECK-NEXT: vmov.f32 s2, s22
; CHECK-NEXT: vmov.f32 s11, s12
; CHECK-NEXT: vmov.f32 s3, s14
; CHECK-NEXT: vldrw.u32 q3, [r0, #192]
; CHECK-NEXT: vmov.f32 s25, s19
; CHECK-NEXT: vmov.f32 s29, s17
; CHECK-NEXT: vadd.f32 q6, q0, q6
; CHECK-NEXT: vmov.f32 s9, s16
; CHECK-NEXT: vldrw.u32 q4, [r0, #240]
; CHECK-NEXT: vadd.f32 q0, q2, q7
; CHECK-NEXT: vmov.f64 d10, d7
; CHECK-NEXT: vadd.f32 q2, q0, q6
; CHECK-NEXT: vldrw.u32 q6, [r0, #224]
; CHECK-NEXT: vldrw.u32 q0, [r0, #208]
; CHECK-NEXT: vstrw.32 q2, [r1, #32]
; CHECK-NEXT: vstrw.32 q1, [r1, #16]
; CHECK-NEXT: vmov.f32 s30, s27
; CHECK-NEXT: vmov.f32 s31, s19
; CHECK-NEXT: vmov.f32 s21, s2
; CHECK-NEXT: vmov.f32 s28, s15
; CHECK-NEXT: vmov.f32 s27, s18
; CHECK-NEXT: vmov.f32 s22, s26
; CHECK-NEXT: vmov.f32 s29, s3
; CHECK-NEXT: vmov.f32 s23, s18
; CHECK-NEXT: vadd.f32 q7, q5, q7
; CHECK-NEXT: vmov.f32 s22, s25
; CHECK-NEXT: vmov.f32 s23, s17
; CHECK-NEXT: vmov.f32 s20, s13
; CHECK-NEXT: vmov.f32 s14, s24
; CHECK-NEXT: vmov.f32 s15, s16
; CHECK-NEXT: vmov.f32 s21, s1
; CHECK-NEXT: vmov.f32 s13, s0
; CHECK-NEXT: vadd.f32 q0, q3, q5
; CHECK-NEXT: vadd.f32 q0, q0, q7		; CHECK-NEXT: vadd.f32 q0, q0, q7
; CHECK-NEXT: vstrw.32 q0, [r1, #48]		; CHECK-NEXT: vadd.f32 q0, q4, q0
; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload		; CHECK-NEXT: vstrw.32 q0, [sp, #32] @ 16-byte Spill
		; CHECK-NEXT: vldmia r3, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
		; CHECK-NEXT: vld43.32 {q0, q1, q2, q3}, [r2]
		; CHECK-NEXT: add r2, sp, #64
		; CHECK-NEXT: vstmia r2, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
		; CHECK-NEXT: vld40.32 {q4, q5, q6, q7}, [r0]
		; CHECK-NEXT: vld41.32 {q4, q5, q6, q7}, [r0]
		; CHECK-NEXT: vld42.32 {q4, q5, q6, q7}, [r0]
		; CHECK-NEXT: vld43.32 {q4, q5, q6, q7}, [r0]
		; CHECK-NEXT: add r0, sp, #64
		; CHECK-NEXT: @ kill: def $q4 killed $q4 killed $q4_q5_q6_q7
		; CHECK-NEXT: vstrw.32 q7, [sp, #16] @ 16-byte Spill
		; CHECK-NEXT: vmov q2, q5
		; CHECK-NEXT: vldrw.u32 q0, [sp, #16] @ 16-byte Reload
		; CHECK-NEXT: vadd.f32 q4, q4, q2
		; CHECK-NEXT: vadd.f32 q5, q6, q0
		; CHECK-NEXT: vldmia r0, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
		; CHECK-NEXT: @ kill: def $q0 killed $q0 killed $q0_q1_q2_q3
		; CHECK-NEXT: vadd.f32 q4, q4, q5
		; CHECK-NEXT: vadd.f32 q5, q2, q3
		; CHECK-NEXT: vadd.f32 q0, q0, q1
		; CHECK-NEXT: vstrw.32 q4, [r1, #48]
		; CHECK-NEXT: vadd.f32 q0, q0, q5
		; CHECK-NEXT: vstrw.32 q0, [r1, #32]
		; CHECK-NEXT: vldrw.u32 q0, [sp, #32] @ 16-byte Reload
		; CHECK-NEXT: vstrw.32 q0, [r1, #16]
		; CHECK-NEXT: vldrw.u32 q0, [sp, #48] @ 16-byte Reload
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: add sp, #16		; CHECK-NEXT: add sp, #152
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
		; CHECK-NEXT: pop {r4, r5}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <64 x float>, <64 x float>* %src, align 4		%l1 = load <64 x float>, <64 x float>* %src, align 4
%s1 = shufflevector <64 x float> %l1, <64 x float> undef, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 32, i32 36, i32 40, i32 44, i32 48, i32 52, i32 56, i32 60>		%s1 = shufflevector <64 x float> %l1, <64 x float> undef, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 32, i32 36, i32 40, i32 44, i32 48, i32 52, i32 56, i32 60>
%s2 = shufflevector <64 x float> %l1, <64 x float> undef, <16 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 33, i32 37, i32 41, i32 45, i32 49, i32 53, i32 57, i32 61>		%s2 = shufflevector <64 x float> %l1, <64 x float> undef, <16 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 33, i32 37, i32 41, i32 45, i32 49, i32 53, i32 57, i32 61>
%s3 = shufflevector <64 x float> %l1, <64 x float> undef, <16 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30, i32 34, i32 38, i32 42, i32 46, i32 50, i32 54, i32 58, i32 62>		%s3 = shufflevector <64 x float> %l1, <64 x float> undef, <16 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30, i32 34, i32 38, i32 42, i32 46, i32 50, i32 54, i32 58, i32 62>
%s4 = shufflevector <64 x float> %l1, <64 x float> undef, <16 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31, i32 35, i32 39, i32 43, i32 47, i32 51, i32 55, i32 59, i32 63>		%s4 = shufflevector <64 x float> %l1, <64 x float> undef, <16 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31, i32 35, i32 39, i32 43, i32 47, i32 51, i32 55, i32 59, i32 63>
%a1 = fadd <16 x float> %s1, %s2		%a1 = fadd <16 x float> %s1, %s2
... 114 lines not shown ...
%a3 = fadd <4 x half> %a1, %a2		%a3 = fadd <4 x half> %a1, %a2
store <4 x half> %a3, <4 x half> *%dst		store <4 x half> %a3, <4 x half> *%dst
ret void		ret void
}		}

define void @vld4_v8f16(<32 x half> *%src, <8 x half> *%dst) {		define void @vld4_v8f16(<32 x half> *%src, <8 x half> *%dst) {
; CHECK-LABEL: vld4_v8f16:		; CHECK-LABEL: vld4_v8f16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12}		; CHECK-NEXT: .vsave {d8, d9}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12}		; CHECK-NEXT: vpush {d8, d9}
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: vld40.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vldrw.u32 q0, [r0, #16]		; CHECK-NEXT: vld41.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vldrw.u32 q2, [r0, #32]		; CHECK-NEXT: vld42.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vldrw.u32 q3, [r0, #48]		; CHECK-NEXT: vld43.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov r2, s5		; CHECK-NEXT: @ kill: def $q0 killed $q0 killed $q0_q1_q2_q3
; CHECK-NEXT: vmovx.f16 s20, s5		; CHECK-NEXT: vadd.f16 q4, q2, q3
; CHECK-NEXT: vmov.16 q4[0], r2		; CHECK-NEXT: vadd.f16 q0, q0, q1
; CHECK-NEXT: vmov r3, s7
; CHECK-NEXT: vmov.16 q4[1], r3
; CHECK-NEXT: vmov r2, s1
; CHECK-NEXT: vmov.16 q4[2], r2
; CHECK-NEXT: vmov r2, s3
; CHECK-NEXT: vmov.16 q4[3], r2
; CHECK-NEXT: vmov r2, s9
; CHECK-NEXT: vmov.16 q4[4], r2
; CHECK-NEXT: vmov r2, s11
; CHECK-NEXT: vmov.16 q4[5], r2
; CHECK-NEXT: vmov r0, s13
; CHECK-NEXT: vmov.16 q4[6], r0
; CHECK-NEXT: vmov r0, s20
; CHECK-NEXT: vmovx.f16 s20, s7
; CHECK-NEXT: vmovx.f16 s24, s1
; CHECK-NEXT: vmov r2, s20
; CHECK-NEXT: vmov.16 q5[0], r0
; CHECK-NEXT: vmov r0, s24
; CHECK-NEXT: vmov.16 q5[1], r2
; CHECK-NEXT: vmovx.f16 s24, s3
; CHECK-NEXT: vmov.16 q5[2], r0
; CHECK-NEXT: vmov r0, s24
; CHECK-NEXT: vmovx.f16 s24, s9
; CHECK-NEXT: vmov.16 q5[3], r0
; CHECK-NEXT: vmov r0, s24
; CHECK-NEXT: vmovx.f16 s24, s11
; CHECK-NEXT: vmov.16 q5[4], r0
; CHECK-NEXT: vmov r0, s24
; CHECK-NEXT: vmovx.f16 s24, s13
; CHECK-NEXT: vmov.16 q5[5], r0
; CHECK-NEXT: vmov r0, s24
; CHECK-NEXT: vmovx.f16 s24, s15
; CHECK-NEXT: vmov.16 q5[6], r0
; CHECK-NEXT: vmov r0, s24
; CHECK-NEXT: vmovx.f16 s24, s0
; CHECK-NEXT: vmov.16 q5[7], r0
; CHECK-NEXT: vmov r0, s15
; CHECK-NEXT: vmov.16 q4[7], r0
; CHECK-NEXT: vadd.f16 q4, q4, q5
; CHECK-NEXT: vmovx.f16 s20, s4
; CHECK-NEXT: vmov r0, s20
; CHECK-NEXT: vmovx.f16 s20, s6
; CHECK-NEXT: vmov r2, s20
; CHECK-NEXT: vmov.16 q5[0], r0
; CHECK-NEXT: vmov.16 q5[1], r2
; CHECK-NEXT: vmov r0, s24
; CHECK-NEXT: vmovx.f16 s24, s2
; CHECK-NEXT: vmov.16 q5[2], r0
; CHECK-NEXT: vmov r0, s24
; CHECK-NEXT: vmovx.f16 s24, s8
; CHECK-NEXT: vmov.16 q5[3], r0
; CHECK-NEXT: vmov r0, s24
; CHECK-NEXT: vmovx.f16 s24, s10
; CHECK-NEXT: vmov.16 q5[4], r0
; CHECK-NEXT: vmov r0, s24
; CHECK-NEXT: vmovx.f16 s24, s12
; CHECK-NEXT: vmov.16 q5[5], r0
; CHECK-NEXT: vmov r0, s24
; CHECK-NEXT: vmovx.f16 s24, s14
; CHECK-NEXT: vmov.16 q5[6], r0
; CHECK-NEXT: vmov r0, s24
; CHECK-NEXT: vmov.16 q5[7], r0
; CHECK-NEXT: vmov r0, s4
; CHECK-NEXT: vmov r2, s6
; CHECK-NEXT: vmov.16 q1[0], r0
; CHECK-NEXT: vmov.16 q1[1], r2
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: vmov.16 q1[2], r0
; CHECK-NEXT: vmov r0, s2
; CHECK-NEXT: vmov.16 q1[3], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vmov.16 q1[4], r0
; CHECK-NEXT: vmov r0, s10
; CHECK-NEXT: vmov.16 q1[5], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmov.16 q1[6], r0
; CHECK-NEXT: vmov r0, s14
; CHECK-NEXT: vmov.16 q1[7], r0
; CHECK-NEXT: vadd.f16 q0, q1, q5
; CHECK-NEXT: vadd.f16 q0, q0, q4		; CHECK-NEXT: vadd.f16 q0, q0, q4
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12}		; CHECK-NEXT: vpop {d8, d9}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <32 x half>, <32 x half>* %src, align 4		%l1 = load <32 x half>, <32 x half>* %src, align 4
%s1 = shufflevector <32 x half> %l1, <32 x half> undef, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28>		%s1 = shufflevector <32 x half> %l1, <32 x half> undef, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28>
%s2 = shufflevector <32 x half> %l1, <32 x half> undef, <8 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29>		%s2 = shufflevector <32 x half> %l1, <32 x half> undef, <8 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29>
%s3 = shufflevector <32 x half> %l1, <32 x half> undef, <8 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>		%s3 = shufflevector <32 x half> %l1, <32 x half> undef, <8 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>
%s4 = shufflevector <32 x half> %l1, <32 x half> undef, <8 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>		%s4 = shufflevector <32 x half> %l1, <32 x half> undef, <8 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>
%a1 = fadd <8 x half> %s1, %s2		%a1 = fadd <8 x half> %s1, %s2
%a2 = fadd <8 x half> %s3, %s4		%a2 = fadd <8 x half> %s3, %s4
%a3 = fadd <8 x half> %a1, %a2		%a3 = fadd <8 x half> %a1, %a2
store <8 x half> %a3, <8 x half> *%dst		store <8 x half> %a3, <8 x half> *%dst
ret void		ret void
}		}

define void @vld4_v16f16(<64 x half> %src, <16 x half> %dst) {		define void @vld4_v16f16(<64 x half> %src, <16 x half> %dst) {
; CHECK-LABEL: vld4_v16f16:		; CHECK-LABEL: vld4_v16f16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12}		; CHECK-NEXT: .vsave {d8, d9}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12}		; CHECK-NEXT: vpush {d8, d9}
; CHECK-NEXT: vldrw.u32 q1, [r0, #64]		; CHECK-NEXT: add.w r2, r0, #64
; CHECK-NEXT: vldrw.u32 q0, [r0, #80]		; CHECK-NEXT: vld40.16 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vldrw.u32 q2, [r0, #96]		; CHECK-NEXT: vld41.16 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vldrw.u32 q3, [r0, #112]		; CHECK-NEXT: vld42.16 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmov r2, s5		; CHECK-NEXT: vld43.16 {q0, q1, q2, q3}, [r2]
; CHECK-NEXT: vmovx.f16 s20, s5		; CHECK-NEXT: @ kill: def $q0 killed $q0 killed $q0_q1_q2_q3
; CHECK-NEXT: vmov.16 q4[0], r2		; CHECK-NEXT: vadd.f16 q4, q2, q3
; CHECK-NEXT: vmov r3, s7		; CHECK-NEXT: vadd.f16 q0, q0, q1
; CHECK-NEXT: vmov.16 q4[1], r3
; CHECK-NEXT: vmov r2, s1
; CHECK-NEXT: vmov.16 q4[2], r2
; CHECK-NEXT: vmov r2, s3
; CHECK-NEXT: vmov.16 q4[3], r2
; CHECK-NEXT: vmov r2, s9
; CHECK-NEXT: vmov.16 q4[4], r2
; CHECK-NEXT: vmov r2, s11
; CHECK-NEXT: vmov.16 q4[5], r2
; CHECK-NEXT: vmov r2, s13
; CHECK-NEXT: vmov.16 q4[6], r2
; CHECK-NEXT: vmov r2, s20
; CHECK-NEXT: vmovx.f16 s20, s7
; CHECK-NEXT: vmovx.f16 s24, s1
; CHECK-NEXT: vmov r3, s20
; CHECK-NEXT: vmov.16 q5[0], r2
; CHECK-NEXT: vmov r2, s24
; CHECK-NEXT: vmov.16 q5[1], r3
; CHECK-NEXT: vmovx.f16 s24, s3
; CHECK-NEXT: vmov.16 q5[2], r2
; CHECK-NEXT: vmov r2, s24
; CHECK-NEXT: vmovx.f16 s24, s9
; CHECK-NEXT: vmov.16 q5[3], r2
; CHECK-NEXT: vmov r2, s24
; CHECK-NEXT: vmovx.f16 s24, s11
; CHECK-NEXT: vmov.16 q5[4], r2
; CHECK-NEXT: vmov r2, s24
; CHECK-NEXT: vmovx.f16 s24, s13
; CHECK-NEXT: vmov.16 q5[5], r2
; CHECK-NEXT: vmov r2, s24
; CHECK-NEXT: vmovx.f16 s24, s15
; CHECK-NEXT: vmov.16 q5[6], r2
; CHECK-NEXT: vmov r2, s24
; CHECK-NEXT: vmovx.f16 s24, s0
; CHECK-NEXT: vmov.16 q5[7], r2
; CHECK-NEXT: vmov r2, s15
; CHECK-NEXT: vmov.16 q4[7], r2
; CHECK-NEXT: vadd.f16 q4, q4, q5
; CHECK-NEXT: vmovx.f16 s20, s4
; CHECK-NEXT: vmov r2, s20
; CHECK-NEXT: vmovx.f16 s20, s6
; CHECK-NEXT: vmov r3, s20
; CHECK-NEXT: vmov.16 q5[0], r2
; CHECK-NEXT: vmov.16 q5[1], r3
; CHECK-NEXT: vmov r2, s24
; CHECK-NEXT: vmovx.f16 s24, s2
; CHECK-NEXT: vmov.16 q5[2], r2
; CHECK-NEXT: vmov r2, s24
; CHECK-NEXT: vmovx.f16 s24, s8
; CHECK-NEXT: vmov.16 q5[3], r2
; CHECK-NEXT: vmov r2, s24
; CHECK-NEXT: vmovx.f16 s24, s10
; CHECK-NEXT: vmov.16 q5[4], r2
; CHECK-NEXT: vmov r2, s24
; CHECK-NEXT: vmovx.f16 s24, s12
; CHECK-NEXT: vmov.16 q5[5], r2
; CHECK-NEXT: vmov r2, s24
; CHECK-NEXT: vmovx.f16 s24, s14
; CHECK-NEXT: vmov.16 q5[6], r2
; CHECK-NEXT: vmov r2, s24
; CHECK-NEXT: vmov.16 q5[7], r2
; CHECK-NEXT: vmov r2, s4
; CHECK-NEXT: vmov r3, s6
; CHECK-NEXT: vmov.16 q1[0], r2
; CHECK-NEXT: vmov.16 q1[1], r3
; CHECK-NEXT: vmov r2, s0
; CHECK-NEXT: vmov.16 q1[2], r2
; CHECK-NEXT: vmov r2, s2
; CHECK-NEXT: vmov.16 q1[3], r2
; CHECK-NEXT: vmov r2, s8
; CHECK-NEXT: vmov.16 q1[4], r2
; CHECK-NEXT: vmov r2, s10
; CHECK-NEXT: vmov.16 q1[5], r2
; CHECK-NEXT: vmov r2, s12
; CHECK-NEXT: vmov.16 q1[6], r2
; CHECK-NEXT: vmov r2, s14
; CHECK-NEXT: vmov.16 q1[7], r2
; CHECK-NEXT: vldrw.u32 q3, [r0]
; CHECK-NEXT: vadd.f16 q0, q1, q5
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]
; CHECK-NEXT: vadd.f16 q4, q0, q4
; CHECK-NEXT: vldrw.u32 q0, [r0, #48]
; CHECK-NEXT: vldrw.u32 q1, [r0, #32]
; CHECK-NEXT: vstrw.32 q4, [r1, #16]
; CHECK-NEXT: vmovx.f16 s16, s13
; CHECK-NEXT: vmovx.f16 s20, s9
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmovx.f16 s16, s15
; CHECK-NEXT: vmov r2, s16
; CHECK-NEXT: vmov.16 q4[0], r0
; CHECK-NEXT: vmov.16 q4[1], r2
; CHECK-NEXT: vmov r0, s20
; CHECK-NEXT: vmovx.f16 s20, s11
; CHECK-NEXT: vmov.16 q4[2], r0
; CHECK-NEXT: vmov r0, s20
; CHECK-NEXT: vmovx.f16 s20, s5
; CHECK-NEXT: vmov.16 q4[3], r0
; CHECK-NEXT: vmov r0, s20
; CHECK-NEXT: vmovx.f16 s20, s7
; CHECK-NEXT: vmov.16 q4[4], r0
; CHECK-NEXT: vmov r0, s20
; CHECK-NEXT: vmovx.f16 s20, s1
; CHECK-NEXT: vmov.16 q4[5], r0
; CHECK-NEXT: vmov r0, s20
; CHECK-NEXT: vmovx.f16 s20, s3
; CHECK-NEXT: vmov.16 q4[6], r0
; CHECK-NEXT: vmov r0, s20
; CHECK-NEXT: vmovx.f16 s24, s12
; CHECK-NEXT: vmov.16 q4[7], r0
; CHECK-NEXT: vmov r0, s13
; CHECK-NEXT: vmov.16 q5[0], r0
; CHECK-NEXT: vmov r2, s15
; CHECK-NEXT: vmov.16 q5[1], r2
; CHECK-NEXT: vmov r0, s9
; CHECK-NEXT: vmov.16 q5[2], r0
; CHECK-NEXT: vmov r0, s11
; CHECK-NEXT: vmov.16 q5[3], r0
; CHECK-NEXT: vmov r0, s5
; CHECK-NEXT: vmov.16 q5[4], r0
; CHECK-NEXT: vmov r0, s7
; CHECK-NEXT: vmov.16 q5[5], r0
; CHECK-NEXT: vmov r0, s1
; CHECK-NEXT: vmov.16 q5[6], r0
; CHECK-NEXT: vmov r0, s3
; CHECK-NEXT: vmov.16 q5[7], r0
; CHECK-NEXT: vmov r2, s12
; CHECK-NEXT: vadd.f16 q4, q5, q4
; CHECK-NEXT: vmov r0, s14
; CHECK-NEXT: vmov.16 q5[0], r2
; CHECK-NEXT: vmovx.f16 s12, s14
; CHECK-NEXT: vmov.16 q5[1], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vmov.16 q5[2], r0
; CHECK-NEXT: vmov r0, s10
; CHECK-NEXT: vmov.16 q5[3], r0
; CHECK-NEXT: vmov r0, s4
; CHECK-NEXT: vmov.16 q5[4], r0
; CHECK-NEXT: vmov r0, s6
; CHECK-NEXT: vmov.16 q5[5], r0
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: vmov.16 q5[6], r0
; CHECK-NEXT: vmov r0, s24
; CHECK-NEXT: vmov r2, s12
; CHECK-NEXT: vmov.16 q3[0], r0
; CHECK-NEXT: vmovx.f16 s24, s8
; CHECK-NEXT: vmov.16 q3[1], r2
; CHECK-NEXT: vmov r0, s24
; CHECK-NEXT: vmovx.f16 s8, s10
; CHECK-NEXT: vmov.16 q3[2], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vmovx.f16 s8, s4
; CHECK-NEXT: vmov.16 q3[3], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vmovx.f16 s4, s6
; CHECK-NEXT: vmov.16 q3[4], r0
; CHECK-NEXT: vmov r0, s4
; CHECK-NEXT: vmovx.f16 s4, s0
; CHECK-NEXT: vmov.16 q3[5], r0
; CHECK-NEXT: vmov r0, s4
; CHECK-NEXT: vmovx.f16 s4, s2
; CHECK-NEXT: vmov.16 q3[6], r0
; CHECK-NEXT: vmov r0, s4
; CHECK-NEXT: vmov.16 q3[7], r0
; CHECK-NEXT: vmov r0, s2
; CHECK-NEXT: vmov.16 q5[7], r0
; CHECK-NEXT: vadd.f16 q0, q5, q3
; CHECK-NEXT: vadd.f16 q0, q0, q4		; CHECK-NEXT: vadd.f16 q0, q0, q4
		; CHECK-NEXT: vld40.16 {q1, q2, q3, q4}, [r0]
		; CHECK-NEXT: vld41.16 {q1, q2, q3, q4}, [r0]
		; CHECK-NEXT: vld42.16 {q1, q2, q3, q4}, [r0]
		; CHECK-NEXT: vld43.16 {q1, q2, q3, q4}, [r0]
		; CHECK-NEXT: vstrw.32 q0, [r1, #16]
		; CHECK-NEXT: @ kill: def $q1 killed $q1 killed $q1_q2_q3_q4
		; CHECK-NEXT: vadd.f16 q0, q3, q4
		; CHECK-NEXT: vadd.f16 q1, q1, q2
		; CHECK-NEXT: vadd.f16 q0, q1, q0
; CHECK-NEXT: vstrw.32 q0, [r1]		; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12}		; CHECK-NEXT: vpop {d8, d9}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%l1 = load <64 x half>, <64 x half>* %src, align 4		%l1 = load <64 x half>, <64 x half>* %src, align 4
%s1 = shufflevector <64 x half> %l1, <64 x half> undef, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 32, i32 36, i32 40, i32 44, i32 48, i32 52, i32 56, i32 60>		%s1 = shufflevector <64 x half> %l1, <64 x half> undef, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28, i32 32, i32 36, i32 40, i32 44, i32 48, i32 52, i32 56, i32 60>
%s2 = shufflevector <64 x half> %l1, <64 x half> undef, <16 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 33, i32 37, i32 41, i32 45, i32 49, i32 53, i32 57, i32 61>		%s2 = shufflevector <64 x half> %l1, <64 x half> undef, <16 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 33, i32 37, i32 41, i32 45, i32 49, i32 53, i32 57, i32 61>
%s3 = shufflevector <64 x half> %l1, <64 x half> undef, <16 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30, i32 34, i32 38, i32 42, i32 46, i32 50, i32 54, i32 58, i32 62>		%s3 = shufflevector <64 x half> %l1, <64 x half> undef, <16 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30, i32 34, i32 38, i32 42, i32 46, i32 50, i32 54, i32 58, i32 62>
%s4 = shufflevector <64 x half> %l1, <64 x half> undef, <16 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31, i32 35, i32 39, i32 43, i32 47, i32 51, i32 55, i32 59, i32 63>		%s4 = shufflevector <64 x half> %l1, <64 x half> undef, <16 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31, i32 35, i32 39, i32 43, i32 47, i32 51, i32 55, i32 59, i32 63>
%a1 = fadd <16 x half> %s1, %s2		%a1 = fadd <16 x half> %s1, %s2
; ... 77 lines not shown ...
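For readers following the vld4 tests above: the four shufflevector masks in each function are the stride-4 de-interleave pattern that the lowering has to recognise before it can emit vld40 to vld43. The following is a minimal Python sketch of how those masks are built. It is only an illustration and not code from the patch; the helper names `deinterleave_masks` and `deinterleave` are hypothetical.

```python
def deinterleave_masks(factor, lanes):
    """Build one shuffle mask per output vector.

    Mask k selects elements k, k + factor, k + 2*factor, ... of the wide load,
    which is the shape of %s1..%s4 in the vld4 tests (factor = 4).
    """
    return [[start + factor * i for i in range(lanes)] for start in range(factor)]


def deinterleave(data, factor):
    """Apply the masks to a flat list, modelling the load plus shuffles."""
    lanes = len(data) // factor
    return [[data[i] for i in mask] for mask in deinterleave_masks(factor, lanes)]


# Masks for a <64 x half> load split into four <16 x half> vectors,
# as in vld4_v16f16.
masks = deinterleave_masks(4, 16)
print(masks[0])  # indices 0, 4, 8, ..., 60
print(masks[3])  # indices 3, 7, 11, ..., 63
```

With factor 3 the same helper produces the masks of a VLD3-style access, which, per the summary, is the case MVE does not support.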

llvm/test/CodeGen/Thumb2/mve-vst2.ll

; ... 30 lines not shown ...
%s = shufflevector <2 x i32> %l1, <2 x i32> %l2, <4 x i32> <i32 0, i32 2, i32 1, i32 3>		%s = shufflevector <2 x i32> %l1, <2 x i32> %l2, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
store <4 x i32> %s, <4 x i32> *%dst		store <4 x i32> %s, <4 x i32> *%dst
ret void		ret void
}		}

define void @vst2_v4i32(<4 x i32> %src, <8 x i32> %dst) {		define void @vst2_v4i32(<4 x i32> %src, <8 x i32> %dst) {
; CHECK-LABEL: vst2_v4i32:		; CHECK-LABEL: vst2_v4i32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: vldrw.u32 q1, [r0, #16]
; CHECK-NEXT: vldrw.u32 q0, [r0, #16]		; CHECK-NEXT: vldrw.u32 q0, [r0]
; CHECK-NEXT: vmov.f64 d4, d3		; CHECK-NEXT: vst20.32 {q0, q1}, [r1]
; CHECK-NEXT: vmov.f64 d6, d2		; CHECK-NEXT: vst21.32 {q0, q1}, [r1]
; CHECK-NEXT: vmov.f32 s9, s2
; CHECK-NEXT: vmov.f32 s13, s0
; CHECK-NEXT: vmov.f32 s10, s7
; CHECK-NEXT: vmov.f32 s14, s5
; CHECK-NEXT: vmov.f32 s11, s3
; CHECK-NEXT: vmov.f32 s15, s1
; CHECK-NEXT: vstrw.32 q2, [r1, #16]
; CHECK-NEXT: vstrw.32 q3, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <4 x i32>, <4 x i32>* %src, i32 0		%s1 = getelementptr <4 x i32>, <4 x i32>* %src, i32 0
%l1 = load <4 x i32>, <4 x i32>* %s1, align 4		%l1 = load <4 x i32>, <4 x i32>* %s1, align 4
%s2 = getelementptr <4 x i32>, <4 x i32>* %src, i32 1		%s2 = getelementptr <4 x i32>, <4 x i32>* %src, i32 1
%l2 = load <4 x i32>, <4 x i32>* %s2, align 4		%l2 = load <4 x i32>, <4 x i32>* %s2, align 4
%s = shufflevector <4 x i32> %l1, <4 x i32> %l2, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>		%s = shufflevector <4 x i32> %l1, <4 x i32> %l2, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %s, <8 x i32> *%dst		store <8 x i32> %s, <8 x i32> *%dst
ret void		ret void
}		}
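The store tests use the inverse pattern: a single shufflevector that interleaves two loaded vectors lane by lane, which the new lowering turns into a vst20/vst21 pair. A small Python sketch of that mask follows. It is an illustration only, and the helper names `interleave_mask` and `interleave` are hypothetical rather than anything defined by the patch.

```python
def interleave_mask(factor, lanes):
    """Mask over the concatenation of `factor` vectors of `lanes` elements.

    For each lane, take that lane from vector 0, then vector 1, and so on.
    Vector `src` starts at index src * lanes in the concatenation.
    """
    return [src * lanes + lane for lane in range(lanes) for src in range(factor)]


def interleave(vectors):
    """Apply the mask to the concatenated inputs, modelling shuffle plus store."""
    lanes = len(vectors[0])
    concat = [x for v in vectors for x in v]
    return [concat[i] for i in interleave_mask(len(vectors), lanes)]


# Two <4 x i32> inputs, as in vst2_v4i32.
print(interleave_mask(2, 4))
print(interleave([["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "b3"]]))
```

For two 4-lane inputs the mask comes out as 0, 4, 1, 5, 2, 6, 3, 7, the same index list as the shufflevector in vst2_v4i32.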

define void @vst2_v8i32(<8 x i32> %src, <16 x i32> %dst) {		define void @vst2_v8i32(<8 x i32> %src, <16 x i32> %dst) {
; CHECK-LABEL: vst2_v8i32:		; CHECK-LABEL: vst2_v8i32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9}		; CHECK-NEXT: vldrw.u32 q1, [r0, #32]
; CHECK-NEXT: vpush {d8, d9}		; CHECK-NEXT: vldrw.u32 q3, [r0, #48]
; CHECK-NEXT: vldrw.u32 q4, [r0]		; CHECK-NEXT: vldrw.u32 q0, [r0]
; CHECK-NEXT: vldrw.u32 q0, [r0, #32]
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]		; CHECK-NEXT: vldrw.u32 q2, [r0, #16]
; CHECK-NEXT: vldrw.u32 q1, [r0, #48]		; CHECK-NEXT: add.w r0, r1, #32
; CHECK-NEXT: vmov.f64 d6, d8		; CHECK-NEXT: vst20.32 {q0, q1}, [r1]
; CHECK-NEXT: vmov.f32 s13, s0		; CHECK-NEXT: vst21.32 {q0, q1}, [r1]
; CHECK-NEXT: vmov.f32 s14, s17		; CHECK-NEXT: vst20.32 {q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s15, s1		; CHECK-NEXT: vst21.32 {q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s0, s18
; CHECK-NEXT: vstrw.32 q3, [r1]
; CHECK-NEXT: vmov.f32 s1, s2
; CHECK-NEXT: vmov.f32 s2, s19
; CHECK-NEXT: vmov.f64 d8, d4
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vmov.f32 s17, s4
; CHECK-NEXT: vmov.f32 s18, s9
; CHECK-NEXT: vmov.f32 s19, s5
; CHECK-NEXT: vmov.f32 s4, s10
; CHECK-NEXT: vstrw.32 q4, [r1, #32]
; CHECK-NEXT: vmov.f32 s5, s6
; CHECK-NEXT: vmov.f32 s6, s11
; CHECK-NEXT: vstrw.32 q1, [r1, #48]
; CHECK-NEXT: vpop {d8, d9}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <8 x i32>, <8 x i32>* %src, i32 0		%s1 = getelementptr <8 x i32>, <8 x i32>* %src, i32 0
%l1 = load <8 x i32>, <8 x i32>* %s1, align 4		%l1 = load <8 x i32>, <8 x i32>* %s1, align 4
%s2 = getelementptr <8 x i32>, <8 x i32>* %src, i32 1		%s2 = getelementptr <8 x i32>, <8 x i32>* %src, i32 1
%l2 = load <8 x i32>, <8 x i32>* %s2, align 4		%l2 = load <8 x i32>, <8 x i32>* %s2, align 4
%s = shufflevector <8 x i32> %l1, <8 x i32> %l2, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>		%s = shufflevector <8 x i32> %l1, <8 x i32> %l2, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
store <16 x i32> %s, <16 x i32> *%dst		store <16 x i32> %s, <16 x i32> *%dst
ret void		ret void
}		}

define void @vst2_v16i32(<16 x i32> %src, <32 x i32> %dst) {		define void @vst2_v16i32(<16 x i32> %src, <32 x i32> %dst) {
; CHECK-LABEL: vst2_v16i32:		; CHECK-LABEL: vst2_v16i32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: .pad #32		; CHECK-NEXT: vldrw.u32 q1, [r0, #112]
; CHECK-NEXT: sub sp, #32		; CHECK-NEXT: vldrw.u32 q3, [r0, #96]
; CHECK-NEXT: vldrw.u32 q7, [r0, #48]		; CHECK-NEXT: vldrw.u32 q5, [r0, #80]
; CHECK-NEXT: vldrw.u32 q0, [r0]		; CHECK-NEXT: vldrw.u32 q7, [r0, #64]
; CHECK-NEXT: vldrw.u32 q4, [r0, #64]		; CHECK-NEXT: vldrw.u32 q0, [r0, #48]
; CHECK-NEXT: vldrw.u32 q5, [r0, #16]		; CHECK-NEXT: vldrw.u32 q6, [r0]
; CHECK-NEXT: vstrw.32 q7, [sp] @ 16-byte Spill		; CHECK-NEXT: vldrw.u32 q2, [r0, #32]
; CHECK-NEXT: vmov.f64 d14, d0		; CHECK-NEXT: vldrw.u32 q4, [r0, #16]
; CHECK-NEXT: vldrw.u32 q1, [r0, #80]		; CHECK-NEXT: add.w r0, r1, #96
; CHECK-NEXT: vldrw.u32 q6, [r0, #32]		; CHECK-NEXT: add.w r2, r1, #64
; CHECK-NEXT: vldrw.u32 q2, [r0, #96]		; CHECK-NEXT: add.w r3, r1, #32
; CHECK-NEXT: vldrw.u32 q3, [r0, #112]		; CHECK-NEXT: vst20.32 {q6, q7}, [r1]
; CHECK-NEXT: vmov.f32 s29, s16		; CHECK-NEXT: vst21.32 {q6, q7}, [r1]
; CHECK-NEXT: vmov.f32 s30, s1		; CHECK-NEXT: vst20.32 {q4, q5}, [r3]
; CHECK-NEXT: vmov.f32 s31, s17		; CHECK-NEXT: vst21.32 {q4, q5}, [r3]
; CHECK-NEXT: vstrw.32 q7, [sp, #16] @ 16-byte Spill		; CHECK-NEXT: vst20.32 {q2, q3}, [r2]
; CHECK-NEXT: vmov.f64 d14, d10		; CHECK-NEXT: vst21.32 {q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s29, s4		; CHECK-NEXT: vst20.32 {q0, q1}, [r0]
; CHECK-NEXT: vmov.f32 s30, s21		; CHECK-NEXT: vst21.32 {q0, q1}, [r0]
; CHECK-NEXT: vmov.f32 s31, s5
; CHECK-NEXT: vmov.f32 s4, s22
; CHECK-NEXT: vstrw.32 q7, [r1, #32]
; CHECK-NEXT: vmov.f32 s5, s6
; CHECK-NEXT: vmov.f32 s6, s23
; CHECK-NEXT: vmov.f64 d10, d12
; CHECK-NEXT: vstrw.32 q1, [r1, #48]
; CHECK-NEXT: vmov.f32 s16, s2
; CHECK-NEXT: vmov.f32 s21, s8
; CHECK-NEXT: vmov.f32 s17, s18
; CHECK-NEXT: vmov.f32 s22, s25
; CHECK-NEXT: vmov.f32 s18, s3
; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s23, s9
; CHECK-NEXT: vstrw.32 q4, [r1, #16]
; CHECK-NEXT: vmov.f32 s8, s26
; CHECK-NEXT: vstrw.32 q5, [r1, #64]
; CHECK-NEXT: vmov.f32 s9, s10
; CHECK-NEXT: vmov.f32 s10, s27
; CHECK-NEXT: vmov.f64 d12, d0
; CHECK-NEXT: vstrw.32 q2, [r1, #80]
; CHECK-NEXT: vmov.f32 s25, s12
; CHECK-NEXT: vmov.f32 s26, s1
; CHECK-NEXT: vmov.f32 s27, s13
; CHECK-NEXT: vmov.f32 s12, s2
; CHECK-NEXT: vstrw.32 q6, [r1, #96]
; CHECK-NEXT: vmov.f32 s13, s14
; CHECK-NEXT: vmov.f32 s14, s3
; CHECK-NEXT: vldrw.u32 q0, [sp, #16] @ 16-byte Reload
; CHECK-NEXT: vstrw.32 q3, [r1, #112]
; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: add sp, #32
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <16 x i32>, <16 x i32>* %src, i32 0		%s1 = getelementptr <16 x i32>, <16 x i32>* %src, i32 0
%l1 = load <16 x i32>, <16 x i32>* %s1, align 4		%l1 = load <16 x i32>, <16 x i32>* %s1, align 4
%s2 = getelementptr <16 x i32>, <16 x i32>* %src, i32 1		%s2 = getelementptr <16 x i32>, <16 x i32>* %src, i32 1
%l2 = load <16 x i32>, <16 x i32>* %s2, align 4		%l2 = load <16 x i32>, <16 x i32>* %s2, align 4
%s = shufflevector <16 x i32> %l1, <16 x i32> %l2, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>		%s = shufflevector <16 x i32> %l1, <16 x i32> %l2, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
; ... 42 lines not shown ...
%s = shufflevector <4 x i16> %l1, <4 x i16> %l2, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>		%s = shufflevector <4 x i16> %l1, <4 x i16> %l2, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i16> %s, <8 x i16> *%dst		store <8 x i16> %s, <8 x i16> *%dst
ret void		ret void
}		}

define void @vst2_v8i16(<8 x i16> %src, <16 x i16> %dst) {		define void @vst2_v8i16(<8 x i16> %src, <16 x i16> %dst) {
; CHECK-LABEL: vst2_v8i16:		; CHECK-LABEL: vst2_v8i16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: vldrw.u32 q1, [r0, #16]
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]		; CHECK-NEXT: vldrw.u32 q0, [r0]
; CHECK-NEXT: vmov.u16 r2, q1[4]		; CHECK-NEXT: vst20.16 {q0, q1}, [r1]
; CHECK-NEXT: vmov.u16 r0, q2[4]		; CHECK-NEXT: vst21.16 {q0, q1}, [r1]
; CHECK-NEXT: vmov.16 q0[0], r2
; CHECK-NEXT: vmov.16 q0[1], r0
; CHECK-NEXT: vmov.u16 r0, q1[5]
; CHECK-NEXT: vmov.16 q0[2], r0
; CHECK-NEXT: vmov.u16 r0, q2[5]
; CHECK-NEXT: vmov.16 q0[3], r0
; CHECK-NEXT: vmov.u16 r0, q1[6]
; CHECK-NEXT: vmov.16 q0[4], r0
; CHECK-NEXT: vmov.u16 r0, q2[6]
; CHECK-NEXT: vmov.16 q0[5], r0
; CHECK-NEXT: vmov.u16 r0, q1[7]
; CHECK-NEXT: vmov.16 q0[6], r0
; CHECK-NEXT: vmov.u16 r0, q2[7]
; CHECK-NEXT: vmov.16 q0[7], r0
; CHECK-NEXT: vmov.u16 r0, q1[0]
; CHECK-NEXT: vmov.16 q3[0], r0
; CHECK-NEXT: vmov.u16 r0, q2[0]
; CHECK-NEXT: vmov.16 q3[1], r0
; CHECK-NEXT: vmov.u16 r0, q1[1]
; CHECK-NEXT: vmov.16 q3[2], r0
; CHECK-NEXT: vmov.u16 r0, q2[1]
; CHECK-NEXT: vmov.16 q3[3], r0
; CHECK-NEXT: vmov.u16 r0, q1[2]
; CHECK-NEXT: vmov.16 q3[4], r0
; CHECK-NEXT: vmov.u16 r0, q2[2]
; CHECK-NEXT: vmov.16 q3[5], r0
; CHECK-NEXT: vmov.u16 r0, q1[3]
; CHECK-NEXT: vmov.16 q3[6], r0
; CHECK-NEXT: vmov.u16 r0, q2[3]
; CHECK-NEXT: vmov.16 q3[7], r0
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vstrw.32 q3, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <8 x i16>, <8 x i16>* %src, i32 0		%s1 = getelementptr <8 x i16>, <8 x i16>* %src, i32 0
%l1 = load <8 x i16>, <8 x i16>* %s1, align 4		%l1 = load <8 x i16>, <8 x i16>* %s1, align 4
%s2 = getelementptr <8 x i16>, <8 x i16>* %src, i32 1		%s2 = getelementptr <8 x i16>, <8 x i16>* %src, i32 1
%l2 = load <8 x i16>, <8 x i16>* %s2, align 4		%l2 = load <8 x i16>, <8 x i16>* %s2, align 4
%s = shufflevector <8 x i16> %l1, <8 x i16> %l2, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>		%s = shufflevector <8 x i16> %l1, <8 x i16> %l2, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
store <16 x i16> %s, <16 x i16> *%dst		store <16 x i16> %s, <16 x i16> *%dst
ret void		ret void
}		}

define void @vst2_v16i16(<16 x i16> %src, <32 x i16> %dst) {		define void @vst2_v16i16(<16 x i16> %src, <32 x i16> %dst) {
; CHECK-LABEL: vst2_v16i16:		; CHECK-LABEL: vst2_v16i16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11}		; CHECK-NEXT: vldrw.u32 q1, [r0, #32]
; CHECK-NEXT: vpush {d8, d9, d10, d11}		; CHECK-NEXT: vldrw.u32 q3, [r0, #48]
; CHECK-NEXT: vldrw.u32 q2, [r0]		; CHECK-NEXT: vldrw.u32 q0, [r0]
; CHECK-NEXT: vldrw.u32 q3, [r0, #32]		; CHECK-NEXT: vldrw.u32 q2, [r0, #16]
; CHECK-NEXT: vldrw.u32 q4, [r0, #48]		; CHECK-NEXT: add.w r0, r1, #32
; CHECK-NEXT: vmov.u16 r2, q2[0]		; CHECK-NEXT: vst20.16 {q0, q1}, [r1]
; CHECK-NEXT: vmov.16 q0[0], r2		; CHECK-NEXT: vst21.16 {q0, q1}, [r1]
; CHECK-NEXT: vmov.u16 r2, q3[0]		; CHECK-NEXT: vst20.16 {q2, q3}, [r0]
; CHECK-NEXT: vmov.16 q0[1], r2		; CHECK-NEXT: vst21.16 {q2, q3}, [r0]
; CHECK-NEXT: vmov.u16 r2, q2[1]
; CHECK-NEXT: vmov.16 q0[2], r2
; CHECK-NEXT: vmov.u16 r2, q3[1]
; CHECK-NEXT: vmov.16 q0[3], r2
; CHECK-NEXT: vmov.u16 r2, q2[2]
; CHECK-NEXT: vmov.16 q0[4], r2
; CHECK-NEXT: vmov.u16 r2, q3[2]
; CHECK-NEXT: vmov.16 q0[5], r2
; CHECK-NEXT: vmov.u16 r2, q2[3]
; CHECK-NEXT: vmov.16 q0[6], r2
; CHECK-NEXT: vmov.u16 r2, q3[3]
; CHECK-NEXT: vmov.16 q0[7], r2
; CHECK-NEXT: vmov.u16 r2, q2[4]
; CHECK-NEXT: vmov.16 q1[0], r2
; CHECK-NEXT: vmov.u16 r2, q3[4]
; CHECK-NEXT: vmov.16 q1[1], r2
; CHECK-NEXT: vmov.u16 r2, q2[5]
; CHECK-NEXT: vmov.16 q1[2], r2
; CHECK-NEXT: vmov.u16 r2, q3[5]
; CHECK-NEXT: vmov.16 q1[3], r2
; CHECK-NEXT: vmov.u16 r2, q2[6]
; CHECK-NEXT: vmov.16 q1[4], r2
; CHECK-NEXT: vmov.u16 r2, q3[6]
; CHECK-NEXT: vmov.16 q1[5], r2
; CHECK-NEXT: vmov.u16 r2, q2[7]
; CHECK-NEXT: vmov.16 q1[6], r2
; CHECK-NEXT: vmov.u16 r2, q3[7]
; CHECK-NEXT: vldrw.u32 q3, [r0, #16]
; CHECK-NEXT: vmov.16 q1[7], r2
; CHECK-NEXT: vmov.u16 r0, q4[0]
; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vmov.u16 r2, q3[0]
; CHECK-NEXT: vstrw.32 q1, [r1, #16]
; CHECK-NEXT: vmov.16 q2[0], r2
; CHECK-NEXT: vmov.16 q2[1], r0
; CHECK-NEXT: vmov.u16 r0, q3[1]
; CHECK-NEXT: vmov.16 q2[2], r0
; CHECK-NEXT: vmov.u16 r0, q4[1]
; CHECK-NEXT: vmov.16 q2[3], r0
; CHECK-NEXT: vmov.u16 r0, q3[2]
; CHECK-NEXT: vmov.16 q2[4], r0
; CHECK-NEXT: vmov.u16 r0, q4[2]
; CHECK-NEXT: vmov.16 q2[5], r0
; CHECK-NEXT: vmov.u16 r0, q3[3]
; CHECK-NEXT: vmov.16 q2[6], r0
; CHECK-NEXT: vmov.u16 r0, q4[3]
; CHECK-NEXT: vmov.16 q2[7], r0
; CHECK-NEXT: vmov.u16 r0, q3[4]
; CHECK-NEXT: vmov.16 q5[0], r0
; CHECK-NEXT: vmov.u16 r0, q4[4]
; CHECK-NEXT: vmov.16 q5[1], r0
; CHECK-NEXT: vmov.u16 r0, q3[5]
; CHECK-NEXT: vmov.16 q5[2], r0
; CHECK-NEXT: vmov.u16 r0, q4[5]
; CHECK-NEXT: vmov.16 q5[3], r0
; CHECK-NEXT: vmov.u16 r0, q3[6]
; CHECK-NEXT: vmov.16 q5[4], r0
; CHECK-NEXT: vmov.u16 r0, q4[6]
; CHECK-NEXT: vmov.16 q5[5], r0
; CHECK-NEXT: vmov.u16 r0, q3[7]
; CHECK-NEXT: vmov.16 q5[6], r0
; CHECK-NEXT: vmov.u16 r0, q4[7]
; CHECK-NEXT: vmov.16 q5[7], r0
; CHECK-NEXT: vstrw.32 q2, [r1, #32]
; CHECK-NEXT: vstrw.32 q5, [r1, #48]
; CHECK-NEXT: vpop {d8, d9, d10, d11}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <16 x i16>, <16 x i16>* %src, i32 0		%s1 = getelementptr <16 x i16>, <16 x i16>* %src, i32 0
%l1 = load <16 x i16>, <16 x i16>* %s1, align 4		%l1 = load <16 x i16>, <16 x i16>* %s1, align 4
%s2 = getelementptr <16 x i16>, <16 x i16>* %src, i32 1		%s2 = getelementptr <16 x i16>, <16 x i16>* %src, i32 1
%l2 = load <16 x i16>, <16 x i16>* %s2, align 4		%l2 = load <16 x i16>, <16 x i16>* %s2, align 4
%s = shufflevector <16 x i16> %l1, <16 x i16> %l2, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>		%s = shufflevector <16 x i16> %l1, <16 x i16> %l2, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
store <32 x i16> %s, <32 x i16> *%dst		store <32 x i16> %s, <32 x i16> *%dst
; ... 59 lines not shown ...
%s = shufflevector <8 x i8> %l1, <8 x i8> %l2, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>		%s = shufflevector <8 x i8> %l1, <8 x i8> %l2, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
store <16 x i8> %s, <16 x i8> *%dst		store <16 x i8> %s, <16 x i8> *%dst
ret void		ret void
}		}

define void @vst2_v16i8(<16 x i8> %src, <32 x i8> %dst) {		define void @vst2_v16i8(<16 x i8> %src, <32 x i8> %dst) {
; CHECK-LABEL: vst2_v16i8:		; CHECK-LABEL: vst2_v16i8:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: vldrw.u32 q1, [r0, #16]
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]		; CHECK-NEXT: vldrw.u32 q0, [r0]
; CHECK-NEXT: vmov.u8 r2, q1[8]		; CHECK-NEXT: vst20.8 {q0, q1}, [r1]
; CHECK-NEXT: vmov.u8 r0, q2[8]		; CHECK-NEXT: vst21.8 {q0, q1}, [r1]
; CHECK-NEXT: vmov.8 q0[0], r2
; CHECK-NEXT: vmov.8 q0[1], r0
; CHECK-NEXT: vmov.u8 r0, q1[9]
; CHECK-NEXT: vmov.8 q0[2], r0
; CHECK-NEXT: vmov.u8 r0, q2[9]
; CHECK-NEXT: vmov.8 q0[3], r0
; CHECK-NEXT: vmov.u8 r0, q1[10]
; CHECK-NEXT: vmov.8 q0[4], r0
; CHECK-NEXT: vmov.u8 r0, q2[10]
; CHECK-NEXT: vmov.8 q0[5], r0
; CHECK-NEXT: vmov.u8 r0, q1[11]
; CHECK-NEXT: vmov.8 q0[6], r0
; CHECK-NEXT: vmov.u8 r0, q2[11]
; CHECK-NEXT: vmov.8 q0[7], r0
; CHECK-NEXT: vmov.u8 r0, q1[12]
; CHECK-NEXT: vmov.8 q0[8], r0
; CHECK-NEXT: vmov.u8 r0, q2[12]
; CHECK-NEXT: vmov.8 q0[9], r0
; CHECK-NEXT: vmov.u8 r0, q1[13]
; CHECK-NEXT: vmov.8 q0[10], r0
; CHECK-NEXT: vmov.u8 r0, q2[13]
; CHECK-NEXT: vmov.8 q0[11], r0
; CHECK-NEXT: vmov.u8 r0, q1[14]
; CHECK-NEXT: vmov.8 q0[12], r0
; CHECK-NEXT: vmov.u8 r0, q2[14]
; CHECK-NEXT: vmov.8 q0[13], r0
; CHECK-NEXT: vmov.u8 r0, q1[15]
; CHECK-NEXT: vmov.8 q0[14], r0
; CHECK-NEXT: vmov.u8 r0, q2[15]
; CHECK-NEXT: vmov.8 q0[15], r0
; CHECK-NEXT: vmov.u8 r0, q1[0]
; CHECK-NEXT: vmov.8 q3[0], r0
; CHECK-NEXT: vmov.u8 r0, q2[0]
; CHECK-NEXT: vmov.8 q3[1], r0
; CHECK-NEXT: vmov.u8 r0, q1[1]
; CHECK-NEXT: vmov.8 q3[2], r0
; CHECK-NEXT: vmov.u8 r0, q2[1]
; CHECK-NEXT: vmov.8 q3[3], r0
; CHECK-NEXT: vmov.u8 r0, q1[2]
; CHECK-NEXT: vmov.8 q3[4], r0
; CHECK-NEXT: vmov.u8 r0, q2[2]
; CHECK-NEXT: vmov.8 q3[5], r0
; CHECK-NEXT: vmov.u8 r0, q1[3]
; CHECK-NEXT: vmov.8 q3[6], r0
; CHECK-NEXT: vmov.u8 r0, q2[3]
; CHECK-NEXT: vmov.8 q3[7], r0
; CHECK-NEXT: vmov.u8 r0, q1[4]
; CHECK-NEXT: vmov.8 q3[8], r0
; CHECK-NEXT: vmov.u8 r0, q2[4]
; CHECK-NEXT: vmov.8 q3[9], r0
; CHECK-NEXT: vmov.u8 r0, q1[5]
; CHECK-NEXT: vmov.8 q3[10], r0
; CHECK-NEXT: vmov.u8 r0, q2[5]
; CHECK-NEXT: vmov.8 q3[11], r0
; CHECK-NEXT: vmov.u8 r0, q1[6]
; CHECK-NEXT: vmov.8 q3[12], r0
; CHECK-NEXT: vmov.u8 r0, q2[6]
; CHECK-NEXT: vmov.8 q3[13], r0
; CHECK-NEXT: vmov.u8 r0, q1[7]
; CHECK-NEXT: vmov.8 q3[14], r0
; CHECK-NEXT: vmov.u8 r0, q2[7]
; CHECK-NEXT: vmov.8 q3[15], r0
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vstrw.32 q3, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <16 x i8>, <16 x i8>* %src, i32 0		%s1 = getelementptr <16 x i8>, <16 x i8>* %src, i32 0
%l1 = load <16 x i8>, <16 x i8>* %s1, align 4		%l1 = load <16 x i8>, <16 x i8>* %s1, align 4
%s2 = getelementptr <16 x i8>, <16 x i8>* %src, i32 1		%s2 = getelementptr <16 x i8>, <16 x i8>* %src, i32 1
%l2 = load <16 x i8>, <16 x i8>* %s2, align 4		%l2 = load <16 x i8>, <16 x i8>* %s2, align 4
%s = shufflevector <16 x i8> %l1, <16 x i8> %l2, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>		%s = shufflevector <16 x i8> %l1, <16 x i8> %l2, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
store <32 x i8> %s, <32 x i8> *%dst		store <32 x i8> %s, <32 x i8> *%dst
; ... 85 lines not shown ...
%s = shufflevector <2 x float> %l1, <2 x float> %l2, <4 x i32> <i32 0, i32 2, i32 1, i32 3>		%s = shufflevector <2 x float> %l1, <2 x float> %l2, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
store <4 x float> %s, <4 x float> *%dst		store <4 x float> %s, <4 x float> *%dst
ret void		ret void
}		}

define void @vst2_v4f32(<4 x float> %src, <8 x float> %dst) {		define void @vst2_v4f32(<4 x float> %src, <8 x float> %dst) {
; CHECK-LABEL: vst2_v4f32:		; CHECK-LABEL: vst2_v4f32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: vldrw.u32 q1, [r0, #16]
; CHECK-NEXT: vldrw.u32 q0, [r0, #16]		; CHECK-NEXT: vldrw.u32 q0, [r0]
; CHECK-NEXT: vmov.f64 d4, d3		; CHECK-NEXT: vst20.32 {q0, q1}, [r1]
; CHECK-NEXT: vmov.f64 d6, d2		; CHECK-NEXT: vst21.32 {q0, q1}, [r1]
; CHECK-NEXT: vmov.f32 s9, s2
; CHECK-NEXT: vmov.f32 s13, s0
; CHECK-NEXT: vmov.f32 s10, s7
; CHECK-NEXT: vmov.f32 s14, s5
; CHECK-NEXT: vmov.f32 s11, s3
; CHECK-NEXT: vmov.f32 s15, s1
; CHECK-NEXT: vstrw.32 q2, [r1, #16]
; CHECK-NEXT: vstrw.32 q3, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <4 x float>, <4 x float>* %src, i32 0		%s1 = getelementptr <4 x float>, <4 x float>* %src, i32 0
%l1 = load <4 x float>, <4 x float>* %s1, align 4		%l1 = load <4 x float>, <4 x float>* %s1, align 4
%s2 = getelementptr <4 x float>, <4 x float>* %src, i32 1		%s2 = getelementptr <4 x float>, <4 x float>* %src, i32 1
%l2 = load <4 x float>, <4 x float>* %s2, align 4		%l2 = load <4 x float>, <4 x float>* %s2, align 4
%s = shufflevector <4 x float> %l1, <4 x float> %l2, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>		%s = shufflevector <4 x float> %l1, <4 x float> %l2, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x float> %s, <8 x float> *%dst		store <8 x float> %s, <8 x float> *%dst
ret void		ret void
}		}

define void @vst2_v8f32(<8 x float> %src, <16 x float> %dst) {		define void @vst2_v8f32(<8 x float> %src, <16 x float> %dst) {
; CHECK-LABEL: vst2_v8f32:		; CHECK-LABEL: vst2_v8f32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9}		; CHECK-NEXT: vldrw.u32 q1, [r0, #32]
; CHECK-NEXT: vpush {d8, d9}		; CHECK-NEXT: vldrw.u32 q3, [r0, #48]
; CHECK-NEXT: vldrw.u32 q4, [r0]		; CHECK-NEXT: vldrw.u32 q0, [r0]
; CHECK-NEXT: vldrw.u32 q0, [r0, #32]
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]		; CHECK-NEXT: vldrw.u32 q2, [r0, #16]
; CHECK-NEXT: vldrw.u32 q1, [r0, #48]		; CHECK-NEXT: add.w r0, r1, #32
; CHECK-NEXT: vmov.f64 d6, d8		; CHECK-NEXT: vst20.32 {q0, q1}, [r1]
; CHECK-NEXT: vmov.f32 s13, s0		; CHECK-NEXT: vst21.32 {q0, q1}, [r1]
; CHECK-NEXT: vmov.f32 s14, s17		; CHECK-NEXT: vst20.32 {q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s15, s1		; CHECK-NEXT: vst21.32 {q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s0, s18
; CHECK-NEXT: vstrw.32 q3, [r1]
; CHECK-NEXT: vmov.f32 s1, s2
; CHECK-NEXT: vmov.f32 s2, s19
; CHECK-NEXT: vmov.f64 d8, d4
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vmov.f32 s17, s4
; CHECK-NEXT: vmov.f32 s18, s9
; CHECK-NEXT: vmov.f32 s19, s5
; CHECK-NEXT: vmov.f32 s4, s10
; CHECK-NEXT: vstrw.32 q4, [r1, #32]
; CHECK-NEXT: vmov.f32 s5, s6
; CHECK-NEXT: vmov.f32 s6, s11
; CHECK-NEXT: vstrw.32 q1, [r1, #48]
; CHECK-NEXT: vpop {d8, d9}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <8 x float>, <8 x float>* %src, i32 0		%s1 = getelementptr <8 x float>, <8 x float>* %src, i32 0
%l1 = load <8 x float>, <8 x float>* %s1, align 4		%l1 = load <8 x float>, <8 x float>* %s1, align 4
%s2 = getelementptr <8 x float>, <8 x float>* %src, i32 1		%s2 = getelementptr <8 x float>, <8 x float>* %src, i32 1
%l2 = load <8 x float>, <8 x float>* %s2, align 4		%l2 = load <8 x float>, <8 x float>* %s2, align 4
%s = shufflevector <8 x float> %l1, <8 x float> %l2, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>		%s = shufflevector <8 x float> %l1, <8 x float> %l2, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
store <16 x float> %s, <16 x float> *%dst		store <16 x float> %s, <16 x float> *%dst
ret void		ret void
}		}

define void @vst2_v16f32(<16 x float> *%src, <32 x float> *%dst) {		define void @vst2_v16f32(<16 x float> *%src, <32 x float> *%dst) {
; CHECK-LABEL: vst2_v16f32:		; CHECK-LABEL: vst2_v16f32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: .pad #32		; CHECK-NEXT: vldrw.u32 q1, [r0, #112]
; CHECK-NEXT: sub sp, #32		; CHECK-NEXT: vldrw.u32 q3, [r0, #96]
; CHECK-NEXT: vldrw.u32 q7, [r0, #48]		; CHECK-NEXT: vldrw.u32 q5, [r0, #80]
; CHECK-NEXT: vldrw.u32 q0, [r0]		; CHECK-NEXT: vldrw.u32 q7, [r0, #64]
; CHECK-NEXT: vldrw.u32 q4, [r0, #64]		; CHECK-NEXT: vldrw.u32 q0, [r0, #48]
; CHECK-NEXT: vldrw.u32 q5, [r0, #16]		; CHECK-NEXT: vldrw.u32 q6, [r0]
; CHECK-NEXT: vstrw.32 q7, [sp] @ 16-byte Spill		; CHECK-NEXT: vldrw.u32 q2, [r0, #32]
; CHECK-NEXT: vmov.f64 d14, d0		; CHECK-NEXT: vldrw.u32 q4, [r0, #16]
; CHECK-NEXT: vldrw.u32 q1, [r0, #80]		; CHECK-NEXT: add.w r0, r1, #96
; CHECK-NEXT: vldrw.u32 q6, [r0, #32]		; CHECK-NEXT: add.w r2, r1, #64
; CHECK-NEXT: vldrw.u32 q2, [r0, #96]		; CHECK-NEXT: add.w r3, r1, #32
; CHECK-NEXT: vldrw.u32 q3, [r0, #112]		; CHECK-NEXT: vst20.32 {q6, q7}, [r1]
; CHECK-NEXT: vmov.f32 s29, s16		; CHECK-NEXT: vst21.32 {q6, q7}, [r1]
; CHECK-NEXT: vmov.f32 s30, s1		; CHECK-NEXT: vst20.32 {q4, q5}, [r3]
; CHECK-NEXT: vmov.f32 s31, s17		; CHECK-NEXT: vst21.32 {q4, q5}, [r3]
; CHECK-NEXT: vstrw.32 q7, [sp, #16] @ 16-byte Spill		; CHECK-NEXT: vst20.32 {q2, q3}, [r2]
; CHECK-NEXT: vmov.f64 d14, d10		; CHECK-NEXT: vst21.32 {q2, q3}, [r2]
; CHECK-NEXT: vmov.f32 s29, s4		; CHECK-NEXT: vst20.32 {q0, q1}, [r0]
; CHECK-NEXT: vmov.f32 s30, s21		; CHECK-NEXT: vst21.32 {q0, q1}, [r0]
; CHECK-NEXT: vmov.f32 s31, s5
; CHECK-NEXT: vmov.f32 s4, s22
; CHECK-NEXT: vstrw.32 q7, [r1, #32]
; CHECK-NEXT: vmov.f32 s5, s6
; CHECK-NEXT: vmov.f32 s6, s23
; CHECK-NEXT: vmov.f64 d10, d12
; CHECK-NEXT: vstrw.32 q1, [r1, #48]
; CHECK-NEXT: vmov.f32 s16, s2
; CHECK-NEXT: vmov.f32 s21, s8
; CHECK-NEXT: vmov.f32 s17, s18
; CHECK-NEXT: vmov.f32 s22, s25
; CHECK-NEXT: vmov.f32 s18, s3
; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s23, s9
; CHECK-NEXT: vstrw.32 q4, [r1, #16]
; CHECK-NEXT: vmov.f32 s8, s26
; CHECK-NEXT: vstrw.32 q5, [r1, #64]
; CHECK-NEXT: vmov.f32 s9, s10
; CHECK-NEXT: vmov.f32 s10, s27
; CHECK-NEXT: vmov.f64 d12, d0
; CHECK-NEXT: vstrw.32 q2, [r1, #80]
; CHECK-NEXT: vmov.f32 s25, s12
; CHECK-NEXT: vmov.f32 s26, s1
; CHECK-NEXT: vmov.f32 s27, s13
; CHECK-NEXT: vmov.f32 s12, s2
; CHECK-NEXT: vstrw.32 q6, [r1, #96]
; CHECK-NEXT: vmov.f32 s13, s14
; CHECK-NEXT: vmov.f32 s14, s3
; CHECK-NEXT: vldrw.u32 q0, [sp, #16] @ 16-byte Reload
; CHECK-NEXT: vstrw.32 q3, [r1, #112]
; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: add sp, #32
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <16 x float>, <16 x float>* %src, i32 0		%s1 = getelementptr <16 x float>, <16 x float>* %src, i32 0
%l1 = load <16 x float>, <16 x float>* %s1, align 4		%l1 = load <16 x float>, <16 x float>* %s1, align 4
%s2 = getelementptr <16 x float>, <16 x float>* %src, i32 1		%s2 = getelementptr <16 x float>, <16 x float>* %src, i32 1
%l2 = load <16 x float>, <16 x float>* %s2, align 4		%l2 = load <16 x float>, <16 x float>* %s2, align 4
%s = shufflevector <16 x float> %l1, <16 x float> %l2, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>		%s = shufflevector <16 x float> %l1, <16 x float> %l2, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
(72 lines not shown)
%s = shufflevector <4 x half> %l1, <4 x half> %l2, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>		%s = shufflevector <4 x half> %l1, <4 x half> %l2, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x half> %s, <8 x half> *%dst		store <8 x half> %s, <8 x half> *%dst
ret void		ret void
}		}

define void @vst2_v8f16(<8 x half> *%src, <16 x half> *%dst) {		define void @vst2_v8f16(<8 x half> *%src, <16 x half> *%dst) {
; CHECK-LABEL: vst2_v8f16:		; CHECK-LABEL: vst2_v8f16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: vldrw.u32 q1, [r0, #16]
; CHECK-NEXT: vldrw.u32 q0, [r0, #16]		; CHECK-NEXT: vldrw.u32 q0, [r0]
; CHECK-NEXT: vmov r2, s6		; CHECK-NEXT: vst20.16 {q0, q1}, [r1]
; CHECK-NEXT: vmovx.f16 s12, s6		; CHECK-NEXT: vst21.16 {q0, q1}, [r1]
; CHECK-NEXT: vmov.16 q2[0], r2
; CHECK-NEXT: vmov r0, s2
; CHECK-NEXT: vmov.16 q2[1], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s2
; CHECK-NEXT: vmov.16 q2[2], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s7
; CHECK-NEXT: vmov.16 q2[3], r0
; CHECK-NEXT: vmov r0, s7
; CHECK-NEXT: vmov.16 q2[4], r0
; CHECK-NEXT: vmov r0, s3
; CHECK-NEXT: vmov.16 q2[5], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s3
; CHECK-NEXT: vmov.16 q2[6], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s4
; CHECK-NEXT: vmov.16 q2[7], r0
; CHECK-NEXT: vmov r2, s4
; CHECK-NEXT: vstrw.32 q2, [r1, #16]
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: vmov.16 q2[0], r2
; CHECK-NEXT: vmovx.f16 s4, s5
; CHECK-NEXT: vmov.16 q2[1], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s0
; CHECK-NEXT: vmov.16 q2[2], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s0, s1
; CHECK-NEXT: vmov.16 q2[3], r0
; CHECK-NEXT: vmov r0, s5
; CHECK-NEXT: vmov.16 q2[4], r0
; CHECK-NEXT: vmov r0, s1
; CHECK-NEXT: vmov.16 q2[5], r0
; CHECK-NEXT: vmov r0, s4
; CHECK-NEXT: vmov.16 q2[6], r0
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: vmov.16 q2[7], r0
; CHECK-NEXT: vstrw.32 q2, [r1]
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <8 x half>, <8 x half>* %src, i32 0		%s1 = getelementptr <8 x half>, <8 x half>* %src, i32 0
%l1 = load <8 x half>, <8 x half>* %s1, align 4		%l1 = load <8 x half>, <8 x half>* %s1, align 4
%s2 = getelementptr <8 x half>, <8 x half>* %src, i32 1		%s2 = getelementptr <8 x half>, <8 x half>* %src, i32 1
%l2 = load <8 x half>, <8 x half>* %s2, align 4		%l2 = load <8 x half>, <8 x half>* %s2, align 4
%s = shufflevector <8 x half> %l1, <8 x half> %l2, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>		%s = shufflevector <8 x half> %l1, <8 x half> %l2, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
store <16 x half> %s, <16 x half> *%dst		store <16 x half> %s, <16 x half> *%dst
ret void		ret void
}		}

define void @vst2_v16f16(<16 x half> *%src, <32 x half> *%dst) {		define void @vst2_v16f16(<16 x half> *%src, <32 x half> *%dst) {
; CHECK-LABEL: vst2_v16f16:		; CHECK-LABEL: vst2_v16f16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10}		; CHECK-NEXT: vldrw.u32 q1, [r0, #48]
; CHECK-NEXT: vpush {d8, d9, d10}		; CHECK-NEXT: vldrw.u32 q3, [r0, #32]
; CHECK-NEXT: vldrw.u32 q3, [r0, #16]		; CHECK-NEXT: vldrw.u32 q0, [r0, #16]
; CHECK-NEXT: vldrw.u32 q2, [r0, #48]		; CHECK-NEXT: vldrw.u32 q2, [r0]
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: add.w r0, r1, #32
; CHECK-NEXT: vmov r2, s12		; CHECK-NEXT: vst20.16 {q2, q3}, [r1]
; CHECK-NEXT: vmovx.f16 s0, s12		; CHECK-NEXT: vst21.16 {q2, q3}, [r1]
; CHECK-NEXT: vmov.16 q4[0], r2		; CHECK-NEXT: vst20.16 {q0, q1}, [r0]
; CHECK-NEXT: vmov r3, s8		; CHECK-NEXT: vst21.16 {q0, q1}, [r0]
; CHECK-NEXT: vmov r2, s0
; CHECK-NEXT: vmov.16 q4[1], r3
; CHECK-NEXT: vmovx.f16 s0, s8
; CHECK-NEXT: vmov.16 q4[2], r2
; CHECK-NEXT: vmov r2, s0
; CHECK-NEXT: vmovx.f16 s0, s13
; CHECK-NEXT: vmov.16 q4[3], r2
; CHECK-NEXT: vmov r2, s13
; CHECK-NEXT: vmov.16 q4[4], r2
; CHECK-NEXT: vmov r2, s9
; CHECK-NEXT: vmov.16 q4[5], r2
; CHECK-NEXT: vmov r2, s0
; CHECK-NEXT: vmovx.f16 s0, s9
; CHECK-NEXT: vmov.16 q4[6], r2
; CHECK-NEXT: vmov r2, s0
; CHECK-NEXT: vldrw.u32 q0, [r0, #32]
; CHECK-NEXT: vmov.16 q4[7], r2
; CHECK-NEXT: vmov r2, s14
; CHECK-NEXT: vstrw.32 q4, [r1, #32]
; CHECK-NEXT: vmov.16 q4[0], r2
; CHECK-NEXT: vmov r0, s10
; CHECK-NEXT: vmovx.f16 s20, s14
; CHECK-NEXT: vmov.16 q4[1], r0
; CHECK-NEXT: vmov r0, s20
; CHECK-NEXT: vmovx.f16 s20, s10
; CHECK-NEXT: vmov.16 q4[2], r0
; CHECK-NEXT: vmov r0, s20
; CHECK-NEXT: vmovx.f16 s12, s15
; CHECK-NEXT: vmov.16 q4[3], r0
; CHECK-NEXT: vmov r0, s15
; CHECK-NEXT: vmov.16 q4[4], r0
; CHECK-NEXT: vmov r0, s11
; CHECK-NEXT: vmov.16 q4[5], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s8, s11
; CHECK-NEXT: vmov.16 q4[6], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vmovx.f16 s12, s4
; CHECK-NEXT: vmov.16 q4[7], r0
; CHECK-NEXT: vmov r0, s4
; CHECK-NEXT: vmov.16 q2[0], r0
; CHECK-NEXT: vmov r2, s0
; CHECK-NEXT: vmov.16 q2[1], r2
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s0
; CHECK-NEXT: vmov.16 q2[2], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s5
; CHECK-NEXT: vmov.16 q2[3], r0
; CHECK-NEXT: vmov r0, s5
; CHECK-NEXT: vmov.16 q2[4], r0
; CHECK-NEXT: vmov r0, s1
; CHECK-NEXT: vmov.16 q2[5], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s1
; CHECK-NEXT: vmov.16 q2[6], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s6
; CHECK-NEXT: vmov.16 q2[7], r0
; CHECK-NEXT: vmov r2, s6
; CHECK-NEXT: vstrw.32 q2, [r1]
; CHECK-NEXT: vmov r0, s2
; CHECK-NEXT: vmov.16 q2[0], r2
; CHECK-NEXT: vmovx.f16 s4, s7
; CHECK-NEXT: vmov.16 q2[1], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s2
; CHECK-NEXT: vmov.16 q2[2], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s0, s3
; CHECK-NEXT: vmov.16 q2[3], r0
; CHECK-NEXT: vmov r0, s7
; CHECK-NEXT: vmov.16 q2[4], r0
; CHECK-NEXT: vmov r0, s3
; CHECK-NEXT: vmov.16 q2[5], r0
; CHECK-NEXT: vmov r0, s4
; CHECK-NEXT: vmov.16 q2[6], r0
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: vmov.16 q2[7], r0
; CHECK-NEXT: vstrw.32 q4, [r1, #48]
; CHECK-NEXT: vstrw.32 q2, [r1, #16]
; CHECK-NEXT: vpop {d8, d9, d10}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <16 x half>, <16 x half>* %src, i32 0		%s1 = getelementptr <16 x half>, <16 x half>* %src, i32 0
%l1 = load <16 x half>, <16 x half>* %s1, align 4		%l1 = load <16 x half>, <16 x half>* %s1, align 4
%s2 = getelementptr <16 x half>, <16 x half>* %src, i32 1		%s2 = getelementptr <16 x half>, <16 x half>* %src, i32 1
%l2 = load <16 x half>, <16 x half>* %s2, align 4		%l2 = load <16 x half>, <16 x half>* %s2, align 4
%s = shufflevector <16 x half> %l1, <16 x half> %l2, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>		%s = shufflevector <16 x half> %l1, <16 x half> %l2, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
store <32 x half> %s, <32 x half> *%dst		store <32 x half> %s, <32 x half> *%dst
(56 lines not shown)

llvm/test/CodeGen/Thumb2/mve-vst4.ll

(44 lines not shown)
%s = shufflevector <4 x i32> %t1, <4 x i32> %t2, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 1, i32 3, i32 5, i32 7>		%s = shufflevector <4 x i32> %t1, <4 x i32> %t2, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 1, i32 3, i32 5, i32 7>
store <8 x i32> %s, <8 x i32> *%dst		store <8 x i32> %s, <8 x i32> *%dst
ret void		ret void
}		}

define void @vst4_v4i32(<4 x i32> *%src, <16 x i32> *%dst) {		define void @vst4_v4i32(<4 x i32> *%src, <16 x i32> *%dst) {
; CHECK-LABEL: vst4_v4i32:		; CHECK-LABEL: vst4_v4i32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13}		; CHECK-NEXT: vldrw.u32 q2, [r0, #32]
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: vldrw.u32 q2, [r0]
; CHECK-NEXT: vldrw.u32 q3, [r0, #32]
; CHECK-NEXT: vldrw.u32 q1, [r0, #16]		; CHECK-NEXT: vldrw.u32 q1, [r0, #16]
; CHECK-NEXT: vmov.f32 s0, s9		; CHECK-NEXT: vldrw.u32 q0, [r0]
; CHECK-NEXT: vmov.32 r0, q3[1]		; CHECK-NEXT: vmov q3, q2
; CHECK-NEXT: vdup.32 q4, r0		; CHECK-NEXT: vst40.32 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.f32 s1, s5		; CHECK-NEXT: vst41.32 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.f32 s2, s18		; CHECK-NEXT: vst42.32 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.32 r0, q3[0]		; CHECK-NEXT: vst43.32 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.f32 s3, s19
; CHECK-NEXT: vdup.32 q4, r0
; CHECK-NEXT: vmov.f32 s9, s4
; CHECK-NEXT: vmov.32 r0, q3[3]
; CHECK-NEXT: vmov.f32 s16, s8
; CHECK-NEXT: vdup.32 q6, r0
; CHECK-NEXT: vmov.f32 s20, s11
; CHECK-NEXT: vmov.32 r0, q3[2]
; CHECK-NEXT: vmov.f32 s8, s10
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vmov.f32 s21, s7
; CHECK-NEXT: vmov.f32 s17, s4
; CHECK-NEXT: vmov.f32 s9, s6
; CHECK-NEXT: vdup.32 q1, r0
; CHECK-NEXT: vmov.f32 s22, s26
; CHECK-NEXT: vstrw.32 q4, [r1]
; CHECK-NEXT: vmov.f32 s10, s6
; CHECK-NEXT: vmov.f32 s23, s27
; CHECK-NEXT: vmov.f32 s11, s7
; CHECK-NEXT: vstrw.32 q5, [r1, #48]
; CHECK-NEXT: vstrw.32 q2, [r1, #32]
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <4 x i32>, <4 x i32>* %src, i32 0		%s1 = getelementptr <4 x i32>, <4 x i32>* %src, i32 0
%l1 = load <4 x i32>, <4 x i32>* %s1, align 4		%l1 = load <4 x i32>, <4 x i32>* %s1, align 4
%s2 = getelementptr <4 x i32>, <4 x i32>* %src, i32 1		%s2 = getelementptr <4 x i32>, <4 x i32>* %src, i32 1
%l2 = load <4 x i32>, <4 x i32>* %s2, align 4		%l2 = load <4 x i32>, <4 x i32>* %s2, align 4
%s3 = getelementptr <4 x i32>, <4 x i32>* %src, i32 2		%s3 = getelementptr <4 x i32>, <4 x i32>* %src, i32 2
%l3 = load <4 x i32>, <4 x i32>* %s3, align 4		%l3 = load <4 x i32>, <4 x i32>* %s3, align 4
%s4 = getelementptr <4 x i32>, <4 x i32>* %src, i32 3		%s4 = getelementptr <4 x i32>, <4 x i32>* %src, i32 3
%l4 = load <4 x i32>, <4 x i32>* %s3, align 4		%l4 = load <4 x i32>, <4 x i32>* %s3, align 4
%t1 = shufflevector <4 x i32> %l1, <4 x i32> %l2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%t1 = shufflevector <4 x i32> %l1, <4 x i32> %l2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%t2 = shufflevector <4 x i32> %l3, <4 x i32> %l4, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%t2 = shufflevector <4 x i32> %l3, <4 x i32> %l4, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%s = shufflevector <8 x i32> %t1, <8 x i32> %t2, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 1, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>		%s = shufflevector <8 x i32> %t1, <8 x i32> %t2, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 1, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>
store <16 x i32> %s, <16 x i32> *%dst		store <16 x i32> %s, <16 x i32> *%dst
ret void		ret void
}		}

define void @vst4_v8i32(<8 x i32> *%src, <32 x i32> *%dst) {		define void @vst4_v8i32(<8 x i32> *%src, <32 x i32> *%dst) {
; CHECK-LABEL: vst4_v8i32:		; CHECK-LABEL: vst4_v8i32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: .pad #48
; CHECK-NEXT: sub sp, #48
; CHECK-NEXT: vldrw.u32 q3, [r0]
; CHECK-NEXT: vldrw.u32 q5, [r0, #64]
; CHECK-NEXT: vldrw.u32 q4, [r0, #32]
; CHECK-NEXT: vmov.f64 d2, d6
; CHECK-NEXT: vmov.32 r2, q5[0]
; CHECK-NEXT: vdup.32 q0, r2
; CHECK-NEXT: vmov.f32 s5, s16
; CHECK-NEXT: vmov.f32 s6, s2
; CHECK-NEXT: vmov.f32 s7, s3
; CHECK-NEXT: vstrw.32 q1, [sp, #32] @ 16-byte Spill
; CHECK-NEXT: vmov.32 r2, q5[1]
; CHECK-NEXT: vdup.32 q0, r2
; CHECK-NEXT: vmov.f32 s16, s13
; CHECK-NEXT: vmov.f32 s0, s13
; CHECK-NEXT: vmov.f32 s1, s17
; CHECK-NEXT: vmov.f64 d2, d7
; CHECK-NEXT: vstrw.32 q0, [sp, #16] @ 16-byte Spill
; CHECK-NEXT: vmov.32 r2, q5[2]
; CHECK-NEXT: vdup.32 q0, r2
; CHECK-NEXT: vmov.f32 s5, s18
; CHECK-NEXT: vmov.f32 s6, s2
; CHECK-NEXT: vmov.f32 s7, s3
; CHECK-NEXT: vmov.f32 s12, s15
; CHECK-NEXT: vstrw.32 q1, [sp] @ 16-byte Spill
; CHECK-NEXT: vmov.32 r2, q5[3]
; CHECK-NEXT: vldrw.u32 q7, [r0, #16]
; CHECK-NEXT: vldrw.u32 q2, [r0, #80]		; CHECK-NEXT: vldrw.u32 q2, [r0, #80]
; CHECK-NEXT: vldrw.u32 q6, [r0, #48]		; CHECK-NEXT: vldrw.u32 q6, [r0, #64]
; CHECK-NEXT: vmov.f32 s13, s19		; CHECK-NEXT: vldrw.u32 q5, [r0, #32]
; CHECK-NEXT: vdup.32 q0, r2		; CHECK-NEXT: vldrw.u32 q1, [r0, #48]
; CHECK-NEXT: vmov.f32 s14, s2		; CHECK-NEXT: vldrw.u32 q4, [r0]
; CHECK-NEXT: vmov.32 r0, q2[0]		; CHECK-NEXT: vldrw.u32 q0, [r0, #16]
; CHECK-NEXT: vmov.f64 d10, d14		; CHECK-NEXT: vmov q7, q6
; CHECK-NEXT: vmov.f32 s15, s3		; CHECK-NEXT: add.w r0, r1, #64
; CHECK-NEXT: vdup.32 q0, r0		; CHECK-NEXT: vmov q3, q2
; CHECK-NEXT: vmov.f32 s21, s24		; CHECK-NEXT: vst40.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.32 r0, q2[1]		; CHECK-NEXT: vst41.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.f32 s22, s2		; CHECK-NEXT: vst42.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.f32 s23, s3		; CHECK-NEXT: vst43.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vdup.32 q0, r0		; CHECK-NEXT: vst40.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f64 d8, d15		; CHECK-NEXT: vst41.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.32 r0, q2[2]		; CHECK-NEXT: vst42.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vdup.32 q1, r0		; CHECK-NEXT: vst43.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.32 r0, q2[3]
; CHECK-NEXT: vdup.32 q2, r0
; CHECK-NEXT: vstrw.32 q5, [r1, #64]
; CHECK-NEXT: vstrw.32 q3, [r1, #48]
; CHECK-NEXT: vmov.f32 s0, s29
; CHECK-NEXT: vmov.f32 s17, s26
; CHECK-NEXT: vmov.f32 s24, s29
; CHECK-NEXT: vmov.f32 s1, s25
; CHECK-NEXT: vstrw.32 q0, [r1, #80]
; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s18, s6
; CHECK-NEXT: vmov.f32 s19, s7
; CHECK-NEXT: vstrw.32 q0, [r1, #32]
; CHECK-NEXT: vmov.f32 s4, s31
; CHECK-NEXT: vldrw.u32 q0, [sp, #32] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s5, s27
; CHECK-NEXT: vstrw.32 q4, [r1, #96]
; CHECK-NEXT: vmov.f32 s6, s10
; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vldrw.u32 q0, [sp, #16] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s7, s11
; CHECK-NEXT: vstrw.32 q1, [r1, #112]
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: add sp, #48
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <8 x i32>, <8 x i32>* %src, i32 0		%s1 = getelementptr <8 x i32>, <8 x i32>* %src, i32 0
%l1 = load <8 x i32>, <8 x i32>* %s1, align 4		%l1 = load <8 x i32>, <8 x i32>* %s1, align 4
%s2 = getelementptr <8 x i32>, <8 x i32>* %src, i32 1		%s2 = getelementptr <8 x i32>, <8 x i32>* %src, i32 1
%l2 = load <8 x i32>, <8 x i32>* %s2, align 4		%l2 = load <8 x i32>, <8 x i32>* %s2, align 4
%s3 = getelementptr <8 x i32>, <8 x i32>* %src, i32 2		%s3 = getelementptr <8 x i32>, <8 x i32>* %src, i32 2
%l3 = load <8 x i32>, <8 x i32>* %s3, align 4		%l3 = load <8 x i32>, <8 x i32>* %s3, align 4
%s4 = getelementptr <8 x i32>, <8 x i32>* %src, i32 3		%s4 = getelementptr <8 x i32>, <8 x i32>* %src, i32 3
%l4 = load <8 x i32>, <8 x i32>* %s3, align 4		%l4 = load <8 x i32>, <8 x i32>* %s3, align 4
%t1 = shufflevector <8 x i32> %l1, <8 x i32> %l2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%t1 = shufflevector <8 x i32> %l1, <8 x i32> %l2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%t2 = shufflevector <8 x i32> %l3, <8 x i32> %l4, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%t2 = shufflevector <8 x i32> %l3, <8 x i32> %l4, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%s = shufflevector <16 x i32> %t1, <16 x i32> %t2, <32 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27, i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>		%s = shufflevector <16 x i32> %t1, <16 x i32> %t2, <32 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27, i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>
store <32 x i32> %s, <32 x i32> *%dst		store <32 x i32> %s, <32 x i32> *%dst
ret void		ret void
}		}

define void @vst4_v16i32(<16 x i32> *%src, <64 x i32> *%dst) {		define void @vst4_v16i32(<16 x i32> *%src, <64 x i32> *%dst) {
; CHECK-LABEL: vst4_v16i32:		; CHECK-LABEL: vst4_v16i32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
		; CHECK-NEXT: .save {r4, r5}
		; CHECK-NEXT: push {r4, r5}
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: .pad #160		; CHECK-NEXT: .pad #152
; CHECK-NEXT: sub sp, #160		; CHECK-NEXT: sub sp, #152
; CHECK-NEXT: vldrw.u32 q2, [r0, #48]		; CHECK-NEXT: vldrw.u32 q2, [r0, #176]
; CHECK-NEXT: vldrw.u32 q1, [r0, #176]		; CHECK-NEXT: add r2, sp, #64
; CHECK-NEXT: vldrw.u32 q0, [r0, #112]		; CHECK-NEXT: vstmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vmov.32 r2, q1[2]		; CHECK-NEXT: vldrw.u32 q2, [r0, #160]
; CHECK-NEXT: vmov.f64 d8, d5		; CHECK-NEXT: vldrw.u32 q6, [r0, #128]
; CHECK-NEXT: vdup.32 q3, r2		; CHECK-NEXT: vstmia r2, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vldrw.u32 q6, [r0, #160]		; CHECK-NEXT: add r2, sp, #64
; CHECK-NEXT: vldrw.u32 q5, [r0, #128]		; CHECK-NEXT: vldrw.u32 q5, [r0, #64]
; CHECK-NEXT: vldrw.u32 q7, [r0, #64]		; CHECK-NEXT: vldmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vstrw.32 q6, [sp, #144] @ 16-byte Spill		; CHECK-NEXT: vldrw.u32 q4, [r0]
; CHECK-NEXT: vldrw.u32 q6, [r0, #96]		; CHECK-NEXT: vldrw.u32 q1, [r0, #112]
; CHECK-NEXT: vmov.f32 s17, s2		; CHECK-NEXT: vmov q7, q6
; CHECK-NEXT: vmov.f32 s18, s14		; CHECK-NEXT: vstmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vstrw.32 q6, [sp, #128] @ 16-byte Spill		; CHECK-NEXT: vldmia r2, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vmov.f32 s19, s15		; CHECK-NEXT: add r2, sp, #64
; CHECK-NEXT: vldrw.u32 q3, [r0, #144]		; CHECK-NEXT: vldrw.u32 q1, [r0, #96]
; CHECK-NEXT: vldrw.u32 q6, [r0, #32]		; CHECK-NEXT: vstmia r2, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vstrw.32 q3, [sp, #96] @ 16-byte Spill		; CHECK-NEXT: add r2, sp, #64
; CHECK-NEXT: vldrw.u32 q3, [r0, #80]		; CHECK-NEXT: vldmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vstrw.32 q6, [sp, #112] @ 16-byte Spill		; CHECK-NEXT: vldrw.u32 q0, [r0, #48]
; CHECK-NEXT: vstrw.32 q3, [sp, #80] @ 16-byte Spill		; CHECK-NEXT: vstmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vldrw.u32 q3, [r0, #16]		; CHECK-NEXT: vldmia r2, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vstrw.32 q3, [sp, #64] @ 16-byte Spill		; CHECK-NEXT: add r2, sp, #64
; CHECK-NEXT: vldrw.u32 q3, [r0]		; CHECK-NEXT: vldrw.u32 q0, [r0, #32]
; CHECK-NEXT: vstrw.32 q4, [r1, #224]		; CHECK-NEXT: vstmia r2, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vmov.f32 s16, s11		; CHECK-NEXT: vldrw.u32 q2, [r0, #144]
; CHECK-NEXT: vmov.32 r0, q1[3]		; CHECK-NEXT: vldrw.u32 q1, [r0, #80]
; CHECK-NEXT: vmov.f32 s17, s3		; CHECK-NEXT: add r2, sp, #64
; CHECK-NEXT: vdup.32 q6, r0		; CHECK-NEXT: vldrw.u32 q0, [r0, #16]
; CHECK-NEXT: vmov.f32 s18, s26		; CHECK-NEXT: add.w r0, r1, #64
; CHECK-NEXT: vmov.f32 s19, s27		; CHECK-NEXT: vmov q3, q2
; CHECK-NEXT: vstrw.32 q4, [r1, #240]		; CHECK-NEXT: vst40.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.f64 d8, d4		; CHECK-NEXT: vst41.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.32 r0, q1[0]		; CHECK-NEXT: vst42.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vdup.32 q6, r0		; CHECK-NEXT: vst43.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.f32 s17, s0		; CHECK-NEXT: vst40.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s18, s26		; CHECK-NEXT: vst41.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s19, s27		; CHECK-NEXT: vst42.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vstrw.32 q4, [r1, #192]		; CHECK-NEXT: vst43.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.32 r0, q1[1]		; CHECK-NEXT: add.w r0, r1, #192
; CHECK-NEXT: vmov.f32 s0, s9		; CHECK-NEXT: vldmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vdup.32 q1, r0		; CHECK-NEXT: adds r1, #128
; CHECK-NEXT: vmov.f32 s2, s6		; CHECK-NEXT: vmov q3, q2
; CHECK-NEXT: vldrw.u32 q4, [sp, #112] @ 16-byte Reload		; CHECK-NEXT: vldmia r2, {d8, d9, d10, d11, d12, d13, d14, d15} @ 64-byte Reload
; CHECK-NEXT: vldrw.u32 q2, [sp, #144] @ 16-byte Reload		; CHECK-NEXT: vmov q7, q6
; CHECK-NEXT: vmov.f32 s3, s7		; CHECK-NEXT: vst40.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vldrw.u32 q6, [sp, #128] @ 16-byte Reload		; CHECK-NEXT: vst41.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vstrw.32 q0, [r1, #208]		; CHECK-NEXT: vst42.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.f64 d0, d9		; CHECK-NEXT: vst43.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.32 r0, q2[2]		; CHECK-NEXT: vst40.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vdup.32 q1, r0		; CHECK-NEXT: vst41.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s1, s26		; CHECK-NEXT: vst42.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s2, s6		; CHECK-NEXT: vst43.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s3, s7		; CHECK-NEXT: add sp, #152
; CHECK-NEXT: vstrw.32 q0, [r1, #160]
; CHECK-NEXT: vmov.f32 s0, s19
; CHECK-NEXT: vmov.32 r0, q2[3]
; CHECK-NEXT: vmov.f32 s1, s27
; CHECK-NEXT: vdup.32 q1, r0
; CHECK-NEXT: vmov.f32 s8, s15
; CHECK-NEXT: vmov.f32 s2, s6
; CHECK-NEXT: vmov.f32 s3, s7
; CHECK-NEXT: vmov.f64 d2, d6
; CHECK-NEXT: vstrw.32 q0, [r1, #176]
; CHECK-NEXT: vmov.32 r0, q5[0]
; CHECK-NEXT: vdup.32 q0, r0
; CHECK-NEXT: vmov.f32 s5, s28
; CHECK-NEXT: vmov.f32 s6, s2
; CHECK-NEXT: vmov.f32 s7, s3
; CHECK-NEXT: vstrw.32 q1, [sp, #48] @ 16-byte Spill
; CHECK-NEXT: vmov.32 r0, q5[1]
; CHECK-NEXT: vdup.32 q0, r0
; CHECK-NEXT: vmov.f32 s28, s13
; CHECK-NEXT: vmov.f32 s0, s13
; CHECK-NEXT: vmov.f32 s1, s29
; CHECK-NEXT: vmov.f64 d2, d7
; CHECK-NEXT: vstrw.32 q0, [sp, #32] @ 16-byte Spill
; CHECK-NEXT: vmov.32 r0, q5[2]
; CHECK-NEXT: vdup.32 q0, r0
; CHECK-NEXT: vmov.32 r0, q5[3]
; CHECK-NEXT: vldrw.u32 q3, [sp, #96] @ 16-byte Reload
; CHECK-NEXT: vldrw.u32 q5, [sp, #80] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s5, s30
; CHECK-NEXT: vmov.f32 s6, s2
; CHECK-NEXT: vmov.f32 s7, s3
; CHECK-NEXT: vdup.32 q0, r0
; CHECK-NEXT: vmov.f32 s9, s31
; CHECK-NEXT: vstrw.32 q1, [sp, #16] @ 16-byte Spill
; CHECK-NEXT: vmov.f32 s10, s2
; CHECK-NEXT: vmov.f32 s11, s3
; CHECK-NEXT: vstrw.32 q2, [sp] @ 16-byte Spill
; CHECK-NEXT: vldrw.u32 q2, [sp, #64] @ 16-byte Reload
; CHECK-NEXT: vmov.32 r0, q3[0]
; CHECK-NEXT: vmov.f64 d12, d4
; CHECK-NEXT: vdup.32 q0, r0
; CHECK-NEXT: vmov.32 r0, q3[1]
; CHECK-NEXT: vdup.32 q7, r0
; CHECK-NEXT: vmov.32 r0, q3[2]
; CHECK-NEXT: vmov.f32 s25, s20
; CHECK-NEXT: vmov.f32 s26, s2
; CHECK-NEXT: vmov.f64 d8, d5
; CHECK-NEXT: vmov.f32 s27, s3
; CHECK-NEXT: vdup.32 q0, r0
; CHECK-NEXT: vmov.f32 s20, s9
; CHECK-NEXT: vmov.32 r0, q3[3]
; CHECK-NEXT: vmov.f32 s17, s22
; CHECK-NEXT: vldrw.u32 q3, [sp, #128] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s18, s2
; CHECK-NEXT: vmov.f32 s4, s11
; CHECK-NEXT: vmov.f32 s28, s9
; CHECK-NEXT: vldrw.u32 q2, [sp, #112] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s19, s3
; CHECK-NEXT: vdup.32 q0, r0
; CHECK-NEXT: vmov.f32 s5, s23
; CHECK-NEXT: vmov.f32 s6, s2
; CHECK-NEXT: vmov.f32 s7, s3
; CHECK-NEXT: vmov.f64 d0, d4
; CHECK-NEXT: vmov.f32 s1, s12
; CHECK-NEXT: vmov.f32 s12, s9
; CHECK-NEXT: vldrw.u32 q2, [sp, #144] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s29, s21
; CHECK-NEXT: vmov.32 r0, q2[0]
; CHECK-NEXT: vdup.32 q5, r0
; CHECK-NEXT: vmov.32 r0, q2[1]
; CHECK-NEXT: vmov.f32 s2, s22
; CHECK-NEXT: vdup.32 q2, r0
; CHECK-NEXT: vmov.f32 s3, s23
; CHECK-NEXT: vstrw.32 q4, [r1, #96]
; CHECK-NEXT: vstrw.32 q0, [r1, #128]
; CHECK-NEXT: vldrw.u32 q0, [sp, #16] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s14, s10
; CHECK-NEXT: vstrw.32 q1, [r1, #112]
; CHECK-NEXT: vstrw.32 q0, [r1, #32]
; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s15, s11
; CHECK-NEXT: vstrw.32 q6, [r1, #64]
; CHECK-NEXT: vstrw.32 q0, [r1, #48]
; CHECK-NEXT: vldrw.u32 q0, [sp, #48] @ 16-byte Reload
; CHECK-NEXT: vstrw.32 q3, [r1, #144]
; CHECK-NEXT: vstrw.32 q7, [r1, #80]
; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vldrw.u32 q0, [sp, #32] @ 16-byte Reload
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: add sp, #160
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
		; CHECK-NEXT: pop {r4, r5}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <16 x i32>, <16 x i32>* %src, i32 0		%s1 = getelementptr <16 x i32>, <16 x i32>* %src, i32 0
%l1 = load <16 x i32>, <16 x i32>* %s1, align 4		%l1 = load <16 x i32>, <16 x i32>* %s1, align 4
%s2 = getelementptr <16 x i32>, <16 x i32>* %src, i32 1		%s2 = getelementptr <16 x i32>, <16 x i32>* %src, i32 1
%l2 = load <16 x i32>, <16 x i32>* %s2, align 4		%l2 = load <16 x i32>, <16 x i32>* %s2, align 4
%s3 = getelementptr <16 x i32>, <16 x i32>* %src, i32 2		%s3 = getelementptr <16 x i32>, <16 x i32>* %src, i32 2
%l3 = load <16 x i32>, <16 x i32>* %s3, align 4		%l3 = load <16 x i32>, <16 x i32>* %s3, align 4
(102 lines not shown)
%s = shufflevector <8 x i16> %t1, <8 x i16> %t2, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 1, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>		%s = shufflevector <8 x i16> %t1, <8 x i16> %t2, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 1, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>
store <16 x i16> %s, <16 x i16> *%dst		store <16 x i16> %s, <16 x i16> *%dst
ret void		ret void
}		}

define void @vst4_v8i16(<8 x i16> *%src, <32 x i16> *%dst) {		define void @vst4_v8i16(<8 x i16> *%src, <32 x i16> *%dst) {
; CHECK-LABEL: vst4_v8i16:		; CHECK-LABEL: vst4_v8i16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vldrw.u32 q2, [r0, #32]
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vldrw.u32 q1, [r0, #16]
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: vldrw.u32 q0, [r0]
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]		; CHECK-NEXT: vmov q3, q2
; CHECK-NEXT: vldrw.u32 q3, [r0, #32]		; CHECK-NEXT: vst40.16 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.u16 r2, q1[2]		; CHECK-NEXT: vst41.16 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.16 q4[0], r2		; CHECK-NEXT: vst42.16 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.u16 r2, q2[2]		; CHECK-NEXT: vst43.16 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.16 q4[1], r2
; CHECK-NEXT: vmov.u16 r2, q1[3]
; CHECK-NEXT: vmov.u16 r0, q3[2]
; CHECK-NEXT: vmov.16 q4[4], r2
; CHECK-NEXT: vmov.16 q5[2], r0
; CHECK-NEXT: vmov.u16 r2, q2[3]
; CHECK-NEXT: vmov.16 q4[5], r2
; CHECK-NEXT: vmov.16 q5[3], r0
; CHECK-NEXT: vmov.u16 r0, q3[3]
; CHECK-NEXT: vmov.u16 r2, q4[0]
; CHECK-NEXT: vmov.16 q5[6], r0
; CHECK-NEXT: vmov.16 q0[0], r2
; CHECK-NEXT: vmov.16 q5[7], r0
; CHECK-NEXT: vmov.u16 r2, q4[1]
; CHECK-NEXT: vmov.16 q0[1], r2
; CHECK-NEXT: vmov.u16 r0, q5[2]
; CHECK-NEXT: vmov.16 q0[2], r0
; CHECK-NEXT: vmov.u16 r0, q5[3]
; CHECK-NEXT: vmov.16 q0[3], r0
; CHECK-NEXT: vmov.u16 r0, q4[4]
; CHECK-NEXT: vmov.16 q0[4], r0
; CHECK-NEXT: vmov.u16 r0, q4[5]
; CHECK-NEXT: vmov.16 q0[5], r0
; CHECK-NEXT: vmov.u16 r0, q5[6]
; CHECK-NEXT: vmov.16 q0[6], r0
; CHECK-NEXT: vmov.u16 r0, q5[7]
; CHECK-NEXT: vmov.16 q0[7], r0
; CHECK-NEXT: vmov.u16 r0, q1[0]
; CHECK-NEXT: vmov.16 q5[0], r0
; CHECK-NEXT: vmov.u16 r0, q2[0]
; CHECK-NEXT: vmov.16 q5[1], r0
; CHECK-NEXT: vmov.u16 r0, q1[1]
; CHECK-NEXT: vmov.16 q5[4], r0
; CHECK-NEXT: vmov.u16 r0, q2[1]
; CHECK-NEXT: vmov.16 q5[5], r0
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vmov.u16 r0, q5[0]
; CHECK-NEXT: vmov.16 q4[0], r0
; CHECK-NEXT: vmov.u16 r0, q5[1]
; CHECK-NEXT: vmov.16 q4[1], r0
; CHECK-NEXT: vmov.u16 r0, q3[0]
; CHECK-NEXT: vmov.16 q6[2], r0
; CHECK-NEXT: vmov.16 q6[3], r0
; CHECK-NEXT: vmov.u16 r0, q3[1]
; CHECK-NEXT: vmov.16 q6[6], r0
; CHECK-NEXT: vmov.16 q6[7], r0
; CHECK-NEXT: vmov.u16 r0, q6[2]
; CHECK-NEXT: vmov.16 q4[2], r0
; CHECK-NEXT: vmov.u16 r0, q6[3]
; CHECK-NEXT: vmov.16 q4[3], r0
; CHECK-NEXT: vmov.u16 r0, q5[4]
; CHECK-NEXT: vmov.16 q4[4], r0
; CHECK-NEXT: vmov.u16 r0, q5[5]
; CHECK-NEXT: vmov.16 q4[5], r0
; CHECK-NEXT: vmov.u16 r0, q6[6]
; CHECK-NEXT: vmov.16 q4[6], r0
; CHECK-NEXT: vmov.u16 r0, q6[7]
; CHECK-NEXT: vmov.16 q4[7], r0
; CHECK-NEXT: vmov.u16 r0, q1[6]
; CHECK-NEXT: vmov.16 q6[0], r0
; CHECK-NEXT: vmov.u16 r0, q2[6]
; CHECK-NEXT: vmov.16 q6[1], r0
; CHECK-NEXT: vmov.u16 r0, q1[7]
; CHECK-NEXT: vmov.16 q6[4], r0
; CHECK-NEXT: vmov.u16 r0, q2[7]
; CHECK-NEXT: vmov.16 q6[5], r0
; CHECK-NEXT: vstrw.32 q4, [r1]
; CHECK-NEXT: vmov.u16 r0, q6[0]
; CHECK-NEXT: vmov.16 q5[0], r0
; CHECK-NEXT: vmov.u16 r0, q6[1]
; CHECK-NEXT: vmov.16 q5[1], r0
; CHECK-NEXT: vmov.u16 r0, q3[6]
; CHECK-NEXT: vmov.16 q7[2], r0
; CHECK-NEXT: vmov.16 q7[3], r0
; CHECK-NEXT: vmov.u16 r0, q3[7]
; CHECK-NEXT: vmov.16 q7[6], r0
; CHECK-NEXT: vmov.16 q7[7], r0
; CHECK-NEXT: vmov.u16 r0, q7[2]
; CHECK-NEXT: vmov.16 q5[2], r0
; CHECK-NEXT: vmov.u16 r0, q7[3]
; CHECK-NEXT: vmov.16 q5[3], r0
; CHECK-NEXT: vmov.u16 r0, q6[4]
; CHECK-NEXT: vmov.16 q5[4], r0
; CHECK-NEXT: vmov.u16 r0, q6[5]
; CHECK-NEXT: vmov.16 q5[5], r0
; CHECK-NEXT: vmov.u16 r0, q7[6]
; CHECK-NEXT: vmov.16 q5[6], r0
; CHECK-NEXT: vmov.u16 r0, q7[7]
; CHECK-NEXT: vmov.16 q5[7], r0
; CHECK-NEXT: vmov.u16 r0, q1[4]
; CHECK-NEXT: vmov.16 q6[0], r0
; CHECK-NEXT: vmov.u16 r0, q2[4]
; CHECK-NEXT: vmov.16 q6[1], r0
; CHECK-NEXT: vmov.u16 r0, q1[5]
; CHECK-NEXT: vmov.16 q6[4], r0
; CHECK-NEXT: vmov.u16 r0, q2[5]
; CHECK-NEXT: vmov.16 q6[5], r0
; CHECK-NEXT: vstrw.32 q5, [r1, #48]
; CHECK-NEXT: vmov.u16 r0, q6[0]
; CHECK-NEXT: vmov.16 q1[0], r0
; CHECK-NEXT: vmov.u16 r0, q6[1]
; CHECK-NEXT: vmov.16 q1[1], r0
; CHECK-NEXT: vmov.u16 r0, q3[4]
; CHECK-NEXT: vmov.16 q2[2], r0
; CHECK-NEXT: vmov.16 q2[3], r0
; CHECK-NEXT: vmov.u16 r0, q3[5]
; CHECK-NEXT: vmov.16 q2[6], r0
; CHECK-NEXT: vmov.16 q2[7], r0
; CHECK-NEXT: vmov.u16 r0, q2[2]
; CHECK-NEXT: vmov.16 q1[2], r0
; CHECK-NEXT: vmov.u16 r0, q2[3]
; CHECK-NEXT: vmov.16 q1[3], r0
; CHECK-NEXT: vmov.u16 r0, q6[4]
; CHECK-NEXT: vmov.16 q1[4], r0
; CHECK-NEXT: vmov.u16 r0, q6[5]
; CHECK-NEXT: vmov.16 q1[5], r0
; CHECK-NEXT: vmov.u16 r0, q2[6]
; CHECK-NEXT: vmov.16 q1[6], r0
; CHECK-NEXT: vmov.u16 r0, q2[7]
; CHECK-NEXT: vmov.16 q1[7], r0
; CHECK-NEXT: vstrw.32 q1, [r1, #32]
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <8 x i16>, <8 x i16>* %src, i32 0		%s1 = getelementptr <8 x i16>, <8 x i16>* %src, i32 0
%l1 = load <8 x i16>, <8 x i16>* %s1, align 4		%l1 = load <8 x i16>, <8 x i16>* %s1, align 4
%s2 = getelementptr <8 x i16>, <8 x i16>* %src, i32 1		%s2 = getelementptr <8 x i16>, <8 x i16>* %src, i32 1
%l2 = load <8 x i16>, <8 x i16>* %s2, align 4		%l2 = load <8 x i16>, <8 x i16>* %s2, align 4
%s3 = getelementptr <8 x i16>, <8 x i16>* %src, i32 2		%s3 = getelementptr <8 x i16>, <8 x i16>* %src, i32 2
%l3 = load <8 x i16>, <8 x i16>* %s3, align 4		%l3 = load <8 x i16>, <8 x i16>* %s3, align 4
%s4 = getelementptr <8 x i16>, <8 x i16>* %src, i32 3		%s4 = getelementptr <8 x i16>, <8 x i16>* %src, i32 3
%l4 = load <8 x i16>, <8 x i16>* %s3, align 4		%l4 = load <8 x i16>, <8 x i16>* %s3, align 4
%t1 = shufflevector <8 x i16> %l1, <8 x i16> %l2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%t1 = shufflevector <8 x i16> %l1, <8 x i16> %l2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%t2 = shufflevector <8 x i16> %l3, <8 x i16> %l4, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%t2 = shufflevector <8 x i16> %l3, <8 x i16> %l4, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%s = shufflevector <16 x i16> %t1, <16 x i16> %t2, <32 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27, i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>		%s = shufflevector <16 x i16> %t1, <16 x i16> %t2, <32 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27, i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>
store <32 x i16> %s, <32 x i16> *%dst		store <32 x i16> %s, <32 x i16> *%dst
ret void		ret void
}		}
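
A note on the final shuffle mask in these vst4 tests: the mask `<0, 8, 16, 24, 1, 9, 17, 25, ...>` is what the interleaved-access lowering recognises as a factor-4 interleave of four 8-lane vectors. The sketch below is only an illustration in Python, not code from the patch; the helper names `interleave_mask` and `interleave` are hypothetical. It shows how such a mask can be generated and what it does to the concatenated inputs.

```python
def interleave_mask(lanes, factor):
    """Build a shufflevector-style mask that interleaves `factor`
    vectors of `lanes` elements each (hypothetical helper).

    Index vec * lanes + lane selects element `lane` of input vector
    `vec` in the concatenation of all inputs.
    """
    return [vec * lanes + lane for lane in range(lanes) for vec in range(factor)]


def interleave(vectors):
    """Apply the interleave mask to a list of equally sized vectors."""
    lanes = len(vectors[0])
    concat = [x for v in vectors for x in v]  # like the %t1/%t2 concat shuffles
    return [concat[i] for i in interleave_mask(lanes, len(vectors))]


if __name__ == "__main__":
    # Factor 4 with 8 lanes, as in vst4_v8i16.
    print(interleave_mask(8, 4))
    # Factor 4 with 4 lanes, as in vst4_v4f32.
    print(interleave_mask(4, 4))
```

With `lanes=8` and `factor=4` the generated list matches the 32-element mask used by `vst4_v8i16`, and with `lanes=4` it matches the 16-element mask in `vst4_v4f32`.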

define void @vst4_v16i16(<16 x i16> *%src, <64 x i16> *%dst) {		define void @vst4_v16i16(<16 x i16> *%src, <64 x i16> *%dst) {
; CHECK-LABEL: vst4_v16i16:		; CHECK-LABEL: vst4_v16i16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: .pad #64		; CHECK-NEXT: vldrw.u32 q2, [r0, #80]
; CHECK-NEXT: sub sp, #64		; CHECK-NEXT: vldrw.u32 q6, [r0, #64]
; CHECK-NEXT: vldrw.u32 q3, [r0]		; CHECK-NEXT: vldrw.u32 q5, [r0, #32]
; CHECK-NEXT: vldrw.u32 q4, [r0, #32]		; CHECK-NEXT: vldrw.u32 q1, [r0, #48]
; CHECK-NEXT: vldrw.u32 q1, [r0, #64]		; CHECK-NEXT: vldrw.u32 q4, [r0]
; CHECK-NEXT: vldrw.u32 q6, [r0, #48]		; CHECK-NEXT: vldrw.u32 q0, [r0, #16]
; CHECK-NEXT: vmov.u16 r2, q3[0]		; CHECK-NEXT: vmov q7, q6
; CHECK-NEXT: vmov.16 q0[0], r2		; CHECK-NEXT: add.w r0, r1, #64
; CHECK-NEXT: vmov.u16 r2, q4[0]		; CHECK-NEXT: vmov q3, q2
; CHECK-NEXT: vmov.16 q0[1], r2		; CHECK-NEXT: vst40.16 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.u16 r2, q3[1]		; CHECK-NEXT: vst41.16 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.16 q0[4], r2		; CHECK-NEXT: vst42.16 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.u16 r2, q4[1]		; CHECK-NEXT: vst43.16 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.16 q0[5], r2		; CHECK-NEXT: vst40.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.u16 r2, q0[0]		; CHECK-NEXT: vst41.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.16 q5[0], r2		; CHECK-NEXT: vst42.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.u16 r2, q0[1]		; CHECK-NEXT: vst43.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.16 q5[1], r2
; CHECK-NEXT: vmov.u16 r2, q1[0]
; CHECK-NEXT: vmov.16 q2[2], r2
; CHECK-NEXT: vmov.16 q2[3], r2
; CHECK-NEXT: vmov.u16 r2, q1[1]
; CHECK-NEXT: vmov.16 q2[6], r2
; CHECK-NEXT: vmov.16 q2[7], r2
; CHECK-NEXT: vmov.u16 r2, q2[2]
; CHECK-NEXT: vmov.16 q5[2], r2
; CHECK-NEXT: vmov.u16 r2, q2[3]
; CHECK-NEXT: vmov.16 q5[3], r2
; CHECK-NEXT: vmov.u16 r2, q0[4]
; CHECK-NEXT: vmov.16 q5[4], r2
; CHECK-NEXT: vmov.u16 r2, q0[5]
; CHECK-NEXT: vmov.16 q5[5], r2
; CHECK-NEXT: vmov.u16 r2, q2[6]
; CHECK-NEXT: vmov.16 q5[6], r2
; CHECK-NEXT: vmov.u16 r2, q2[7]
; CHECK-NEXT: vmov.16 q5[7], r2
; CHECK-NEXT: vmov.u16 r2, q3[2]
; CHECK-NEXT: vmov.16 q0[0], r2
; CHECK-NEXT: vmov.u16 r2, q4[2]
; CHECK-NEXT: vmov.16 q0[1], r2
; CHECK-NEXT: vmov.u16 r2, q3[3]
; CHECK-NEXT: vmov.16 q0[4], r2
; CHECK-NEXT: vmov.u16 r2, q4[3]
; CHECK-NEXT: vmov.16 q0[5], r2
; CHECK-NEXT: vstrw.32 q5, [sp, #32] @ 16-byte Spill
; CHECK-NEXT: vmov.u16 r2, q0[0]
; CHECK-NEXT: vmov.16 q5[0], r2
; CHECK-NEXT: vmov.u16 r2, q0[1]
; CHECK-NEXT: vmov.16 q5[1], r2
; CHECK-NEXT: vmov.u16 r2, q1[2]
; CHECK-NEXT: vmov.16 q2[2], r2
; CHECK-NEXT: vmov.16 q2[3], r2
; CHECK-NEXT: vmov.u16 r2, q1[3]
; CHECK-NEXT: vmov.16 q2[6], r2
; CHECK-NEXT: vmov.16 q2[7], r2
; CHECK-NEXT: vmov.u16 r2, q2[2]
; CHECK-NEXT: vmov.16 q5[2], r2
; CHECK-NEXT: vmov.u16 r2, q2[3]
; CHECK-NEXT: vmov.16 q5[3], r2
; CHECK-NEXT: vmov.u16 r2, q0[4]
; CHECK-NEXT: vmov.16 q5[4], r2
; CHECK-NEXT: vmov.u16 r2, q0[5]
; CHECK-NEXT: vmov.16 q5[5], r2
; CHECK-NEXT: vmov.u16 r2, q2[6]
; CHECK-NEXT: vmov.16 q5[6], r2
; CHECK-NEXT: vmov.u16 r2, q2[7]
; CHECK-NEXT: vmov.16 q5[7], r2
; CHECK-NEXT: vmov.u16 r2, q3[4]
; CHECK-NEXT: vmov.16 q0[0], r2
; CHECK-NEXT: vmov.u16 r2, q4[4]
; CHECK-NEXT: vmov.16 q0[1], r2
; CHECK-NEXT: vmov.u16 r2, q3[5]
; CHECK-NEXT: vmov.16 q0[4], r2
; CHECK-NEXT: vmov.u16 r2, q4[5]
; CHECK-NEXT: vmov.16 q0[5], r2
; CHECK-NEXT: vstrw.32 q5, [sp, #16] @ 16-byte Spill
; CHECK-NEXT: vmov.u16 r2, q0[0]
; CHECK-NEXT: vmov.16 q5[0], r2
; CHECK-NEXT: vmov.u16 r2, q0[1]
; CHECK-NEXT: vmov.16 q5[1], r2
; CHECK-NEXT: vmov.u16 r2, q1[4]
; CHECK-NEXT: vmov.16 q2[2], r2
; CHECK-NEXT: vmov.16 q2[3], r2
; CHECK-NEXT: vmov.u16 r2, q1[5]
; CHECK-NEXT: vmov.16 q2[6], r2
; CHECK-NEXT: vmov.16 q2[7], r2
; CHECK-NEXT: vmov.u16 r2, q2[2]
; CHECK-NEXT: vmov.16 q5[2], r2
; CHECK-NEXT: vmov.u16 r2, q2[3]
; CHECK-NEXT: vmov.16 q5[3], r2
; CHECK-NEXT: vmov.u16 r2, q0[4]
; CHECK-NEXT: vmov.16 q5[4], r2
; CHECK-NEXT: vmov.u16 r2, q0[5]
; CHECK-NEXT: vmov.16 q5[5], r2
; CHECK-NEXT: vmov.u16 r2, q2[6]
; CHECK-NEXT: vmov.16 q5[6], r2
; CHECK-NEXT: vmov.u16 r2, q2[7]
; CHECK-NEXT: vmov.16 q5[7], r2
; CHECK-NEXT: vmov.u16 r2, q3[6]
; CHECK-NEXT: vmov.16 q0[0], r2
; CHECK-NEXT: vmov.u16 r2, q4[6]
; CHECK-NEXT: vmov.16 q0[1], r2
; CHECK-NEXT: vmov.u16 r2, q3[7]
; CHECK-NEXT: vmov.16 q0[4], r2
; CHECK-NEXT: vmov.u16 r2, q4[7]
; CHECK-NEXT: vmov.16 q0[5], r2
; CHECK-NEXT: vldrw.u32 q3, [r0, #80]
; CHECK-NEXT: vmov.u16 r2, q0[0]
; CHECK-NEXT: vstrw.32 q5, [sp] @ 16-byte Spill
; CHECK-NEXT: vmov.16 q7[0], r2
; CHECK-NEXT: vmov.u16 r2, q0[1]
; CHECK-NEXT: vmov.16 q7[1], r2
; CHECK-NEXT: vmov.u16 r2, q1[6]
; CHECK-NEXT: vmov.16 q2[2], r2
; CHECK-NEXT: vmov.16 q2[3], r2
; CHECK-NEXT: vmov.u16 r2, q1[7]
; CHECK-NEXT: vmov.16 q2[6], r2
; CHECK-NEXT: vmov.16 q2[7], r2
; CHECK-NEXT: vmov.u16 r2, q2[2]
; CHECK-NEXT: vmov.16 q7[2], r2
; CHECK-NEXT: vmov.u16 r2, q2[3]
; CHECK-NEXT: vmov.16 q7[3], r2
; CHECK-NEXT: vmov.u16 r2, q0[4]
; CHECK-NEXT: vmov.16 q7[4], r2
; CHECK-NEXT: vmov.u16 r2, q0[5]
; CHECK-NEXT: vmov.16 q7[5], r2
; CHECK-NEXT: vmov.u16 r2, q2[6]
; CHECK-NEXT: vmov.16 q7[6], r2
; CHECK-NEXT: vmov.u16 r2, q2[7]
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]
; CHECK-NEXT: vmov.16 q7[7], r2
; CHECK-NEXT: vmov.u16 r0, q3[0]
; CHECK-NEXT: vstrw.32 q7, [r1, #48]
; CHECK-NEXT: vmov.u16 r2, q2[0]
; CHECK-NEXT: vmov.16 q1[2], r0
; CHECK-NEXT: vmov.16 q0[0], r2
; CHECK-NEXT: vmov.u16 r2, q6[0]
; CHECK-NEXT: vmov.16 q0[1], r2
; CHECK-NEXT: vmov.u16 r2, q2[1]
; CHECK-NEXT: vmov.16 q0[4], r2
; CHECK-NEXT: vmov.u16 r2, q6[1]
; CHECK-NEXT: vmov.16 q0[5], r2
; CHECK-NEXT: vmov.16 q1[3], r0
; CHECK-NEXT: vmov.u16 r0, q3[1]
; CHECK-NEXT: vmov.u16 r2, q0[0]
; CHECK-NEXT: vmov.16 q1[6], r0
; CHECK-NEXT: vmov.16 q5[0], r2
; CHECK-NEXT: vmov.16 q1[7], r0
; CHECK-NEXT: vmov.u16 r2, q0[1]
; CHECK-NEXT: vmov.16 q5[1], r2
; CHECK-NEXT: vmov.u16 r0, q1[2]
; CHECK-NEXT: vmov.16 q5[2], r0
; CHECK-NEXT: vmov.u16 r0, q1[3]
; CHECK-NEXT: vmov.16 q5[3], r0
; CHECK-NEXT: vmov.u16 r0, q0[4]
; CHECK-NEXT: vmov.16 q5[4], r0
; CHECK-NEXT: vmov.u16 r0, q0[5]
; CHECK-NEXT: vmov.16 q5[5], r0
; CHECK-NEXT: vmov.u16 r0, q1[6]
; CHECK-NEXT: vmov.16 q5[6], r0
; CHECK-NEXT: vmov.u16 r0, q1[7]
; CHECK-NEXT: vmov.16 q5[7], r0
; CHECK-NEXT: vmov.u16 r0, q2[2]
; CHECK-NEXT: vmov.16 q0[0], r0
; CHECK-NEXT: vmov.u16 r0, q6[2]
; CHECK-NEXT: vmov.16 q0[1], r0
; CHECK-NEXT: vmov.u16 r0, q2[3]
; CHECK-NEXT: vmov.16 q0[4], r0
; CHECK-NEXT: vmov.u16 r0, q6[3]
; CHECK-NEXT: vmov.16 q0[5], r0
; CHECK-NEXT: vstrw.32 q2, [sp, #48] @ 16-byte Spill
; CHECK-NEXT: vmov.u16 r0, q0[0]
; CHECK-NEXT: vstrw.32 q5, [r1, #64]
; CHECK-NEXT: vmov.16 q4[0], r0
; CHECK-NEXT: vmov.u16 r0, q0[1]
; CHECK-NEXT: vmov.16 q4[1], r0
; CHECK-NEXT: vmov.u16 r0, q3[2]
; CHECK-NEXT: vmov.16 q1[2], r0
; CHECK-NEXT: vmov.16 q1[3], r0
; CHECK-NEXT: vmov.u16 r0, q3[3]
; CHECK-NEXT: vmov.16 q1[6], r0
; CHECK-NEXT: vmov.16 q1[7], r0
; CHECK-NEXT: vmov.u16 r0, q1[2]
; CHECK-NEXT: vmov.16 q4[2], r0
; CHECK-NEXT: vmov.u16 r0, q1[3]
; CHECK-NEXT: vmov.16 q4[3], r0
; CHECK-NEXT: vmov.u16 r0, q0[4]
; CHECK-NEXT: vmov.16 q4[4], r0
; CHECK-NEXT: vmov.u16 r0, q0[5]
; CHECK-NEXT: vmov.16 q4[5], r0
; CHECK-NEXT: vmov.u16 r0, q1[6]
; CHECK-NEXT: vldrw.u32 q0, [sp, #48] @ 16-byte Reload
; CHECK-NEXT: vmov.16 q4[6], r0
; CHECK-NEXT: vmov.u16 r0, q1[7]
; CHECK-NEXT: vmov.16 q4[7], r0
; CHECK-NEXT: vmov.u16 r0, q0[4]
; CHECK-NEXT: vmov.16 q1[0], r0
; CHECK-NEXT: vmov.u16 r0, q6[4]
; CHECK-NEXT: vmov.16 q1[1], r0
; CHECK-NEXT: vmov.u16 r0, q0[5]
; CHECK-NEXT: vmov.16 q1[4], r0
; CHECK-NEXT: vmov.u16 r0, q6[5]
; CHECK-NEXT: vmov.16 q1[5], r0
; CHECK-NEXT: vstrw.32 q4, [r1, #80]
; CHECK-NEXT: vmov.u16 r0, q1[0]
; CHECK-NEXT: vmov.16 q2[0], r0
; CHECK-NEXT: vmov.u16 r0, q1[1]
; CHECK-NEXT: vmov.16 q2[1], r0
; CHECK-NEXT: vmov.u16 r0, q3[4]
; CHECK-NEXT: vmov.16 q0[2], r0
; CHECK-NEXT: vmov.16 q0[3], r0
; CHECK-NEXT: vmov.u16 r0, q3[5]
; CHECK-NEXT: vmov.16 q0[6], r0
; CHECK-NEXT: vmov.16 q0[7], r0
; CHECK-NEXT: vmov.u16 r0, q0[2]
; CHECK-NEXT: vmov.16 q2[2], r0
; CHECK-NEXT: vmov.u16 r0, q0[3]
; CHECK-NEXT: vmov.16 q2[3], r0
; CHECK-NEXT: vmov.u16 r0, q1[4]
; CHECK-NEXT: vmov.16 q2[4], r0
; CHECK-NEXT: vmov.u16 r0, q1[5]
; CHECK-NEXT: vmov.16 q2[5], r0
; CHECK-NEXT: vmov.u16 r0, q0[6]
; CHECK-NEXT: vldrw.u32 q1, [sp, #48] @ 16-byte Reload
; CHECK-NEXT: vmov.16 q2[6], r0
; CHECK-NEXT: vmov.u16 r0, q0[7]
; CHECK-NEXT: vmov.16 q2[7], r0
; CHECK-NEXT: vmov.u16 r0, q1[6]
; CHECK-NEXT: vmov.16 q0[0], r0
; CHECK-NEXT: vmov.u16 r0, q6[6]
; CHECK-NEXT: vmov.16 q0[1], r0
; CHECK-NEXT: vmov.u16 r0, q1[7]
; CHECK-NEXT: vmov.16 q0[4], r0
; CHECK-NEXT: vmov.u16 r0, q6[7]
; CHECK-NEXT: vmov.16 q0[5], r0
; CHECK-NEXT: vstrw.32 q2, [r1, #96]
; CHECK-NEXT: vmov.u16 r0, q0[0]
; CHECK-NEXT: vmov.16 q6[0], r0
; CHECK-NEXT: vmov.u16 r0, q0[1]
; CHECK-NEXT: vmov.16 q6[1], r0
; CHECK-NEXT: vmov.u16 r0, q3[6]
; CHECK-NEXT: vmov.16 q1[2], r0
; CHECK-NEXT: vmov.16 q1[3], r0
; CHECK-NEXT: vmov.u16 r0, q3[7]
; CHECK-NEXT: vmov.16 q1[6], r0
; CHECK-NEXT: vmov.16 q1[7], r0
; CHECK-NEXT: vmov.u16 r0, q1[2]
; CHECK-NEXT: vmov.16 q6[2], r0
; CHECK-NEXT: vmov.u16 r0, q1[3]
; CHECK-NEXT: vmov.16 q6[3], r0
; CHECK-NEXT: vmov.u16 r0, q0[4]
; CHECK-NEXT: vmov.16 q6[4], r0
; CHECK-NEXT: vmov.u16 r0, q0[5]
; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload
; CHECK-NEXT: vmov.16 q6[5], r0
; CHECK-NEXT: vmov.u16 r0, q1[6]
; CHECK-NEXT: vstrw.32 q0, [r1, #32]
; CHECK-NEXT: vldrw.u32 q0, [sp, #32] @ 16-byte Reload
; CHECK-NEXT: vmov.16 q6[6], r0
; CHECK-NEXT: vmov.u16 r0, q1[7]
; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vldrw.u32 q0, [sp, #16] @ 16-byte Reload
; CHECK-NEXT: vmov.16 q6[7], r0
; CHECK-NEXT: vstrw.32 q6, [r1, #112]
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: add sp, #64
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <16 x i16>, <16 x i16>* %src, i32 0		%s1 = getelementptr <16 x i16>, <16 x i16>* %src, i32 0
%l1 = load <16 x i16>, <16 x i16>* %s1, align 4		%l1 = load <16 x i16>, <16 x i16>* %s1, align 4
%s2 = getelementptr <16 x i16>, <16 x i16>* %src, i32 1		%s2 = getelementptr <16 x i16>, <16 x i16>* %src, i32 1
%l2 = load <16 x i16>, <16 x i16>* %s2, align 4		%l2 = load <16 x i16>, <16 x i16>* %s2, align 4
%s3 = getelementptr <16 x i16>, <16 x i16>* %src, i32 2		%s3 = getelementptr <16 x i16>, <16 x i16>* %src, i32 2
; ... 183 lines not shown ...
%s = shufflevector <16 x i8> %t1, <16 x i8> %t2, <32 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27, i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>		%s = shufflevector <16 x i8> %t1, <16 x i8> %t2, <32 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27, i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>
store <32 x i8> %s, <32 x i8> *%dst		store <32 x i8> %s, <32 x i8> *%dst
ret void		ret void
}		}

define void @vst4_v16i8(<16 x i8> *%src, <64 x i8> *%dst) {		define void @vst4_v16i8(<16 x i8> *%src, <64 x i8> *%dst) {
; CHECK-LABEL: vst4_v16i8:		; CHECK-LABEL: vst4_v16i8:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vldrw.u32 q2, [r0, #32]
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vldrw.u32 q1, [r0, #16]
; CHECK-NEXT: vldrw.u32 q1, [r0]		; CHECK-NEXT: vldrw.u32 q0, [r0]
; CHECK-NEXT: vldrw.u32 q2, [r0, #16]		; CHECK-NEXT: vmov q3, q2
; CHECK-NEXT: vldrw.u32 q3, [r0, #32]		; CHECK-NEXT: vst40.8 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.u8 r2, q1[4]		; CHECK-NEXT: vst41.8 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.8 q4[0], r2		; CHECK-NEXT: vst42.8 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.u8 r2, q2[4]		; CHECK-NEXT: vst43.8 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.8 q4[1], r2
; CHECK-NEXT: vmov.u8 r2, q1[5]
; CHECK-NEXT: vmov.u8 r0, q3[4]
; CHECK-NEXT: vmov.8 q4[4], r2
; CHECK-NEXT: vmov.8 q5[2], r0
; CHECK-NEXT: vmov.u8 r2, q2[5]
; CHECK-NEXT: vmov.8 q4[5], r2
; CHECK-NEXT: vmov.u8 r2, q1[6]
; CHECK-NEXT: vmov.8 q5[3], r0
; CHECK-NEXT: vmov.u8 r0, q3[5]
; CHECK-NEXT: vmov.8 q5[6], r0
; CHECK-NEXT: vmov.8 q4[8], r2
; CHECK-NEXT: vmov.u8 r2, q2[6]
; CHECK-NEXT: vmov.8 q5[7], r0
; CHECK-NEXT: vmov.8 q4[9], r2
; CHECK-NEXT: vmov.u8 r2, q1[7]
; CHECK-NEXT: vmov.u8 r0, q3[6]
; CHECK-NEXT: vmov.8 q4[12], r2
; CHECK-NEXT: vmov.8 q5[10], r0
; CHECK-NEXT: vmov.u8 r2, q2[7]
; CHECK-NEXT: vmov.8 q4[13], r2
; CHECK-NEXT: vmov.8 q5[11], r0
; CHECK-NEXT: vmov.u8 r0, q3[7]
; CHECK-NEXT: vmov.u8 r2, q4[0]
; CHECK-NEXT: vmov.8 q5[14], r0
; CHECK-NEXT: vmov.8 q0[0], r2
; CHECK-NEXT: vmov.8 q5[15], r0
; CHECK-NEXT: vmov.u8 r2, q4[1]
; CHECK-NEXT: vmov.8 q0[1], r2
; CHECK-NEXT: vmov.u8 r0, q5[2]
; CHECK-NEXT: vmov.8 q0[2], r0
; CHECK-NEXT: vmov.u8 r0, q5[3]
; CHECK-NEXT: vmov.8 q0[3], r0
; CHECK-NEXT: vmov.u8 r0, q4[4]
; CHECK-NEXT: vmov.8 q0[4], r0
; CHECK-NEXT: vmov.u8 r0, q4[5]
; CHECK-NEXT: vmov.8 q0[5], r0
; CHECK-NEXT: vmov.u8 r0, q5[6]
; CHECK-NEXT: vmov.8 q0[6], r0
; CHECK-NEXT: vmov.u8 r0, q5[7]
; CHECK-NEXT: vmov.8 q0[7], r0
; CHECK-NEXT: vmov.u8 r0, q4[8]
; CHECK-NEXT: vmov.8 q0[8], r0
; CHECK-NEXT: vmov.u8 r0, q4[9]
; CHECK-NEXT: vmov.8 q0[9], r0
; CHECK-NEXT: vmov.u8 r0, q5[10]
; CHECK-NEXT: vmov.8 q0[10], r0
; CHECK-NEXT: vmov.u8 r0, q5[11]
; CHECK-NEXT: vmov.8 q0[11], r0
; CHECK-NEXT: vmov.u8 r0, q4[12]
; CHECK-NEXT: vmov.8 q0[12], r0
; CHECK-NEXT: vmov.u8 r0, q4[13]
; CHECK-NEXT: vmov.8 q0[13], r0
; CHECK-NEXT: vmov.u8 r0, q5[14]
; CHECK-NEXT: vmov.8 q0[14], r0
; CHECK-NEXT: vmov.u8 r0, q5[15]
; CHECK-NEXT: vmov.8 q0[15], r0
; CHECK-NEXT: vmov.u8 r0, q1[0]
; CHECK-NEXT: vmov.8 q5[0], r0
; CHECK-NEXT: vmov.u8 r0, q2[0]
; CHECK-NEXT: vmov.8 q5[1], r0
; CHECK-NEXT: vmov.u8 r0, q1[1]
; CHECK-NEXT: vmov.8 q5[4], r0
; CHECK-NEXT: vmov.u8 r0, q2[1]
; CHECK-NEXT: vmov.8 q5[5], r0
; CHECK-NEXT: vmov.u8 r0, q1[2]
; CHECK-NEXT: vmov.8 q5[8], r0
; CHECK-NEXT: vmov.u8 r0, q2[2]
; CHECK-NEXT: vmov.8 q5[9], r0
; CHECK-NEXT: vmov.u8 r0, q1[3]
; CHECK-NEXT: vmov.8 q5[12], r0
; CHECK-NEXT: vmov.u8 r0, q2[3]
; CHECK-NEXT: vmov.8 q5[13], r0
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vmov.u8 r0, q5[0]
; CHECK-NEXT: vmov.8 q4[0], r0
; CHECK-NEXT: vmov.u8 r0, q5[1]
; CHECK-NEXT: vmov.8 q4[1], r0
; CHECK-NEXT: vmov.u8 r0, q3[0]
; CHECK-NEXT: vmov.8 q6[2], r0
; CHECK-NEXT: vmov.8 q6[3], r0
; CHECK-NEXT: vmov.u8 r0, q3[1]
; CHECK-NEXT: vmov.8 q6[6], r0
; CHECK-NEXT: vmov.8 q6[7], r0
; CHECK-NEXT: vmov.u8 r0, q3[2]
; CHECK-NEXT: vmov.8 q6[10], r0
; CHECK-NEXT: vmov.8 q6[11], r0
; CHECK-NEXT: vmov.u8 r0, q3[3]
; CHECK-NEXT: vmov.8 q6[14], r0
; CHECK-NEXT: vmov.8 q6[15], r0
; CHECK-NEXT: vmov.u8 r0, q6[2]
; CHECK-NEXT: vmov.8 q4[2], r0
; CHECK-NEXT: vmov.u8 r0, q6[3]
; CHECK-NEXT: vmov.8 q4[3], r0
; CHECK-NEXT: vmov.u8 r0, q5[4]
; CHECK-NEXT: vmov.8 q4[4], r0
; CHECK-NEXT: vmov.u8 r0, q5[5]
; CHECK-NEXT: vmov.8 q4[5], r0
; CHECK-NEXT: vmov.u8 r0, q6[6]
; CHECK-NEXT: vmov.8 q4[6], r0
; CHECK-NEXT: vmov.u8 r0, q6[7]
; CHECK-NEXT: vmov.8 q4[7], r0
; CHECK-NEXT: vmov.u8 r0, q5[8]
; CHECK-NEXT: vmov.8 q4[8], r0
; CHECK-NEXT: vmov.u8 r0, q5[9]
; CHECK-NEXT: vmov.8 q4[9], r0
; CHECK-NEXT: vmov.u8 r0, q6[10]
; CHECK-NEXT: vmov.8 q4[10], r0
; CHECK-NEXT: vmov.u8 r0, q6[11]
; CHECK-NEXT: vmov.8 q4[11], r0
; CHECK-NEXT: vmov.u8 r0, q5[12]
; CHECK-NEXT: vmov.8 q4[12], r0
; CHECK-NEXT: vmov.u8 r0, q5[13]
; CHECK-NEXT: vmov.8 q4[13], r0
; CHECK-NEXT: vmov.u8 r0, q6[14]
; CHECK-NEXT: vmov.8 q4[14], r0
; CHECK-NEXT: vmov.u8 r0, q6[15]
; CHECK-NEXT: vmov.8 q4[15], r0
; CHECK-NEXT: vmov.u8 r0, q1[12]
; CHECK-NEXT: vmov.8 q6[0], r0
; CHECK-NEXT: vmov.u8 r0, q2[12]
; CHECK-NEXT: vmov.8 q6[1], r0
; CHECK-NEXT: vmov.u8 r0, q1[13]
; CHECK-NEXT: vmov.8 q6[4], r0
; CHECK-NEXT: vmov.u8 r0, q2[13]
; CHECK-NEXT: vmov.8 q6[5], r0
; CHECK-NEXT: vmov.u8 r0, q1[14]
; CHECK-NEXT: vmov.8 q6[8], r0
; CHECK-NEXT: vmov.u8 r0, q2[14]
; CHECK-NEXT: vmov.8 q6[9], r0
; CHECK-NEXT: vmov.u8 r0, q1[15]
; CHECK-NEXT: vmov.8 q6[12], r0
; CHECK-NEXT: vmov.u8 r0, q2[15]
; CHECK-NEXT: vmov.8 q6[13], r0
; CHECK-NEXT: vstrw.32 q4, [r1]
; CHECK-NEXT: vmov.u8 r0, q6[0]
; CHECK-NEXT: vmov.8 q5[0], r0
; CHECK-NEXT: vmov.u8 r0, q6[1]
; CHECK-NEXT: vmov.8 q5[1], r0
; CHECK-NEXT: vmov.u8 r0, q3[12]
; CHECK-NEXT: vmov.8 q7[2], r0
; CHECK-NEXT: vmov.8 q7[3], r0
; CHECK-NEXT: vmov.u8 r0, q3[13]
; CHECK-NEXT: vmov.8 q7[6], r0
; CHECK-NEXT: vmov.8 q7[7], r0
; CHECK-NEXT: vmov.u8 r0, q3[14]
; CHECK-NEXT: vmov.8 q7[10], r0
; CHECK-NEXT: vmov.8 q7[11], r0
; CHECK-NEXT: vmov.u8 r0, q3[15]
; CHECK-NEXT: vmov.8 q7[14], r0
; CHECK-NEXT: vmov.8 q7[15], r0
; CHECK-NEXT: vmov.u8 r0, q7[2]
; CHECK-NEXT: vmov.8 q5[2], r0
; CHECK-NEXT: vmov.u8 r0, q7[3]
; CHECK-NEXT: vmov.8 q5[3], r0
; CHECK-NEXT: vmov.u8 r0, q6[4]
; CHECK-NEXT: vmov.8 q5[4], r0
; CHECK-NEXT: vmov.u8 r0, q6[5]
; CHECK-NEXT: vmov.8 q5[5], r0
; CHECK-NEXT: vmov.u8 r0, q7[6]
; CHECK-NEXT: vmov.8 q5[6], r0
; CHECK-NEXT: vmov.u8 r0, q7[7]
; CHECK-NEXT: vmov.8 q5[7], r0
; CHECK-NEXT: vmov.u8 r0, q6[8]
; CHECK-NEXT: vmov.8 q5[8], r0
; CHECK-NEXT: vmov.u8 r0, q6[9]
; CHECK-NEXT: vmov.8 q5[9], r0
; CHECK-NEXT: vmov.u8 r0, q7[10]
; CHECK-NEXT: vmov.8 q5[10], r0
; CHECK-NEXT: vmov.u8 r0, q7[11]
; CHECK-NEXT: vmov.8 q5[11], r0
; CHECK-NEXT: vmov.u8 r0, q6[12]
; CHECK-NEXT: vmov.8 q5[12], r0
; CHECK-NEXT: vmov.u8 r0, q6[13]
; CHECK-NEXT: vmov.8 q5[13], r0
; CHECK-NEXT: vmov.u8 r0, q7[14]
; CHECK-NEXT: vmov.8 q5[14], r0
; CHECK-NEXT: vmov.u8 r0, q7[15]
; CHECK-NEXT: vmov.8 q5[15], r0
; CHECK-NEXT: vmov.u8 r0, q1[8]
; CHECK-NEXT: vmov.8 q6[0], r0
; CHECK-NEXT: vmov.u8 r0, q2[8]
; CHECK-NEXT: vmov.8 q6[1], r0
; CHECK-NEXT: vmov.u8 r0, q1[9]
; CHECK-NEXT: vmov.8 q6[4], r0
; CHECK-NEXT: vmov.u8 r0, q2[9]
; CHECK-NEXT: vmov.8 q6[5], r0
; CHECK-NEXT: vmov.u8 r0, q1[10]
; CHECK-NEXT: vmov.8 q6[8], r0
; CHECK-NEXT: vmov.u8 r0, q2[10]
; CHECK-NEXT: vmov.8 q6[9], r0
; CHECK-NEXT: vmov.u8 r0, q1[11]
; CHECK-NEXT: vmov.8 q6[12], r0
; CHECK-NEXT: vmov.u8 r0, q2[11]
; CHECK-NEXT: vmov.8 q6[13], r0
; CHECK-NEXT: vstrw.32 q5, [r1, #48]
; CHECK-NEXT: vmov.u8 r0, q6[0]
; CHECK-NEXT: vmov.8 q1[0], r0
; CHECK-NEXT: vmov.u8 r0, q6[1]
; CHECK-NEXT: vmov.8 q1[1], r0
; CHECK-NEXT: vmov.u8 r0, q3[8]
; CHECK-NEXT: vmov.8 q2[2], r0
; CHECK-NEXT: vmov.8 q2[3], r0
; CHECK-NEXT: vmov.u8 r0, q3[9]
; CHECK-NEXT: vmov.8 q2[6], r0
; CHECK-NEXT: vmov.8 q2[7], r0
; CHECK-NEXT: vmov.u8 r0, q3[10]
; CHECK-NEXT: vmov.8 q2[10], r0
; CHECK-NEXT: vmov.8 q2[11], r0
; CHECK-NEXT: vmov.u8 r0, q3[11]
; CHECK-NEXT: vmov.8 q2[14], r0
; CHECK-NEXT: vmov.8 q2[15], r0
; CHECK-NEXT: vmov.u8 r0, q2[2]
; CHECK-NEXT: vmov.8 q1[2], r0
; CHECK-NEXT: vmov.u8 r0, q2[3]
; CHECK-NEXT: vmov.8 q1[3], r0
; CHECK-NEXT: vmov.u8 r0, q6[4]
; CHECK-NEXT: vmov.8 q1[4], r0
; CHECK-NEXT: vmov.u8 r0, q6[5]
; CHECK-NEXT: vmov.8 q1[5], r0
; CHECK-NEXT: vmov.u8 r0, q2[6]
; CHECK-NEXT: vmov.8 q1[6], r0
; CHECK-NEXT: vmov.u8 r0, q2[7]
; CHECK-NEXT: vmov.8 q1[7], r0
; CHECK-NEXT: vmov.u8 r0, q6[8]
; CHECK-NEXT: vmov.8 q1[8], r0
; CHECK-NEXT: vmov.u8 r0, q6[9]
; CHECK-NEXT: vmov.8 q1[9], r0
; CHECK-NEXT: vmov.u8 r0, q2[10]
; CHECK-NEXT: vmov.8 q1[10], r0
; CHECK-NEXT: vmov.u8 r0, q2[11]
; CHECK-NEXT: vmov.8 q1[11], r0
; CHECK-NEXT: vmov.u8 r0, q6[12]
; CHECK-NEXT: vmov.8 q1[12], r0
; CHECK-NEXT: vmov.u8 r0, q6[13]
; CHECK-NEXT: vmov.8 q1[13], r0
; CHECK-NEXT: vmov.u8 r0, q2[14]
; CHECK-NEXT: vmov.8 q1[14], r0
; CHECK-NEXT: vmov.u8 r0, q2[15]
; CHECK-NEXT: vmov.8 q1[15], r0
; CHECK-NEXT: vstrw.32 q1, [r1, #32]
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <16 x i8>, <16 x i8>* %src, i32 0		%s1 = getelementptr <16 x i8>, <16 x i8>* %src, i32 0
%l1 = load <16 x i8>, <16 x i8>* %s1, align 4		%l1 = load <16 x i8>, <16 x i8>* %s1, align 4
%s2 = getelementptr <16 x i8>, <16 x i8>* %src, i32 1		%s2 = getelementptr <16 x i8>, <16 x i8>* %src, i32 1
%l2 = load <16 x i8>, <16 x i8>* %s2, align 4		%l2 = load <16 x i8>, <16 x i8>* %s2, align 4
%s3 = getelementptr <16 x i8>, <16 x i8>* %src, i32 2		%s3 = getelementptr <16 x i8>, <16 x i8>* %src, i32 2
%l3 = load <16 x i8>, <16 x i8>* %s3, align 4		%l3 = load <16 x i8>, <16 x i8>* %s3, align 4
; ... 138 lines not shown ...
%s = shufflevector <4 x float> %t1, <4 x float> %t2, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 1, i32 3, i32 5, i32 7>		%s = shufflevector <4 x float> %t1, <4 x float> %t2, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 1, i32 3, i32 5, i32 7>
store <8 x float> %s, <8 x float> *%dst		store <8 x float> %s, <8 x float> *%dst
ret void		ret void
}		}

define void @vst4_v4f32(<4 x float> *%src, <16 x float> *%dst) {		define void @vst4_v4f32(<4 x float> *%src, <16 x float> *%dst) {
; CHECK-LABEL: vst4_v4f32:		; CHECK-LABEL: vst4_v4f32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13}		; CHECK-NEXT: vldrw.u32 q2, [r0, #32]
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: vldrw.u32 q2, [r0]
; CHECK-NEXT: vldrw.u32 q3, [r0, #32]
; CHECK-NEXT: vldrw.u32 q1, [r0, #16]		; CHECK-NEXT: vldrw.u32 q1, [r0, #16]
; CHECK-NEXT: vmov.f32 s0, s9		; CHECK-NEXT: vldrw.u32 q0, [r0]
; CHECK-NEXT: vmov.32 r0, q3[1]		; CHECK-NEXT: vmov q3, q2
; CHECK-NEXT: vdup.32 q4, r0		; CHECK-NEXT: vst40.32 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.f32 s1, s5		; CHECK-NEXT: vst41.32 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.f32 s2, s18		; CHECK-NEXT: vst42.32 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.32 r0, q3[0]		; CHECK-NEXT: vst43.32 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.f32 s3, s19
; CHECK-NEXT: vdup.32 q4, r0
; CHECK-NEXT: vmov.f32 s9, s4
; CHECK-NEXT: vmov.32 r0, q3[3]
; CHECK-NEXT: vmov.f32 s16, s8
; CHECK-NEXT: vdup.32 q6, r0
; CHECK-NEXT: vmov.f32 s20, s11
; CHECK-NEXT: vmov.32 r0, q3[2]
; CHECK-NEXT: vmov.f32 s8, s10
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: vmov.f32 s21, s7
; CHECK-NEXT: vmov.f32 s17, s4
; CHECK-NEXT: vmov.f32 s9, s6
; CHECK-NEXT: vdup.32 q1, r0
; CHECK-NEXT: vmov.f32 s22, s26
; CHECK-NEXT: vstrw.32 q4, [r1]
; CHECK-NEXT: vmov.f32 s10, s6
; CHECK-NEXT: vmov.f32 s23, s27
; CHECK-NEXT: vmov.f32 s11, s7
; CHECK-NEXT: vstrw.32 q5, [r1, #48]
; CHECK-NEXT: vstrw.32 q2, [r1, #32]
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <4 x float>, <4 x float>* %src, i32 0		%s1 = getelementptr <4 x float>, <4 x float>* %src, i32 0
%l1 = load <4 x float>, <4 x float>* %s1, align 4		%l1 = load <4 x float>, <4 x float>* %s1, align 4
%s2 = getelementptr <4 x float>, <4 x float>* %src, i32 1		%s2 = getelementptr <4 x float>, <4 x float>* %src, i32 1
%l2 = load <4 x float>, <4 x float>* %s2, align 4		%l2 = load <4 x float>, <4 x float>* %s2, align 4
%s3 = getelementptr <4 x float>, <4 x float>* %src, i32 2		%s3 = getelementptr <4 x float>, <4 x float>* %src, i32 2
%l3 = load <4 x float>, <4 x float>* %s3, align 4		%l3 = load <4 x float>, <4 x float>* %s3, align 4
%s4 = getelementptr <4 x float>, <4 x float>* %src, i32 3		%s4 = getelementptr <4 x float>, <4 x float>* %src, i32 3
%l4 = load <4 x float>, <4 x float>* %s3, align 4		%l4 = load <4 x float>, <4 x float>* %s3, align 4
%t1 = shufflevector <4 x float> %l1, <4 x float> %l2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%t1 = shufflevector <4 x float> %l1, <4 x float> %l2, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%t2 = shufflevector <4 x float> %l3, <4 x float> %l4, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%t2 = shufflevector <4 x float> %l3, <4 x float> %l4, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%s = shufflevector <8 x float> %t1, <8 x float> %t2, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 1, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>		%s = shufflevector <8 x float> %t1, <8 x float> %t2, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 1, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>
store <16 x float> %s, <16 x float> *%dst		store <16 x float> %s, <16 x float> *%dst
ret void		ret void
}		}

define void @vst4_v8f32(<8 x float> *%src, <32 x float> *%dst) {		define void @vst4_v8f32(<8 x float> *%src, <32 x float> *%dst) {
; CHECK-LABEL: vst4_v8f32:		; CHECK-LABEL: vst4_v8f32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: .pad #48
; CHECK-NEXT: sub sp, #48
; CHECK-NEXT: vldrw.u32 q3, [r0]
; CHECK-NEXT: vldrw.u32 q5, [r0, #64]
; CHECK-NEXT: vldrw.u32 q4, [r0, #32]
; CHECK-NEXT: vmov.f64 d2, d6
; CHECK-NEXT: vmov.32 r2, q5[0]
; CHECK-NEXT: vdup.32 q0, r2
; CHECK-NEXT: vmov.f32 s5, s16
; CHECK-NEXT: vmov.f32 s6, s2
; CHECK-NEXT: vmov.f32 s7, s3
; CHECK-NEXT: vstrw.32 q1, [sp, #32] @ 16-byte Spill
; CHECK-NEXT: vmov.32 r2, q5[1]
; CHECK-NEXT: vdup.32 q0, r2
; CHECK-NEXT: vmov.f32 s16, s13
; CHECK-NEXT: vmov.f32 s0, s13
; CHECK-NEXT: vmov.f32 s1, s17
; CHECK-NEXT: vmov.f64 d2, d7
; CHECK-NEXT: vstrw.32 q0, [sp, #16] @ 16-byte Spill
; CHECK-NEXT: vmov.32 r2, q5[2]
; CHECK-NEXT: vdup.32 q0, r2
; CHECK-NEXT: vmov.f32 s5, s18
; CHECK-NEXT: vmov.f32 s6, s2
; CHECK-NEXT: vmov.f32 s7, s3
; CHECK-NEXT: vmov.f32 s12, s15
; CHECK-NEXT: vstrw.32 q1, [sp] @ 16-byte Spill
; CHECK-NEXT: vmov.32 r2, q5[3]
; CHECK-NEXT: vldrw.u32 q7, [r0, #16]
; CHECK-NEXT: vldrw.u32 q2, [r0, #80]		; CHECK-NEXT: vldrw.u32 q2, [r0, #80]
; CHECK-NEXT: vldrw.u32 q6, [r0, #48]		; CHECK-NEXT: vldrw.u32 q6, [r0, #64]
; CHECK-NEXT: vmov.f32 s13, s19		; CHECK-NEXT: vldrw.u32 q5, [r0, #32]
; CHECK-NEXT: vdup.32 q0, r2		; CHECK-NEXT: vldrw.u32 q1, [r0, #48]
; CHECK-NEXT: vmov.f32 s14, s2		; CHECK-NEXT: vldrw.u32 q4, [r0]
; CHECK-NEXT: vmov.32 r0, q2[0]		; CHECK-NEXT: vldrw.u32 q0, [r0, #16]
; CHECK-NEXT: vmov.f64 d10, d14		; CHECK-NEXT: vmov q7, q6
; CHECK-NEXT: vmov.f32 s15, s3		; CHECK-NEXT: add.w r0, r1, #64
; CHECK-NEXT: vdup.32 q0, r0		; CHECK-NEXT: vmov q3, q2
; CHECK-NEXT: vmov.f32 s21, s24		; CHECK-NEXT: vst40.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.32 r0, q2[1]		; CHECK-NEXT: vst41.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.f32 s22, s2		; CHECK-NEXT: vst42.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.f32 s23, s3		; CHECK-NEXT: vst43.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vdup.32 q0, r0		; CHECK-NEXT: vst40.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f64 d8, d15		; CHECK-NEXT: vst41.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.32 r0, q2[2]		; CHECK-NEXT: vst42.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vdup.32 q1, r0		; CHECK-NEXT: vst43.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.32 r0, q2[3]
; CHECK-NEXT: vdup.32 q2, r0
; CHECK-NEXT: vstrw.32 q5, [r1, #64]
; CHECK-NEXT: vstrw.32 q3, [r1, #48]
; CHECK-NEXT: vmov.f32 s0, s29
; CHECK-NEXT: vmov.f32 s17, s26
; CHECK-NEXT: vmov.f32 s24, s29
; CHECK-NEXT: vmov.f32 s1, s25
; CHECK-NEXT: vstrw.32 q0, [r1, #80]
; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s18, s6
; CHECK-NEXT: vmov.f32 s19, s7
; CHECK-NEXT: vstrw.32 q0, [r1, #32]
; CHECK-NEXT: vmov.f32 s4, s31
; CHECK-NEXT: vldrw.u32 q0, [sp, #32] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s5, s27
; CHECK-NEXT: vstrw.32 q4, [r1, #96]
; CHECK-NEXT: vmov.f32 s6, s10
; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vldrw.u32 q0, [sp, #16] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s7, s11
; CHECK-NEXT: vstrw.32 q1, [r1, #112]
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: add sp, #48
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <8 x float>, <8 x float>* %src, i32 0		%s1 = getelementptr <8 x float>, <8 x float>* %src, i32 0
%l1 = load <8 x float>, <8 x float>* %s1, align 4		%l1 = load <8 x float>, <8 x float>* %s1, align 4
%s2 = getelementptr <8 x float>, <8 x float>* %src, i32 1		%s2 = getelementptr <8 x float>, <8 x float>* %src, i32 1
%l2 = load <8 x float>, <8 x float>* %s2, align 4		%l2 = load <8 x float>, <8 x float>* %s2, align 4
%s3 = getelementptr <8 x float>, <8 x float>* %src, i32 2		%s3 = getelementptr <8 x float>, <8 x float>* %src, i32 2
%l3 = load <8 x float>, <8 x float>* %s3, align 4		%l3 = load <8 x float>, <8 x float>* %s3, align 4
%s4 = getelementptr <8 x float>, <8 x float>* %src, i32 3		%s4 = getelementptr <8 x float>, <8 x float>* %src, i32 3
%l4 = load <8 x float>, <8 x float>* %s3, align 4		%l4 = load <8 x float>, <8 x float>* %s3, align 4
%t1 = shufflevector <8 x float> %l1, <8 x float> %l2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%t1 = shufflevector <8 x float> %l1, <8 x float> %l2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%t2 = shufflevector <8 x float> %l3, <8 x float> %l4, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%t2 = shufflevector <8 x float> %l3, <8 x float> %l4, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%s = shufflevector <16 x float> %t1, <16 x float> %t2, <32 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27, i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>		%s = shufflevector <16 x float> %t1, <16 x float> %t2, <32 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27, i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>
store <32 x float> %s, <32 x float> *%dst		store <32 x float> %s, <32 x float> *%dst
ret void		ret void
}		}

define void @vst4_v16f32(<16 x float> *%src, <64 x float> *%dst) {		define void @vst4_v16f32(<16 x float> *%src, <64 x float> *%dst) {
; CHECK-LABEL: vst4_v16f32:		; CHECK-LABEL: vst4_v16f32:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
		; CHECK-NEXT: .save {r4, r5}
		; CHECK-NEXT: push {r4, r5}
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: .pad #160		; CHECK-NEXT: .pad #152
; CHECK-NEXT: sub sp, #160		; CHECK-NEXT: sub sp, #152
; CHECK-NEXT: vldrw.u32 q2, [r0, #48]		; CHECK-NEXT: vldrw.u32 q2, [r0, #176]
; CHECK-NEXT: vldrw.u32 q1, [r0, #176]		; CHECK-NEXT: add r2, sp, #64
; CHECK-NEXT: vldrw.u32 q0, [r0, #112]		; CHECK-NEXT: vstmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vmov.32 r2, q1[2]		; CHECK-NEXT: vldrw.u32 q2, [r0, #160]
; CHECK-NEXT: vmov.f64 d8, d5		; CHECK-NEXT: vldrw.u32 q6, [r0, #128]
; CHECK-NEXT: vdup.32 q3, r2		; CHECK-NEXT: vstmia r2, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vldrw.u32 q6, [r0, #160]		; CHECK-NEXT: add r2, sp, #64
; CHECK-NEXT: vldrw.u32 q5, [r0, #128]		; CHECK-NEXT: vldrw.u32 q5, [r0, #64]
; CHECK-NEXT: vldrw.u32 q7, [r0, #64]		; CHECK-NEXT: vldmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vstrw.32 q6, [sp, #144] @ 16-byte Spill		; CHECK-NEXT: vldrw.u32 q4, [r0]
; CHECK-NEXT: vldrw.u32 q6, [r0, #96]		; CHECK-NEXT: vldrw.u32 q1, [r0, #112]
; CHECK-NEXT: vmov.f32 s17, s2		; CHECK-NEXT: vmov q7, q6
; CHECK-NEXT: vmov.f32 s18, s14		; CHECK-NEXT: vstmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vstrw.32 q6, [sp, #128] @ 16-byte Spill		; CHECK-NEXT: vldmia r2, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vmov.f32 s19, s15		; CHECK-NEXT: add r2, sp, #64
; CHECK-NEXT: vldrw.u32 q3, [r0, #144]		; CHECK-NEXT: vldrw.u32 q1, [r0, #96]
; CHECK-NEXT: vldrw.u32 q6, [r0, #32]		; CHECK-NEXT: vstmia r2, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vstrw.32 q3, [sp, #96] @ 16-byte Spill		; CHECK-NEXT: add r2, sp, #64
; CHECK-NEXT: vldrw.u32 q3, [r0, #80]		; CHECK-NEXT: vldmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vstrw.32 q6, [sp, #112] @ 16-byte Spill		; CHECK-NEXT: vldrw.u32 q0, [r0, #48]
; CHECK-NEXT: vstrw.32 q3, [sp, #80] @ 16-byte Spill		; CHECK-NEXT: vstmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vldrw.u32 q3, [r0, #16]		; CHECK-NEXT: vldmia r2, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vstrw.32 q3, [sp, #64] @ 16-byte Spill		; CHECK-NEXT: add r2, sp, #64
; CHECK-NEXT: vldrw.u32 q3, [r0]		; CHECK-NEXT: vldrw.u32 q0, [r0, #32]
; CHECK-NEXT: vstrw.32 q4, [r1, #224]		; CHECK-NEXT: vstmia r2, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Spill
; CHECK-NEXT: vmov.f32 s16, s11		; CHECK-NEXT: vldrw.u32 q2, [r0, #144]
; CHECK-NEXT: vmov.32 r0, q1[3]		; CHECK-NEXT: vldrw.u32 q1, [r0, #80]
; CHECK-NEXT: vmov.f32 s17, s3		; CHECK-NEXT: add r2, sp, #64
; CHECK-NEXT: vdup.32 q6, r0		; CHECK-NEXT: vldrw.u32 q0, [r0, #16]
; CHECK-NEXT: vmov.f32 s18, s26		; CHECK-NEXT: add.w r0, r1, #64
; CHECK-NEXT: vmov.f32 s19, s27		; CHECK-NEXT: vmov q3, q2
; CHECK-NEXT: vstrw.32 q4, [r1, #240]		; CHECK-NEXT: vst40.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.f64 d8, d4		; CHECK-NEXT: vst41.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.32 r0, q1[0]		; CHECK-NEXT: vst42.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vdup.32 q6, r0		; CHECK-NEXT: vst43.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.f32 s17, s0		; CHECK-NEXT: vst40.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s18, s26		; CHECK-NEXT: vst41.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s19, s27		; CHECK-NEXT: vst42.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vstrw.32 q4, [r1, #192]		; CHECK-NEXT: vst43.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.32 r0, q1[1]		; CHECK-NEXT: add.w r0, r1, #192
; CHECK-NEXT: vmov.f32 s0, s9		; CHECK-NEXT: vldmia sp, {d0, d1, d2, d3, d4, d5, d6, d7} @ 64-byte Reload
; CHECK-NEXT: vdup.32 q1, r0		; CHECK-NEXT: adds r1, #128
; CHECK-NEXT: vmov.f32 s2, s6		; CHECK-NEXT: vmov q3, q2
; CHECK-NEXT: vldrw.u32 q4, [sp, #112] @ 16-byte Reload		; CHECK-NEXT: vldmia r2, {d8, d9, d10, d11, d12, d13, d14, d15} @ 64-byte Reload
; CHECK-NEXT: vldrw.u32 q2, [sp, #144] @ 16-byte Reload		; CHECK-NEXT: vmov q7, q6
; CHECK-NEXT: vmov.f32 s3, s7		; CHECK-NEXT: vst40.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vldrw.u32 q6, [sp, #128] @ 16-byte Reload		; CHECK-NEXT: vst41.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vstrw.32 q0, [r1, #208]		; CHECK-NEXT: vst42.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.f64 d0, d9		; CHECK-NEXT: vst43.32 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.32 r0, q2[2]		; CHECK-NEXT: vst40.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vdup.32 q1, r0		; CHECK-NEXT: vst41.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s1, s26		; CHECK-NEXT: vst42.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s2, s6		; CHECK-NEXT: vst43.32 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.f32 s3, s7		; CHECK-NEXT: add sp, #152
; CHECK-NEXT: vstrw.32 q0, [r1, #160]
; CHECK-NEXT: vmov.f32 s0, s19
; CHECK-NEXT: vmov.32 r0, q2[3]
; CHECK-NEXT: vmov.f32 s1, s27
; CHECK-NEXT: vdup.32 q1, r0
; CHECK-NEXT: vmov.f32 s8, s15
; CHECK-NEXT: vmov.f32 s2, s6
; CHECK-NEXT: vmov.f32 s3, s7
; CHECK-NEXT: vmov.f64 d2, d6
; CHECK-NEXT: vstrw.32 q0, [r1, #176]
; CHECK-NEXT: vmov.32 r0, q5[0]
; CHECK-NEXT: vdup.32 q0, r0
; CHECK-NEXT: vmov.f32 s5, s28
; CHECK-NEXT: vmov.f32 s6, s2
; CHECK-NEXT: vmov.f32 s7, s3
; CHECK-NEXT: vstrw.32 q1, [sp, #48] @ 16-byte Spill
; CHECK-NEXT: vmov.32 r0, q5[1]
; CHECK-NEXT: vdup.32 q0, r0
; CHECK-NEXT: vmov.f32 s28, s13
; CHECK-NEXT: vmov.f32 s0, s13
; CHECK-NEXT: vmov.f32 s1, s29
; CHECK-NEXT: vmov.f64 d2, d7
; CHECK-NEXT: vstrw.32 q0, [sp, #32] @ 16-byte Spill
; CHECK-NEXT: vmov.32 r0, q5[2]
; CHECK-NEXT: vdup.32 q0, r0
; CHECK-NEXT: vmov.32 r0, q5[3]
; CHECK-NEXT: vldrw.u32 q3, [sp, #96] @ 16-byte Reload
; CHECK-NEXT: vldrw.u32 q5, [sp, #80] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s5, s30
; CHECK-NEXT: vmov.f32 s6, s2
; CHECK-NEXT: vmov.f32 s7, s3
; CHECK-NEXT: vdup.32 q0, r0
; CHECK-NEXT: vmov.f32 s9, s31
; CHECK-NEXT: vstrw.32 q1, [sp, #16] @ 16-byte Spill
; CHECK-NEXT: vmov.f32 s10, s2
; CHECK-NEXT: vmov.f32 s11, s3
; CHECK-NEXT: vstrw.32 q2, [sp] @ 16-byte Spill
; CHECK-NEXT: vldrw.u32 q2, [sp, #64] @ 16-byte Reload
; CHECK-NEXT: vmov.32 r0, q3[0]
; CHECK-NEXT: vmov.f64 d12, d4
; CHECK-NEXT: vdup.32 q0, r0
; CHECK-NEXT: vmov.32 r0, q3[1]
; CHECK-NEXT: vdup.32 q7, r0
; CHECK-NEXT: vmov.32 r0, q3[2]
; CHECK-NEXT: vmov.f32 s25, s20
; CHECK-NEXT: vmov.f32 s26, s2
; CHECK-NEXT: vmov.f64 d8, d5
; CHECK-NEXT: vmov.f32 s27, s3
; CHECK-NEXT: vdup.32 q0, r0
; CHECK-NEXT: vmov.f32 s20, s9
; CHECK-NEXT: vmov.32 r0, q3[3]
; CHECK-NEXT: vmov.f32 s17, s22
; CHECK-NEXT: vldrw.u32 q3, [sp, #128] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s18, s2
; CHECK-NEXT: vmov.f32 s4, s11
; CHECK-NEXT: vmov.f32 s28, s9
; CHECK-NEXT: vldrw.u32 q2, [sp, #112] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s19, s3
; CHECK-NEXT: vdup.32 q0, r0
; CHECK-NEXT: vmov.f32 s5, s23
; CHECK-NEXT: vmov.f32 s6, s2
; CHECK-NEXT: vmov.f32 s7, s3
; CHECK-NEXT: vmov.f64 d0, d4
; CHECK-NEXT: vmov.f32 s1, s12
; CHECK-NEXT: vmov.f32 s12, s9
; CHECK-NEXT: vldrw.u32 q2, [sp, #144] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s29, s21
; CHECK-NEXT: vmov.32 r0, q2[0]
; CHECK-NEXT: vdup.32 q5, r0
; CHECK-NEXT: vmov.32 r0, q2[1]
; CHECK-NEXT: vmov.f32 s2, s22
; CHECK-NEXT: vdup.32 q2, r0
; CHECK-NEXT: vmov.f32 s3, s23
; CHECK-NEXT: vstrw.32 q4, [r1, #96]
; CHECK-NEXT: vstrw.32 q0, [r1, #128]
; CHECK-NEXT: vldrw.u32 q0, [sp, #16] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s14, s10
; CHECK-NEXT: vstrw.32 q1, [r1, #112]
; CHECK-NEXT: vstrw.32 q0, [r1, #32]
; CHECK-NEXT: vldrw.u32 q0, [sp] @ 16-byte Reload
; CHECK-NEXT: vmov.f32 s15, s11
; CHECK-NEXT: vstrw.32 q6, [r1, #64]
; CHECK-NEXT: vstrw.32 q0, [r1, #48]
; CHECK-NEXT: vldrw.u32 q0, [sp, #48] @ 16-byte Reload
; CHECK-NEXT: vstrw.32 q3, [r1, #144]
; CHECK-NEXT: vstrw.32 q7, [r1, #80]
; CHECK-NEXT: vstrw.32 q0, [r1]
; CHECK-NEXT: vldrw.u32 q0, [sp, #32] @ 16-byte Reload
; CHECK-NEXT: vstrw.32 q0, [r1, #16]
; CHECK-NEXT: add sp, #160
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
		; CHECK-NEXT: pop {r4, r5}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <16 x float>, <16 x float>* %src, i32 0		%s1 = getelementptr <16 x float>, <16 x float>* %src, i32 0
%l1 = load <16 x float>, <16 x float>* %s1, align 4		%l1 = load <16 x float>, <16 x float>* %s1, align 4
%s2 = getelementptr <16 x float>, <16 x float>* %src, i32 1		%s2 = getelementptr <16 x float>, <16 x float>* %src, i32 1
%l2 = load <16 x float>, <16 x float>* %s2, align 4		%l2 = load <16 x float>, <16 x float>* %s2, align 4
%s3 = getelementptr <16 x float>, <16 x float>* %src, i32 2		%s3 = getelementptr <16 x float>, <16 x float>* %src, i32 2
%l3 = load <16 x float>, <16 x float>* %s3, align 4		%l3 = load <16 x float>, <16 x float>* %s3, align 4
[... 143 lines not shown ...]
%s = shufflevector <8 x half> %t1, <8 x half> %t2, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 1, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>		%s = shufflevector <8 x half> %t1, <8 x half> %t2, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 1, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>
store <16 x half> %s, <16 x half> *%dst		store <16 x half> %s, <16 x half> *%dst
ret void		ret void
}		}

define void @vst4_v8f16(<8 x half> *%src, <32 x half> *%dst) {		define void @vst4_v8f16(<8 x half> *%src, <32 x half> *%dst) {
; CHECK-LABEL: vst4_v8f16:		; CHECK-LABEL: vst4_v8f16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8}		; CHECK-NEXT: vldrw.u32 q2, [r0, #32]
; CHECK-NEXT: vpush {d8}		; CHECK-NEXT: vldrw.u32 q1, [r0, #16]
; CHECK-NEXT: vldrw.u32 q2, [r0]		; CHECK-NEXT: vldrw.u32 q0, [r0]
; CHECK-NEXT: vldrw.u32 q0, [r0, #16]		; CHECK-NEXT: vmov q3, q2
; CHECK-NEXT: vldrw.u32 q1, [r0, #32]		; CHECK-NEXT: vst40.16 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov r2, s11		; CHECK-NEXT: vst41.16 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmovx.f16 s16, s11		; CHECK-NEXT: vst42.16 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov.16 q3[0], r2		; CHECK-NEXT: vst43.16 {q0, q1, q2, q3}, [r1]
; CHECK-NEXT: vmov r3, s3
; CHECK-NEXT: vmov.16 q3[1], r3
; CHECK-NEXT: vmov r0, s7
; CHECK-NEXT: vmov.16 q3[2], r0
; CHECK-NEXT: vmov r2, s2
; CHECK-NEXT: vmov.16 q3[3], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmovx.f16 s16, s3
; CHECK-NEXT: vmov.16 q3[4], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmovx.f16 s16, s7
; CHECK-NEXT: vmov.16 q3[5], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmov.16 q3[6], r0
; CHECK-NEXT: vmovx.f16 s16, s10
; CHECK-NEXT: vmov.16 q3[7], r0
; CHECK-NEXT: vmov r0, s10
; CHECK-NEXT: vstrw.32 q3, [r1, #48]
; CHECK-NEXT: vmov.16 q3[0], r0
; CHECK-NEXT: vmov.16 q3[1], r2
; CHECK-NEXT: vmov r0, s6
; CHECK-NEXT: vmov.16 q3[2], r0
; CHECK-NEXT: vmov r2, s1
; CHECK-NEXT: vmov.16 q3[3], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmovx.f16 s16, s2
; CHECK-NEXT: vmov.16 q3[4], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmovx.f16 s16, s6
; CHECK-NEXT: vmov.16 q3[5], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmov.16 q3[6], r0
; CHECK-NEXT: vmovx.f16 s16, s9
; CHECK-NEXT: vmov.16 q3[7], r0
; CHECK-NEXT: vmov r0, s9
; CHECK-NEXT: vstrw.32 q3, [r1, #32]
; CHECK-NEXT: vmov.16 q3[0], r0
; CHECK-NEXT: vmov.16 q3[1], r2
; CHECK-NEXT: vmov r0, s5
; CHECK-NEXT: vmov.16 q3[2], r0
; CHECK-NEXT: vmov r2, s0
; CHECK-NEXT: vmov.16 q3[3], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmovx.f16 s16, s1
; CHECK-NEXT: vmov.16 q3[4], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmovx.f16 s16, s5
; CHECK-NEXT: vmov.16 q3[5], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmov.16 q3[6], r0
; CHECK-NEXT: vmovx.f16 s0, s0
; CHECK-NEXT: vmov.16 q3[7], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vstrw.32 q3, [r1, #16]
; CHECK-NEXT: vmov.16 q3[0], r0
; CHECK-NEXT: vmov.16 q3[1], r2
; CHECK-NEXT: vmov r0, s4
; CHECK-NEXT: vmov.16 q3[2], r0
; CHECK-NEXT: vmovx.f16 s8, s8
; CHECK-NEXT: vmov.16 q3[3], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vmov.16 q3[4], r0
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: vmovx.f16 s0, s4
; CHECK-NEXT: vmov.16 q3[5], r0
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: vmov.16 q3[6], r0
; CHECK-NEXT: vmov.16 q3[7], r0
; CHECK-NEXT: vstrw.32 q3, [r1]
; CHECK-NEXT: vpop {d8}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <8 x half>, <8 x half>* %src, i32 0		%s1 = getelementptr <8 x half>, <8 x half>* %src, i32 0
%l1 = load <8 x half>, <8 x half>* %s1, align 4		%l1 = load <8 x half>, <8 x half>* %s1, align 4
%s2 = getelementptr <8 x half>, <8 x half>* %src, i32 1		%s2 = getelementptr <8 x half>, <8 x half>* %src, i32 1
%l2 = load <8 x half>, <8 x half>* %s2, align 4		%l2 = load <8 x half>, <8 x half>* %s2, align 4
%s3 = getelementptr <8 x half>, <8 x half>* %src, i32 2		%s3 = getelementptr <8 x half>, <8 x half>* %src, i32 2
%l3 = load <8 x half>, <8 x half>* %s3, align 4		%l3 = load <8 x half>, <8 x half>* %s3, align 4
%s4 = getelementptr <8 x half>, <8 x half>* %src, i32 3		%s4 = getelementptr <8 x half>, <8 x half>* %src, i32 3
%l4 = load <8 x half>, <8 x half>* %s3, align 4		%l4 = load <8 x half>, <8 x half>* %s3, align 4
%t1 = shufflevector <8 x half> %l1, <8 x half> %l2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%t1 = shufflevector <8 x half> %l1, <8 x half> %l2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%t2 = shufflevector <8 x half> %l3, <8 x half> %l4, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		%t2 = shufflevector <8 x half> %l3, <8 x half> %l4, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%s = shufflevector <16 x half> %t1, <16 x half> %t2, <32 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27, i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>		%s = shufflevector <16 x half> %t1, <16 x half> %t2, <32 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27, i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>
store <32 x half> %s, <32 x half> *%dst		store <32 x half> %s, <32 x half> *%dst
ret void		ret void
}		}

define void @vst4_v16f16(<16 x half> *%src, <64 x half> *%dst) {		define void @vst4_v16f16(<16 x half> *%src, <64 x half> *%dst) {
; CHECK-LABEL: vst4_v16f16:		; CHECK-LABEL: vst4_v16f16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14}		; CHECK-NEXT: .vsave {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14}		; CHECK-NEXT: vpush {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vldrw.u32 q5, [r0, #16]		; CHECK-NEXT: vldrw.u32 q2, [r0, #80]
; CHECK-NEXT: vldrw.u32 q3, [r0, #48]		; CHECK-NEXT: vldrw.u32 q6, [r0, #64]
; CHECK-NEXT: vldrw.u32 q4, [r0, #80]		; CHECK-NEXT: vldrw.u32 q1, [r0, #48]
; CHECK-NEXT: vldrw.u32 q1, [r0, #32]		; CHECK-NEXT: vldrw.u32 q5, [r0, #32]
; CHECK-NEXT: vmov r3, s22		; CHECK-NEXT: vldrw.u32 q0, [r0, #16]
; CHECK-NEXT: vmovx.f16 s0, s22		; CHECK-NEXT: vldrw.u32 q4, [r0]
; CHECK-NEXT: vmov r2, s14		; CHECK-NEXT: vmov q7, q6
; CHECK-NEXT: vmov.16 q6[0], r3		; CHECK-NEXT: vmov q3, q2
; CHECK-NEXT: vmov.16 q6[1], r2		; CHECK-NEXT: add.w r0, r1, #64
; CHECK-NEXT: vmov r2, s18		; CHECK-NEXT: vst40.16 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.16 q6[2], r2		; CHECK-NEXT: vst41.16 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vldrw.u32 q2, [r0]		; CHECK-NEXT: vst42.16 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov.16 q6[3], r2		; CHECK-NEXT: vst43.16 {q4, q5, q6, q7}, [r1]
; CHECK-NEXT: vmov r2, s0		; CHECK-NEXT: vst40.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmovx.f16 s0, s14		; CHECK-NEXT: vst41.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov.16 q6[4], r2		; CHECK-NEXT: vst42.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmov r2, s0		; CHECK-NEXT: vst43.16 {q0, q1, q2, q3}, [r0]
; CHECK-NEXT: vmovx.f16 s0, s18		; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14, d15}
; CHECK-NEXT: vmov.16 q6[5], r2
; CHECK-NEXT: vmov r2, s0
; CHECK-NEXT: vmov.16 q6[6], r2
; CHECK-NEXT: vldrw.u32 q0, [r0, #64]
; CHECK-NEXT: vmov.16 q6[7], r2
; CHECK-NEXT: vmov r0, s23
; CHECK-NEXT: vstrw.32 q6, [r1, #96]
; CHECK-NEXT: vmov.16 q6[0], r0
; CHECK-NEXT: vmov r2, s15
; CHECK-NEXT: vmovx.f16 s28, s23
; CHECK-NEXT: vmov.16 q6[1], r2
; CHECK-NEXT: vmov r0, s19
; CHECK-NEXT: vmov.16 q6[2], r0
; CHECK-NEXT: vmov r2, s12
; CHECK-NEXT: vmov.16 q6[3], r0
; CHECK-NEXT: vmov r0, s28
; CHECK-NEXT: vmovx.f16 s28, s15
; CHECK-NEXT: vmov.16 q6[4], r0
; CHECK-NEXT: vmov r0, s28
; CHECK-NEXT: vmovx.f16 s28, s19
; CHECK-NEXT: vmov.16 q6[5], r0
; CHECK-NEXT: vmov r0, s28
; CHECK-NEXT: vmov.16 q6[6], r0
; CHECK-NEXT: vmovx.f16 s28, s20
; CHECK-NEXT: vmov.16 q6[7], r0
; CHECK-NEXT: vmov r0, s20
; CHECK-NEXT: vstrw.32 q6, [r1, #112]
; CHECK-NEXT: vmov.16 q6[0], r0
; CHECK-NEXT: vmov.16 q6[1], r2
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmov.16 q6[2], r0
; CHECK-NEXT: vmov r2, s13
; CHECK-NEXT: vmov.16 q6[3], r0
; CHECK-NEXT: vmov r0, s28
; CHECK-NEXT: vmovx.f16 s28, s12
; CHECK-NEXT: vmov.16 q6[4], r0
; CHECK-NEXT: vmov r0, s28
; CHECK-NEXT: vmovx.f16 s28, s16
; CHECK-NEXT: vmov.16 q6[5], r0
; CHECK-NEXT: vmov r0, s28
; CHECK-NEXT: vmov.16 q6[6], r0
; CHECK-NEXT: vmovx.f16 s20, s21
; CHECK-NEXT: vmov.16 q6[7], r0
; CHECK-NEXT: vmov r0, s21
; CHECK-NEXT: vstrw.32 q6, [r1, #64]
; CHECK-NEXT: vmov.16 q6[0], r0
; CHECK-NEXT: vmov.16 q6[1], r2
; CHECK-NEXT: vmov r0, s17
; CHECK-NEXT: vmov.16 q6[2], r0
; CHECK-NEXT: vmovx.f16 s12, s13
; CHECK-NEXT: vmov.16 q6[3], r0
; CHECK-NEXT: vmov r0, s20
; CHECK-NEXT: vmov.16 q6[4], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s12, s17
; CHECK-NEXT: vmov.16 q6[5], r0
; CHECK-NEXT: vmov r0, s12
; CHECK-NEXT: vmovx.f16 s16, s10
; CHECK-NEXT: vmov.16 q6[6], r0
; CHECK-NEXT: vmov r2, s6
; CHECK-NEXT: vmov.16 q6[7], r0
; CHECK-NEXT: vmov r0, s10
; CHECK-NEXT: vmov.16 q3[0], r0
; CHECK-NEXT: vmov r0, s2
; CHECK-NEXT: vmov.16 q3[1], r2
; CHECK-NEXT: vmov r2, s7
; CHECK-NEXT: vmov.16 q3[2], r0
; CHECK-NEXT: vstrw.32 q6, [r1, #80]
; CHECK-NEXT: vmov.16 q3[3], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmovx.f16 s16, s6
; CHECK-NEXT: vmov.16 q3[4], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmovx.f16 s16, s2
; CHECK-NEXT: vmov.16 q3[5], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmov.16 q3[6], r0
; CHECK-NEXT: vmovx.f16 s16, s11
; CHECK-NEXT: vmov.16 q3[7], r0
; CHECK-NEXT: vmov r0, s11
; CHECK-NEXT: vstrw.32 q3, [r1, #32]
; CHECK-NEXT: vmov.16 q3[0], r0
; CHECK-NEXT: vmov.16 q3[1], r2
; CHECK-NEXT: vmov r0, s3
; CHECK-NEXT: vmov.16 q3[2], r0
; CHECK-NEXT: vmov r2, s4
; CHECK-NEXT: vmov.16 q3[3], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmovx.f16 s16, s7
; CHECK-NEXT: vmov.16 q3[4], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmovx.f16 s16, s3
; CHECK-NEXT: vmov.16 q3[5], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmov.16 q3[6], r0
; CHECK-NEXT: vmovx.f16 s16, s8
; CHECK-NEXT: vmov.16 q3[7], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vstrw.32 q3, [r1, #48]
; CHECK-NEXT: vmov.16 q3[0], r0
; CHECK-NEXT: vmov.16 q3[1], r2
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: vmov.16 q3[2], r0
; CHECK-NEXT: vmov r2, s5
; CHECK-NEXT: vmov.16 q3[3], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmovx.f16 s16, s4
; CHECK-NEXT: vmov.16 q3[4], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmovx.f16 s16, s0
; CHECK-NEXT: vmov.16 q3[5], r0
; CHECK-NEXT: vmov r0, s16
; CHECK-NEXT: vmov.16 q3[6], r0
; CHECK-NEXT: vmovx.f16 s8, s9
; CHECK-NEXT: vmov.16 q3[7], r0
; CHECK-NEXT: vmov r0, s9
; CHECK-NEXT: vstrw.32 q3, [r1]
; CHECK-NEXT: vmov.16 q3[0], r0
; CHECK-NEXT: vmov.16 q3[1], r2
; CHECK-NEXT: vmov r0, s1
; CHECK-NEXT: vmov.16 q3[2], r0
; CHECK-NEXT: vmovx.f16 s4, s5
; CHECK-NEXT: vmov.16 q3[3], r0
; CHECK-NEXT: vmov r0, s8
; CHECK-NEXT: vmov.16 q3[4], r0
; CHECK-NEXT: vmov r0, s4
; CHECK-NEXT: vmovx.f16 s0, s1
; CHECK-NEXT: vmov.16 q3[5], r0
; CHECK-NEXT: vmov r0, s0
; CHECK-NEXT: vmov.16 q3[6], r0
; CHECK-NEXT: vmov.16 q3[7], r0
; CHECK-NEXT: vstrw.32 q3, [r1, #16]
; CHECK-NEXT: vpop {d8, d9, d10, d11, d12, d13, d14}
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s1 = getelementptr <16 x half>, <16 x half>* %src, i32 0		%s1 = getelementptr <16 x half>, <16 x half>* %src, i32 0
%l1 = load <16 x half>, <16 x half>* %s1, align 4		%l1 = load <16 x half>, <16 x half>* %s1, align 4
%s2 = getelementptr <16 x half>, <16 x half>* %src, i32 1		%s2 = getelementptr <16 x half>, <16 x half>* %src, i32 1
%l2 = load <16 x half>, <16 x half>* %s2, align 4		%l2 = load <16 x half>, <16 x half>* %s2, align 4
%s3 = getelementptr <16 x half>, <16 x half>* %src, i32 2		%s3 = getelementptr <16 x half>, <16 x half>* %src, i32 2
%l3 = load <16 x half>, <16 x half>* %s3, align 4		%l3 = load <16 x half>, <16 x half>* %s3, align 4
[... 92 lines not shown ...]

llvm/test/Transforms/InterleavedAccess/ARM/interleaved-accesses.ll

[... 67 lines not shown ...]
; CHECK-NEON-NEXT: [[VLDN:%.*]] = call { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } @llvm.arm.neon.vld4.v4i32.p0i8(i8* [[TMP1]], i32 4)		; CHECK-NEON-NEXT: [[VLDN:%.*]] = call { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } @llvm.arm.neon.vld4.v4i32.p0i8(i8* [[TMP1]], i32 4)
; CHECK-NEON-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 3		; CHECK-NEON-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 3
; CHECK-NEON-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 2		; CHECK-NEON-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 2
; CHECK-NEON-NEXT: [[TMP4:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 1		; CHECK-NEON-NEXT: [[TMP4:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 1
; CHECK-NEON-NEXT: [[TMP5:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 0		; CHECK-NEON-NEXT: [[TMP5:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 0
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @load_factor4(		; CHECK-MVE-LABEL: @load_factor4(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x i32>, <16 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: [[TMP1:%.*]] = bitcast <16 x i32>* [[PTR:%.*]] to i32*
; CHECK-MVE-NEXT: [[V0:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>		; CHECK-MVE-NEXT: [[VLDN:%.*]] = call { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } @llvm.arm.mve.vld4q.v4i32.p0i32(i32* [[TMP1]])
; CHECK-MVE-NEXT: [[V1:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 1, i32 5, i32 9, i32 13>		; CHECK-MVE-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 3
; CHECK-MVE-NEXT: [[V2:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 2, i32 6, i32 10, i32 14>		; CHECK-MVE-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 2
; CHECK-MVE-NEXT: [[V3:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 3, i32 7, i32 11, i32 15>		; CHECK-MVE-NEXT: [[TMP4:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 1
		; CHECK-MVE-NEXT: [[TMP5:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 0
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @load_factor4(		; CHECK-NONE-LABEL: @load_factor4(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x i32>, <16 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x i32>, <16 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: [[V0:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>		; CHECK-NONE-NEXT: [[V0:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>
; CHECK-NONE-NEXT: [[V1:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 1, i32 5, i32 9, i32 13>		; CHECK-NONE-NEXT: [[V1:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 1, i32 5, i32 9, i32 13>
; CHECK-NONE-NEXT: [[V2:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 2, i32 6, i32 10, i32 14>		; CHECK-NONE-NEXT: [[V2:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 2, i32 6, i32 10, i32 14>
; CHECK-NONE-NEXT: [[V3:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 3, i32 7, i32 11, i32 15>		; CHECK-NONE-NEXT: [[V3:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 3, i32 7, i32 11, i32 15>
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
%interleaved.vec = load <16 x i32>, <16 x i32>* %ptr, align 4		%interleaved.vec = load <16 x i32>, <16 x i32>* %ptr, align 4
%v0 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>		%v0 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 0, i32 4, i32 8, i32 12>
%v1 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 1, i32 5, i32 9, i32 13>		%v1 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 1, i32 5, i32 9, i32 13>
%v2 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 2, i32 6, i32 10, i32 14>		%v2 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 2, i32 6, i32 10, i32 14>
%v3 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 3, i32 7, i32 11, i32 15>		%v3 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 3, i32 7, i32 11, i32 15>
ret void		ret void
}		}

define void @store_factor2(<16 x i8>* %ptr, <8 x i8> %v0, <8 x i8> %v1) {		define void @store_factor2(<16 x i8>* %ptr, <8 x i8> %v0, <8 x i8> %v1) {
; CHECK-NEON-LABEL: @store_factor2(		; CHECK-NEON-LABEL: @store_factor2(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <16 x i8>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <8 x i8> [[V0:%.*]], <8 x i8> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <8 x i8> [[V0:%.*]], <8 x i8> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <8 x i8> [[V0]], <8 x i8> [[V1]], <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <8 x i8> [[V0]], <8 x i8> [[V1]], <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = bitcast <16 x i8>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst2.p0i8.v8i8(i8* [[TMP1]], <8 x i8> [[TMP2]], <8 x i8> [[TMP3]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst2.p0i8.v8i8(i8* [[TMP3]], <8 x i8> [[TMP1]], <8 x i8> [[TMP2]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_factor2(		; CHECK-MVE-LABEL: @store_factor2(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[V0:%.*]], <8 x i8> [[V1:%.*]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[V0:%.*]], <8 x i8> [[V1:%.*]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
; CHECK-MVE-NEXT: store <16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_factor2(		; CHECK-NONE-LABEL: @store_factor2(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[V0:%.*]], <8 x i8> [[V1:%.*]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i8> [[V0:%.*]], <8 x i8> [[V1:%.*]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
; CHECK-NONE-NEXT: store <16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: store <16 x i8> [[INTERLEAVED_VEC]], <16 x i8>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
%interleaved.vec = shufflevector <8 x i8> %v0, <8 x i8> %v1, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>		%interleaved.vec = shufflevector <8 x i8> %v0, <8 x i8> %v1, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
store <16 x i8> %interleaved.vec, <16 x i8>* %ptr, align 4		store <16 x i8> %interleaved.vec, <16 x i8>* %ptr, align 4
ret void		ret void
}		}

define void @store_factor3(<12 x i32>* %ptr, <4 x i32> %v0, <4 x i32> %v1, <4 x i32> %v2) {		define void @store_factor3(<12 x i32>* %ptr, <4 x i32> %v0, <4 x i32> %v1, <4 x i32> %v2) {
; CHECK-NEON-LABEL: @store_factor3(		; CHECK-NEON-LABEL: @store_factor3(
; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <12 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = bitcast <12 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP4]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_factor3(		; CHECK-MVE-LABEL: @store_factor3(
; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <12 x i32> <i32 0, i32 4, i32 8, i32 1, i32 5, i32 9, i32 2, i32 6, i32 10, i32 3, i32 7, i32 11>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <12 x i32> <i32 0, i32 4, i32 8, i32 1, i32 5, i32 9, i32 2, i32 6, i32 10, i32 3, i32 7, i32 11>
; CHECK-MVE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
[11 lines not shown]
store <12 x i32> %interleaved.vec, <12 x i32>* %ptr, align 4		store <12 x i32> %interleaved.vec, <12 x i32>* %ptr, align 4
ret void		ret void
}		}

define void @store_factor4(<16 x i32>* %ptr, <4 x i32> %v0, <4 x i32> %v1, <4 x i32> %v2, <4 x i32> %v3) {		define void @store_factor4(<16 x i32>* %ptr, <4 x i32> %v0, <4 x i32> %v1, <4 x i32> %v2, <4 x i32> %v3) {
; CHECK-NEON-LABEL: @store_factor4(		; CHECK-NEON-LABEL: @store_factor4(
; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <16 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 12, i32 13, i32 14, i32 15>
; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 12, i32 13, i32 14, i32 15>		; CHECK-NEON-NEXT: [[TMP5:%.*]] = bitcast <16 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v4i32(i8* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> [[TMP5]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v4i32(i8* [[TMP5]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_factor4(		; CHECK-MVE-LABEL: @store_factor4(
; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 1, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>		; CHECK-MVE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-MVE-NEXT: store <16 x i32> [[INTERLEAVED_VEC]], <16 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; CHECK-MVE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; CHECK-MVE-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; CHECK-MVE-NEXT: [[TMP5:%.*]] = bitcast <16 x i32>* [[PTR:%.*]] to i32*
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP5]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 0)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP5]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 1)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP5]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 2)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP5]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 3)
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_factor4(		; CHECK-NONE-LABEL: @store_factor4(
; CHECK-NONE-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NONE-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NONE-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NONE-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 1, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 1, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>
; CHECK-NONE-NEXT: store <16 x i32> [[INTERLEAVED_VEC]], <16 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: store <16 x i32> [[INTERLEAVED_VEC]], <16 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
[98 lines not shown]
%v3 = shufflevector <8 x i32*> %interleaved.vec, <8 x i32*> undef, <2 x i32> <i32 3, i32 7>		%v3 = shufflevector <8 x i32*> %interleaved.vec, <8 x i32*> undef, <2 x i32> <i32 3, i32 7>
ret void		ret void
}		}

define void @store_ptrvec_factor2(<4 x i32*>* %ptr, <2 x i32*> %v0, <2 x i32*> %v1) {		define void @store_ptrvec_factor2(<4 x i32*>* %ptr, <2 x i32*> %v0, <2 x i32*> %v1) {
; CHECK-NEON-LABEL: @store_ptrvec_factor2(		; CHECK-NEON-LABEL: @store_ptrvec_factor2(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = ptrtoint <2 x i32*> [[V0:%.*]] to <2 x i32>		; CHECK-NEON-NEXT: [[TMP1:%.*]] = ptrtoint <2 x i32*> [[V0:%.*]] to <2 x i32>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = ptrtoint <2 x i32*> [[V1:%.*]] to <2 x i32>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = ptrtoint <2 x i32*> [[V1:%.*]] to <2 x i32>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = bitcast <4 x i32*>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 1>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 1>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 2, i32 3>		; CHECK-NEON-NEXT: [[TMP5:%.*]] = bitcast <4 x i32*>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst2.p0i8.v2i32(i8* [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst2.p0i8.v2i32(i8* [[TMP5]], <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_ptrvec_factor2(		; CHECK-MVE-LABEL: @store_ptrvec_factor2(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <2 x i32*> [[V0:%.*]], <2 x i32*> [[V1:%.*]], <4 x i32> <i32 0, i32 2, i32 1, i32 3>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <2 x i32*> [[V0:%.*]], <2 x i32*> [[V1:%.*]], <4 x i32> <i32 0, i32 2, i32 1, i32 3>
; CHECK-MVE-NEXT: store <4 x i32*> [[INTERLEAVED_VEC]], <4 x i32*>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <4 x i32*> [[INTERLEAVED_VEC]], <4 x i32*>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_ptrvec_factor2(		; CHECK-NONE-LABEL: @store_ptrvec_factor2(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <2 x i32*> [[V0:%.*]], <2 x i32*> [[V1:%.*]], <4 x i32> <i32 0, i32 2, i32 1, i32 3>		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <2 x i32*> [[V0:%.*]], <2 x i32*> [[V1:%.*]], <4 x i32> <i32 0, i32 2, i32 1, i32 3>
; CHECK-NONE-NEXT: store <4 x i32*> [[INTERLEAVED_VEC]], <4 x i32*>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: store <4 x i32*> [[INTERLEAVED_VEC]], <4 x i32*>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
%interleaved.vec = shufflevector <2 x i32*> %v0, <2 x i32*> %v1, <4 x i32> <i32 0, i32 2, i32 1, i32 3>		%interleaved.vec = shufflevector <2 x i32*> %v0, <2 x i32*> %v1, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
store <4 x i32*> %interleaved.vec, <4 x i32*>* %ptr, align 4		store <4 x i32*> %interleaved.vec, <4 x i32*>* %ptr, align 4
ret void		ret void
}		}

define void @store_ptrvec_factor3(<6 x i32*>* %ptr, <2 x i32*> %v0, <2 x i32*> %v1, <2 x i32*> %v2) {		define void @store_ptrvec_factor3(<6 x i32*>* %ptr, <2 x i32*> %v0, <2 x i32*> %v1, <2 x i32*> %v2) {
; CHECK-NEON-LABEL: @store_ptrvec_factor3(		; CHECK-NEON-LABEL: @store_ptrvec_factor3(
; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <2 x i32*> [[V0:%.*]], <2 x i32*> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <2 x i32*> [[V0:%.*]], <2 x i32*> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <2 x i32*> [[V2:%.*]], <2 x i32*> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <2 x i32*> [[V2:%.*]], <2 x i32*> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEON-NEXT: [[TMP1:%.*]] = ptrtoint <4 x i32*> [[S0]] to <4 x i32>		; CHECK-NEON-NEXT: [[TMP1:%.*]] = ptrtoint <4 x i32*> [[S0]] to <4 x i32>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = ptrtoint <4 x i32*> [[S1]] to <4 x i32>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = ptrtoint <4 x i32*> [[S1]] to <4 x i32>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = bitcast <6 x i32*>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <2 x i32> <i32 0, i32 1>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <2 x i32> <i32 0, i32 1>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <2 x i32> <i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <2 x i32> <i32 2, i32 3>		; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <2 x i32> <i32 4, i32 5>
; CHECK-NEON-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <2 x i32> <i32 4, i32 5>		; CHECK-NEON-NEXT: [[TMP6:%.*]] = bitcast <6 x i32*>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v2i32(i8* [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], <2 x i32> [[TMP6]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v2i32(i8* [[TMP6]], <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_ptrvec_factor3(		; CHECK-MVE-LABEL: @store_ptrvec_factor3(
; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <2 x i32*> [[V0:%.*]], <2 x i32*> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <2 x i32*> [[V0:%.*]], <2 x i32*> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <2 x i32*> [[V2:%.*]], <2 x i32*> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <2 x i32*> [[V2:%.*]], <2 x i32*> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32*> [[S0]], <4 x i32*> [[S1]], <6 x i32> <i32 0, i32 2, i32 4, i32 1, i32 3, i32 5>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32*> [[S0]], <4 x i32*> [[S1]], <6 x i32> <i32 0, i32 2, i32 4, i32 1, i32 3, i32 5>
; CHECK-MVE-NEXT: store <6 x i32*> [[INTERLEAVED_VEC]], <6 x i32*>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <6 x i32*> [[INTERLEAVED_VEC]], <6 x i32*>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
[13 lines not shown]
}		}

define void @store_ptrvec_factor4(<8 x i32*>* %ptr, <2 x i32*> %v0, <2 x i32*> %v1, <2 x i32*> %v2, <2 x i32*> %v3) {		define void @store_ptrvec_factor4(<8 x i32*>* %ptr, <2 x i32*> %v0, <2 x i32*> %v1, <2 x i32*> %v2, <2 x i32*> %v3) {
; CHECK-NEON-LABEL: @store_ptrvec_factor4(		; CHECK-NEON-LABEL: @store_ptrvec_factor4(
; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <2 x i32*> [[V0:%.*]], <2 x i32*> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <2 x i32*> [[V0:%.*]], <2 x i32*> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <2 x i32*> [[V2:%.*]], <2 x i32*> [[V3:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <2 x i32*> [[V2:%.*]], <2 x i32*> [[V3:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP1:%.*]] = ptrtoint <4 x i32*> [[S0]] to <4 x i32>		; CHECK-NEON-NEXT: [[TMP1:%.*]] = ptrtoint <4 x i32*> [[S0]] to <4 x i32>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = ptrtoint <4 x i32*> [[S1]] to <4 x i32>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = ptrtoint <4 x i32*> [[S1]] to <4 x i32>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = bitcast <8 x i32*>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <2 x i32> <i32 0, i32 1>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <2 x i32> <i32 0, i32 1>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <2 x i32> <i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <2 x i32> <i32 2, i32 3>		; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <2 x i32> <i32 4, i32 5>
; CHECK-NEON-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <2 x i32> <i32 4, i32 5>		; CHECK-NEON-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <2 x i32> <i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <2 x i32> <i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP7:%.*]] = bitcast <8 x i32*>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v2i32(i8* [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], <2 x i32> [[TMP6]], <2 x i32> [[TMP7]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v2i32(i8* [[TMP7]], <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], <2 x i32> [[TMP6]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_ptrvec_factor4(		; CHECK-MVE-LABEL: @store_ptrvec_factor4(
; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <2 x i32*> [[V0:%.*]], <2 x i32*> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <2 x i32*> [[V0:%.*]], <2 x i32*> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <2 x i32*> [[V2:%.*]], <2 x i32*> [[V3:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <2 x i32*> [[V2:%.*]], <2 x i32*> [[V3:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32*> [[S0]], <4 x i32*> [[S1]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 1, i32 3, i32 5, i32 7>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32*> [[S0]], <4 x i32*> [[S1]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 1, i32 3, i32 5, i32 7>
; CHECK-MVE-NEXT: store <8 x i32*> [[INTERLEAVED_VEC]], <8 x i32*>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <8 x i32*> [[INTERLEAVED_VEC]], <8 x i32*>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
[16 lines not shown]
; CHECK-NEON-LABEL: @load_undef_mask_factor2(		; CHECK-NEON-LABEL: @load_undef_mask_factor2(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: [[VLDN:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.neon.vld2.v4i32.p0i8(i8* [[TMP1]], i32 4)		; CHECK-NEON-NEXT: [[VLDN:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.neon.vld2.v4i32.p0i8(i8* [[TMP1]], i32 4)
; CHECK-NEON-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN]], 1		; CHECK-NEON-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN]], 1
; CHECK-NEON-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN]], 0		; CHECK-NEON-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN]], 0
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @load_undef_mask_factor2(		; CHECK-MVE-LABEL: @load_undef_mask_factor2(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <8 x i32>, <8 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: [[TMP1:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i32*
; CHECK-MVE-NEXT: [[V0:%.*]] = shufflevector <8 x i32> [[INTERLEAVED_VEC]], <8 x i32> undef, <4 x i32> <i32 undef, i32 2, i32 undef, i32 6>		; CHECK-MVE-NEXT: [[VLDN:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vld2q.v4i32.p0i32(i32* [[TMP1]])
; CHECK-MVE-NEXT: [[V1:%.*]] = shufflevector <8 x i32> [[INTERLEAVED_VEC]], <8 x i32> undef, <4 x i32> <i32 undef, i32 3, i32 undef, i32 7>		; CHECK-MVE-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN]], 1
		; CHECK-MVE-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN]], 0
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @load_undef_mask_factor2(		; CHECK-NONE-LABEL: @load_undef_mask_factor2(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <8 x i32>, <8 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <8 x i32>, <8 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: [[V0:%.*]] = shufflevector <8 x i32> [[INTERLEAVED_VEC]], <8 x i32> undef, <4 x i32> <i32 undef, i32 2, i32 undef, i32 6>		; CHECK-NONE-NEXT: [[V0:%.*]] = shufflevector <8 x i32> [[INTERLEAVED_VEC]], <8 x i32> undef, <4 x i32> <i32 undef, i32 2, i32 undef, i32 6>
; CHECK-NONE-NEXT: [[V1:%.*]] = shufflevector <8 x i32> [[INTERLEAVED_VEC]], <8 x i32> undef, <4 x i32> <i32 undef, i32 3, i32 undef, i32 7>		; CHECK-NONE-NEXT: [[V1:%.*]] = shufflevector <8 x i32> [[INTERLEAVED_VEC]], <8 x i32> undef, <4 x i32> <i32 undef, i32 3, i32 undef, i32 7>
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
[39 lines not shown]
; CHECK-NEON-NEXT: [[VLDN:%.*]] = call { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } @llvm.arm.neon.vld4.v4i32.p0i8(i8* [[TMP1]], i32 4)		; CHECK-NEON-NEXT: [[VLDN:%.*]] = call { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } @llvm.arm.neon.vld4.v4i32.p0i8(i8* [[TMP1]], i32 4)
; CHECK-NEON-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 3		; CHECK-NEON-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 3
; CHECK-NEON-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 2		; CHECK-NEON-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 2
; CHECK-NEON-NEXT: [[TMP4:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 1		; CHECK-NEON-NEXT: [[TMP4:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 1
; CHECK-NEON-NEXT: [[TMP5:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 0		; CHECK-NEON-NEXT: [[TMP5:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 0
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @load_undef_mask_factor4(		; CHECK-MVE-LABEL: @load_undef_mask_factor4(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x i32>, <16 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: [[TMP1:%.*]] = bitcast <16 x i32>* [[PTR:%.*]] to i32*
; CHECK-MVE-NEXT: [[V0:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 0, i32 4, i32 undef, i32 undef>		; CHECK-MVE-NEXT: [[VLDN:%.*]] = call { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } @llvm.arm.mve.vld4q.v4i32.p0i32(i32* [[TMP1]])
; CHECK-MVE-NEXT: [[V1:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 1, i32 5, i32 undef, i32 undef>		; CHECK-MVE-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 3
; CHECK-MVE-NEXT: [[V2:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 2, i32 6, i32 undef, i32 undef>		; CHECK-MVE-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 2
; CHECK-MVE-NEXT: [[V3:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 3, i32 7, i32 undef, i32 undef>		; CHECK-MVE-NEXT: [[TMP4:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 1
		; CHECK-MVE-NEXT: [[TMP5:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 0
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @load_undef_mask_factor4(		; CHECK-NONE-LABEL: @load_undef_mask_factor4(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x i32>, <16 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x i32>, <16 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: [[V0:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 0, i32 4, i32 undef, i32 undef>		; CHECK-NONE-NEXT: [[V0:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 0, i32 4, i32 undef, i32 undef>
; CHECK-NONE-NEXT: [[V1:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 1, i32 5, i32 undef, i32 undef>		; CHECK-NONE-NEXT: [[V1:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 1, i32 5, i32 undef, i32 undef>
; CHECK-NONE-NEXT: [[V2:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 2, i32 6, i32 undef, i32 undef>		; CHECK-NONE-NEXT: [[V2:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 2, i32 6, i32 undef, i32 undef>
; CHECK-NONE-NEXT: [[V3:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 3, i32 7, i32 undef, i32 undef>		; CHECK-NONE-NEXT: [[V3:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <4 x i32> <i32 3, i32 7, i32 undef, i32 undef>
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
%interleaved.vec = load <16 x i32>, <16 x i32>* %ptr, align 4		%interleaved.vec = load <16 x i32>, <16 x i32>* %ptr, align 4
%v0 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 0, i32 4, i32 undef, i32 undef>		%v0 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 0, i32 4, i32 undef, i32 undef>
%v1 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 1, i32 5, i32 undef, i32 undef>		%v1 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 1, i32 5, i32 undef, i32 undef>
%v2 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 2, i32 6, i32 undef, i32 undef>		%v2 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 2, i32 6, i32 undef, i32 undef>
%v3 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 3, i32 7, i32 undef, i32 undef>		%v3 = shufflevector <16 x i32> %interleaved.vec, <16 x i32> undef, <4 x i32> <i32 3, i32 7, i32 undef, i32 undef>
ret void		ret void
}		}

define void @store_undef_mask_factor2(<8 x i32>* %ptr, <4 x i32> %v0, <4 x i32> %v1) {		define void @store_undef_mask_factor2(<8 x i32>* %ptr, <4 x i32> %v0, <4 x i32> %v1) {
; CHECK-NEON-LABEL: @store_undef_mask_factor2(		; CHECK-NEON-LABEL: @store_undef_mask_factor2(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V0]], <4 x i32> [[V1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[V0]], <4 x i32> [[V1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst2.p0i8.v4i32(i8* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst2.p0i8.v4i32(i8* [[TMP3]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_undef_mask_factor2(		; CHECK-MVE-LABEL: @store_undef_mask_factor2(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 2, i32 6, i32 3, i32 7>		; CHECK-MVE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-MVE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[V0]], <4 x i32> [[V1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; CHECK-MVE-NEXT: [[TMP3:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i32*
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst2q.p0i32.v4i32(i32* [[TMP3]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], i32 0)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst2q.p0i32.v4i32(i32* [[TMP3]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], i32 1)
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_undef_mask_factor2(		; CHECK-NONE-LABEL: @store_undef_mask_factor2(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 2, i32 6, i32 3, i32 7>		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 2, i32 6, i32 3, i32 7>
; CHECK-NONE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
%interleaved.vec = shufflevector <4 x i32> %v0, <4 x i32> %v1, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 2, i32 6, i32 3, i32 7>		%interleaved.vec = shufflevector <4 x i32> %v0, <4 x i32> %v1, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved.vec, <8 x i32>* %ptr, align 4		store <8 x i32> %interleaved.vec, <8 x i32>* %ptr, align 4
ret void		ret void
}		}

define void @store_undef_mask_factor3(<12 x i32>* %ptr, <4 x i32> %v0, <4 x i32> %v1, <4 x i32> %v2) {		define void @store_undef_mask_factor3(<12 x i32>* %ptr, <4 x i32> %v0, <4 x i32> %v1, <4 x i32> %v2) {
; CHECK-NEON-LABEL: @store_undef_mask_factor3(		; CHECK-NEON-LABEL: @store_undef_mask_factor3(
; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <12 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = bitcast <12 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP4]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_undef_mask_factor3(		; CHECK-MVE-LABEL: @store_undef_mask_factor3(
; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <12 x i32> <i32 0, i32 4, i32 undef, i32 1, i32 undef, i32 9, i32 2, i32 6, i32 10, i32 3, i32 7, i32 11>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <12 x i32> <i32 0, i32 4, i32 undef, i32 1, i32 undef, i32 9, i32 2, i32 6, i32 10, i32 3, i32 7, i32 11>
; CHECK-MVE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
store <12 x i32> %interleaved.vec, <12 x i32>* %ptr, align 4		store <12 x i32> %interleaved.vec, <12 x i32>* %ptr, align 4
ret void		ret void
}		}

define void @store_undef_mask_factor4(<16 x i32>* %ptr, <4 x i32> %v0, <4 x i32> %v1, <4 x i32> %v2, <4 x i32> %v3) {		define void @store_undef_mask_factor4(<16 x i32>* %ptr, <4 x i32> %v0, <4 x i32> %v1, <4 x i32> %v2, <4 x i32> %v3) {
; CHECK-NEON-LABEL: @store_undef_mask_factor4(		; CHECK-NEON-LABEL: @store_undef_mask_factor4(
; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <16 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 12, i32 13, i32 14, i32 15>
; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 12, i32 13, i32 14, i32 15>		; CHECK-NEON-NEXT: [[TMP5:%.*]] = bitcast <16 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v4i32(i8* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> [[TMP5]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v4i32(i8* [[TMP5]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_undef_mask_factor4(		; CHECK-MVE-LABEL: @store_undef_mask_factor4(
; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <16 x i32> <i32 0, i32 4, i32 8, i32 undef, i32 undef, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>		; CHECK-MVE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-MVE-NEXT: store <16 x i32> [[INTERLEAVED_VEC]], <16 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; CHECK-MVE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; CHECK-MVE-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; CHECK-MVE-NEXT: [[TMP5:%.*]] = bitcast <16 x i32>* [[PTR:%.*]] to i32*
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP5]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 0)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP5]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 1)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP5]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 2)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP5]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 3)
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_undef_mask_factor4(		; CHECK-NONE-LABEL: @store_undef_mask_factor4(
; CHECK-NONE-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NONE-NEXT: [[S0:%.*]] = shufflevector <4 x i32> [[V0:%.*]], <4 x i32> [[V1:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NONE-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NONE-NEXT: [[S1:%.*]] = shufflevector <4 x i32> [[V2:%.*]], <4 x i32> [[V3:%.*]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <16 x i32> <i32 0, i32 4, i32 8, i32 undef, i32 undef, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[S0]], <8 x i32> [[S1]], <16 x i32> <i32 0, i32 4, i32 8, i32 undef, i32 undef, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>
; CHECK-NONE-NEXT: store <16 x i32> [[INTERLEAVED_VEC]], <16 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: store <16 x i32> [[INTERLEAVED_VEC]], <16 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
%v0 = shufflevector <8 x i32> %interleaved.vec, <8 x i32> undef, <2 x i32> <i32 0, i32 3>		%v0 = shufflevector <8 x i32> %interleaved.vec, <8 x i32> undef, <2 x i32> <i32 0, i32 3>
%v1 = shufflevector <8 x i32> %interleaved.vec, <8 x i32> undef, <2 x i32> <i32 1, i32 4>		%v1 = shufflevector <8 x i32> %interleaved.vec, <8 x i32> undef, <2 x i32> <i32 1, i32 4>
%v2 = shufflevector <8 x i32> %interleaved.vec, <8 x i32> undef, <2 x i32> <i32 2, i32 5>		%v2 = shufflevector <8 x i32> %interleaved.vec, <8 x i32> undef, <2 x i32> <i32 2, i32 5>
ret void		ret void
}		}

define void @store_address_space(<4 x i32> addrspace(1)* %ptr, <2 x i32> %v0, <2 x i32> %v1) {		define void @store_address_space(<4 x i32> addrspace(1)* %ptr, <2 x i32> %v0, <2 x i32> %v1) {
; CHECK-NEON-LABEL: @store_address_space(		; CHECK-NEON-LABEL: @store_address_space(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> addrspace(1)* [[PTR:%.*]] to i8 addrspace(1)*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> [[V1:%.*]], <2 x i32> <i32 0, i32 1>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> [[V1:%.*]], <2 x i32> <i32 0, i32 1>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[V0]], <2 x i32> [[V1]], <2 x i32> <i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[V0]], <2 x i32> [[V1]], <2 x i32> <i32 2, i32 3>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = bitcast <4 x i32> addrspace(1)* [[PTR:%.*]] to i8 addrspace(1)*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst2.p1i8.v2i32(i8 addrspace(1)* [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], i32 0)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst2.p1i8.v2i32(i8 addrspace(1)* [[TMP3]], <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], i32 0)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_address_space(		; CHECK-MVE-LABEL: @store_address_space(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> [[V1:%.*]], <4 x i32> <i32 0, i32 2, i32 1, i32 3>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> [[V1:%.*]], <4 x i32> <i32 0, i32 2, i32 1, i32 3>
; CHECK-MVE-NEXT: store <4 x i32> [[INTERLEAVED_VEC]], <4 x i32> addrspace(1)* [[PTR:%.*]]		; CHECK-MVE-NEXT: store <4 x i32> [[INTERLEAVED_VEC]], <4 x i32> addrspace(1)* [[PTR:%.*]]
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_address_space(		; CHECK-NONE-LABEL: @store_address_space(
;		;
%interleaved.vec = shufflevector <3 x float> %v0, <3 x float> undef, <3 x i32> <i32 0, i32 2, i32 undef>		%interleaved.vec = shufflevector <3 x float> %v0, <3 x float> undef, <3 x i32> <i32 0, i32 2, i32 undef>
store <3 x float> %interleaved.vec, <3 x float>* %ptr, align 16		store <3 x float> %interleaved.vec, <3 x float>* %ptr, align 16
ret void		ret void
}		}

define void @store_general_mask_factor4(<8 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {		define void @store_general_mask_factor4(<8 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
; CHECK-NEON-LABEL: @store_general_mask_factor4(		; CHECK-NEON-LABEL: @store_general_mask_factor4(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <2 x i32> <i32 4, i32 5>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <2 x i32> <i32 4, i32 5>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 16, i32 17>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 16, i32 17>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 32, i32 33>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 32, i32 33>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 8, i32 9>
; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 8, i32 9>		; CHECK-NEON-NEXT: [[TMP5:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v2i32(i8* [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v2i32(i8* [[TMP5]], <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_general_mask_factor4(		; CHECK-MVE-LABEL: @store_general_mask_factor4(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>
; CHECK-MVE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_general_mask_factor4(		; CHECK-NONE-LABEL: @store_general_mask_factor4(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>
; CHECK-NONE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>		%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>
store <8 x i32> %interleaved.vec, <8 x i32>* %ptr, align 4		store <8 x i32> %interleaved.vec, <8 x i32>* %ptr, align 4
ret void		ret void
}		}

define void @store_general_mask_factor4_undefbeg(<8 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {		define void @store_general_mask_factor4_undefbeg(<8 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
; CHECK-NEON-LABEL: @store_general_mask_factor4_undefbeg(		; CHECK-NEON-LABEL: @store_general_mask_factor4_undefbeg(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <2 x i32> <i32 4, i32 5>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <2 x i32> <i32 4, i32 5>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 16, i32 17>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 16, i32 17>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 32, i32 33>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 32, i32 33>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 8, i32 9>
; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 8, i32 9>		; CHECK-NEON-NEXT: [[TMP5:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v2i32(i8* [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v2i32(i8* [[TMP5]], <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_general_mask_factor4_undefbeg(		; CHECK-MVE-LABEL: @store_general_mask_factor4_undefbeg(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 undef, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 undef, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>
; CHECK-MVE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_general_mask_factor4_undefbeg(		; CHECK-NONE-LABEL: @store_general_mask_factor4_undefbeg(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 undef, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 undef, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>
; CHECK-NONE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 undef, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>		%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 undef, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 9>
store <8 x i32> %interleaved.vec, <8 x i32>* %ptr, align 4		store <8 x i32> %interleaved.vec, <8 x i32>* %ptr, align 4
ret void		ret void
}		}

define void @store_general_mask_factor4_undefend(<8 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {		define void @store_general_mask_factor4_undefend(<8 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
; CHECK-NEON-LABEL: @store_general_mask_factor4_undefend(		; CHECK-NEON-LABEL: @store_general_mask_factor4_undefend(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <2 x i32> <i32 4, i32 5>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <2 x i32> <i32 4, i32 5>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 16, i32 17>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 16, i32 17>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 32, i32 33>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 32, i32 33>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 8, i32 9>
; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 8, i32 9>		; CHECK-NEON-NEXT: [[TMP5:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v2i32(i8* [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v2i32(i8* [[TMP5]], <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_general_mask_factor4_undefend(		; CHECK-MVE-LABEL: @store_general_mask_factor4_undefend(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 undef>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 undef>
; CHECK-MVE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_general_mask_factor4_undefend(		; CHECK-NONE-LABEL: @store_general_mask_factor4_undefend(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 undef>		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 undef>
; CHECK-NONE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 undef>		%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 16, i32 32, i32 8, i32 5, i32 17, i32 33, i32 undef>
store <8 x i32> %interleaved.vec, <8 x i32>* %ptr, align 4		store <8 x i32> %interleaved.vec, <8 x i32>* %ptr, align 4
ret void		ret void
}		}

define void @store_general_mask_factor4_undefmid(<8 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {		define void @store_general_mask_factor4_undefmid(<8 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
; CHECK-NEON-LABEL: @store_general_mask_factor4_undefmid(		; CHECK-NEON-LABEL: @store_general_mask_factor4_undefmid(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <2 x i32> <i32 4, i32 5>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <2 x i32> <i32 4, i32 5>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 16, i32 17>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 16, i32 17>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 32, i32 33>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 32, i32 33>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 8, i32 9>
; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 8, i32 9>		; CHECK-NEON-NEXT: [[TMP5:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v2i32(i8* [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v2i32(i8* [[TMP5]], <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_general_mask_factor4_undefmid(		; CHECK-MVE-LABEL: @store_general_mask_factor4_undefmid(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 undef, i32 32, i32 8, i32 5, i32 17, i32 undef, i32 9>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 undef, i32 32, i32 8, i32 5, i32 17, i32 undef, i32 9>
; CHECK-MVE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_general_mask_factor4_undefmid(		; CHECK-NONE-LABEL: @store_general_mask_factor4_undefmid(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 undef, i32 32, i32 8, i32 5, i32 17, i32 undef, i32 9>		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 undef, i32 32, i32 8, i32 5, i32 17, i32 undef, i32 9>
; CHECK-NONE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 undef, i32 32, i32 8, i32 5, i32 17, i32 undef, i32 9>		%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 undef, i32 32, i32 8, i32 5, i32 17, i32 undef, i32 9>
store <8 x i32> %interleaved.vec, <8 x i32>* %ptr, align 4		store <8 x i32> %interleaved.vec, <8 x i32>* %ptr, align 4
ret void		ret void
}		}

define void @store_general_mask_factor4_undefmulti(<8 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {		define void @store_general_mask_factor4_undefmulti(<8 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
; CHECK-NEON-LABEL: @store_general_mask_factor4_undefmulti(		; CHECK-NEON-LABEL: @store_general_mask_factor4_undefmulti(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <2 x i32> <i32 4, i32 5>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <2 x i32> <i32 4, i32 5>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 0, i32 1>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 0, i32 1>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 0, i32 1>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 0, i32 1>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 8, i32 9>
; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <2 x i32> <i32 8, i32 9>		; CHECK-NEON-NEXT: [[TMP5:%.*]] = bitcast <8 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v2i32(i8* [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v2i32(i8* [[TMP5]], <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_general_mask_factor4_undefmulti(		; CHECK-MVE-LABEL: @store_general_mask_factor4_undefmulti(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 undef, i32 undef, i32 8, i32 undef, i32 undef, i32 undef, i32 9>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 undef, i32 undef, i32 8, i32 undef, i32 undef, i32 undef, i32 9>
; CHECK-MVE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_general_mask_factor4_undefmulti(		; CHECK-NONE-LABEL: @store_general_mask_factor4_undefmulti(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 undef, i32 undef, i32 8, i32 undef, i32 undef, i32 undef, i32 9>		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <8 x i32> <i32 4, i32 undef, i32 undef, i32 8, i32 undef, i32 undef, i32 undef, i32 9>
; CHECK-NONE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: store <8 x i32> [[INTERLEAVED_VEC]], <8 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 undef, i32 undef, i32 8, i32 undef, i32 undef, i32 undef, i32 9>		%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <8 x i32> <i32 4, i32 undef, i32 undef, i32 8, i32 undef, i32 undef, i32 undef, i32 9>
store <8 x i32> %interleaved.vec, <8 x i32>* %ptr, align 4		store <8 x i32> %interleaved.vec, <8 x i32>* %ptr, align 4
ret void		ret void
}		}

define void @store_general_mask_factor3(<12 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {		define void @store_general_mask_factor3(<12 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
; CHECK-NEON-LABEL: @store_general_mask_factor3(		; CHECK-NEON-LABEL: @store_general_mask_factor3(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <12 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 32, i32 33, i32 34, i32 35>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 32, i32 33, i32 34, i32 35>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = bitcast <12 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP4]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_general_mask_factor3(		; CHECK-MVE-LABEL: @store_general_mask_factor3(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <12 x i32> <i32 4, i32 32, i32 16, i32 5, i32 33, i32 17, i32 6, i32 34, i32 18, i32 7, i32 35, i32 19>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <12 x i32> <i32 4, i32 32, i32 16, i32 5, i32 33, i32 17, i32 6, i32 34, i32 18, i32 7, i32 35, i32 19>
; CHECK-MVE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_general_mask_factor3(		; CHECK-NONE-LABEL: @store_general_mask_factor3(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <12 x i32> <i32 4, i32 32, i32 16, i32 5, i32 33, i32 17, i32 6, i32 34, i32 18, i32 7, i32 35, i32 19>		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <12 x i32> <i32 4, i32 32, i32 16, i32 5, i32 33, i32 17, i32 6, i32 34, i32 18, i32 7, i32 35, i32 19>
; CHECK-NONE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 4, i32 32, i32 16, i32 5, i32 33, i32 17, i32 6, i32 34, i32 18, i32 7, i32 35, i32 19>		%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 4, i32 32, i32 16, i32 5, i32 33, i32 17, i32 6, i32 34, i32 18, i32 7, i32 35, i32 19>
store <12 x i32> %interleaved.vec, <12 x i32>* %ptr, align 4		store <12 x i32> %interleaved.vec, <12 x i32>* %ptr, align 4
ret void		ret void
}		}

define void @store_general_mask_factor3_undefmultimid(<12 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {		define void @store_general_mask_factor3_undefmultimid(<12 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
; CHECK-NEON-LABEL: @store_general_mask_factor3_undefmultimid(		; CHECK-NEON-LABEL: @store_general_mask_factor3_undefmultimid(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <12 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 32, i32 33, i32 34, i32 35>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 32, i32 33, i32 34, i32 35>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = bitcast <12 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP4]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_general_mask_factor3_undefmultimid(		; CHECK-MVE-LABEL: @store_general_mask_factor3_undefmultimid(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <12 x i32> <i32 4, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 7, i32 35, i32 19>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <12 x i32> <i32 4, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 7, i32 35, i32 19>
; CHECK-MVE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_general_mask_factor3_undefmultimid(		; CHECK-NONE-LABEL: @store_general_mask_factor3_undefmultimid(
[... 24 lines not shown ...]
;		;
%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 4, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 8, i32 35, i32 19>		%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 4, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 8, i32 35, i32 19>
store <12 x i32> %interleaved.vec, <12 x i32>* %ptr, align 4		store <12 x i32> %interleaved.vec, <12 x i32>* %ptr, align 4
ret void		ret void
}		}

define void @store_general_mask_factor3_undeflane(<12 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {		define void @store_general_mask_factor3_undeflane(<12 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
; CHECK-NEON-LABEL: @store_general_mask_factor3_undeflane(		; CHECK-NEON-LABEL: @store_general_mask_factor3_undeflane(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <12 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 32, i32 33, i32 34, i32 35>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 32, i32 33, i32 34, i32 35>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = bitcast <12 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP4]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_general_mask_factor3_undeflane(		; CHECK-MVE-LABEL: @store_general_mask_factor3_undeflane(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <12 x i32> <i32 undef, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 undef, i32 35, i32 19>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <12 x i32> <i32 undef, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 undef, i32 35, i32 19>
; CHECK-MVE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_general_mask_factor3_undeflane(		; CHECK-NONE-LABEL: @store_general_mask_factor3_undeflane(
[... 24 lines not shown ...]
;		;
%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 undef, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 2, i32 35, i32 19>		%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 undef, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 2, i32 35, i32 19>
store <12 x i32> %interleaved.vec, <12 x i32>* %ptr, align 4		store <12 x i32> %interleaved.vec, <12 x i32>* %ptr, align 4
ret void		ret void
}		}

define void @store_general_mask_factor3_endstart_pass(<12 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {		define void @store_general_mask_factor3_endstart_pass(<12 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
; CHECK-NEON-LABEL: @store_general_mask_factor3_endstart_pass(		; CHECK-NEON-LABEL: @store_general_mask_factor3_endstart_pass(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <12 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 32, i32 33, i32 34, i32 35>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 32, i32 33, i32 34, i32 35>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = bitcast <12 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP4]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_general_mask_factor3_endstart_pass(		; CHECK-MVE-LABEL: @store_general_mask_factor3_endstart_pass(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <12 x i32> <i32 undef, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 7, i32 35, i32 19>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <12 x i32> <i32 undef, i32 32, i32 16, i32 undef, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 7, i32 35, i32 19>
; CHECK-MVE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_general_mask_factor3_endstart_pass(		; CHECK-NONE-LABEL: @store_general_mask_factor3_endstart_pass(
[... 24 lines not shown ...]
;		;
%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 undef, i32 32, i32 16, i32 0, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 undef, i32 35, i32 19>		%interleaved.vec = shufflevector <32 x i32> %v0, <32 x i32> %v1, <12 x i32> <i32 undef, i32 32, i32 16, i32 0, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 undef, i32 35, i32 19>
store <12 x i32> %interleaved.vec, <12 x i32>* %ptr, align 4		store <12 x i32> %interleaved.vec, <12 x i32>* %ptr, align 4
ret void		ret void
}		}

define void @store_general_mask_factor3_midstart_pass(<12 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {		define void @store_general_mask_factor3_midstart_pass(<12 x i32>* %ptr, <32 x i32> %v0, <32 x i32> %v1) {
; CHECK-NEON-LABEL: @store_general_mask_factor3_midstart_pass(		; CHECK-NEON-LABEL: @store_general_mask_factor3_midstart_pass(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <12 x i32>* [[PTR:%.*]] to i8*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 32, i32 33, i32 34, i32 35>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 32, i32 33, i32 34, i32 35>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <32 x i32> [[V0]], <32 x i32> [[V1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = bitcast <12 x i32>* [[PTR:%.*]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP4]], <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_general_mask_factor3_midstart_pass(		; CHECK-MVE-LABEL: @store_general_mask_factor3_midstart_pass(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <12 x i32> <i32 undef, i32 32, i32 16, i32 1, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 undef, i32 35, i32 19>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <32 x i32> [[V0:%.*]], <32 x i32> [[V1:%.*]], <12 x i32> <i32 undef, i32 32, i32 16, i32 1, i32 33, i32 17, i32 undef, i32 34, i32 18, i32 undef, i32 35, i32 19>
; CHECK-MVE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <12 x i32> [[INTERLEAVED_VEC]], <12 x i32>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_general_mask_factor3_midstart_pass(		; CHECK-NONE-LABEL: @store_general_mask_factor3_midstart_pass(
[... 42 lines not shown ...]
; CHECK-NEON-NEXT: [[VLDN1:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.neon.vld2.v4i32.p0i8(i8* [[TMP6]], i32 4)		; CHECK-NEON-NEXT: [[VLDN1:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.neon.vld2.v4i32.p0i8(i8* [[TMP6]], i32 4)
; CHECK-NEON-NEXT: [[TMP7:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN1]], 1		; CHECK-NEON-NEXT: [[TMP7:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN1]], 1
; CHECK-NEON-NEXT: [[TMP8:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN1]], 0		; CHECK-NEON-NEXT: [[TMP8:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN1]], 0
; CHECK-NEON-NEXT: [[TMP9:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP9:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @load_factor2_wide2(		; CHECK-MVE-LABEL: @load_factor2_wide2(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x i32>, <16 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: [[TMP1:%.*]] = bitcast <16 x i32>* [[PTR:%.*]] to i32*
; CHECK-MVE-NEXT: [[V0:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>		; CHECK-MVE-NEXT: [[VLDN:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vld2q.v4i32.p0i32(i32* [[TMP1]])
; CHECK-MVE-NEXT: [[V1:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		; CHECK-MVE-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN]], 1
		; CHECK-MVE-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN]], 0
		; CHECK-MVE-NEXT: [[TMP4:%.*]] = getelementptr i32, i32* [[TMP1]], i32 8
		; CHECK-MVE-NEXT: [[VLDN1:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vld2q.v4i32.p0i32(i32* [[TMP4]])
		; CHECK-MVE-NEXT: [[TMP5:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN1]], 1
		; CHECK-MVE-NEXT: [[TMP6:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN1]], 0
		; CHECK-MVE-NEXT: [[TMP7:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; CHECK-MVE-NEXT: [[TMP8:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @load_factor2_wide2(		; CHECK-NONE-LABEL: @load_factor2_wide2(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x i32>, <16 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x i32>, <16 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: [[V0:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>		; CHECK-NONE-NEXT: [[V0:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
; CHECK-NONE-NEXT: [[V1:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		; CHECK-NONE-NEXT: [[V1:%.*]] = shufflevector <16 x i32> [[INTERLEAVED_VEC]], <16 x i32> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
[... 24 lines not shown ...]
; CHECK-NEON-NEXT: [[TMP14:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEON-NEXT: [[TMP14:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEON-NEXT: [[TMP15:%.*]] = shufflevector <8 x i32> [[TMP13]], <8 x i32> [[TMP14]], <12 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>		; CHECK-NEON-NEXT: [[TMP15:%.*]] = shufflevector <8 x i32> [[TMP13]], <8 x i32> [[TMP14]], <12 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>
; CHECK-NEON-NEXT: [[TMP16:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP16:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP17:%.*]] = shufflevector <4 x i32> [[TMP12]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEON-NEXT: [[TMP17:%.*]] = shufflevector <4 x i32> [[TMP12]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEON-NEXT: [[TMP18:%.*]] = shufflevector <8 x i32> [[TMP16]], <8 x i32> [[TMP17]], <12 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>		; CHECK-NEON-NEXT: [[TMP18:%.*]] = shufflevector <8 x i32> [[TMP16]], <8 x i32> [[TMP17]], <12 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @load_factor2_wide3(		; CHECK-MVE-LABEL: @load_factor2_wide3(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <24 x i32>, <24 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: [[TMP1:%.*]] = bitcast <24 x i32>* [[PTR:%.*]] to i32*
; CHECK-MVE-NEXT: [[V0:%.*]] = shufflevector <24 x i32> [[INTERLEAVED_VEC]], <24 x i32> undef, <12 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22>		; CHECK-MVE-NEXT: [[VLDN:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vld2q.v4i32.p0i32(i32* [[TMP1]])
; CHECK-MVE-NEXT: [[V1:%.*]] = shufflevector <24 x i32> [[INTERLEAVED_VEC]], <24 x i32> undef, <12 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23>		; CHECK-MVE-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN]], 1
		; CHECK-MVE-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN]], 0
		; CHECK-MVE-NEXT: [[TMP4:%.*]] = getelementptr i32, i32* [[TMP1]], i32 8
		; CHECK-MVE-NEXT: [[VLDN1:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vld2q.v4i32.p0i32(i32* [[TMP4]])
		; CHECK-MVE-NEXT: [[TMP5:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN1]], 1
		; CHECK-MVE-NEXT: [[TMP6:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN1]], 0
		; CHECK-MVE-NEXT: [[TMP7:%.*]] = getelementptr i32, i32* [[TMP4]], i32 8
		; CHECK-MVE-NEXT: [[VLDN2:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vld2q.v4i32.p0i32(i32* [[TMP7]])
		; CHECK-MVE-NEXT: [[TMP8:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN2]], 1
		; CHECK-MVE-NEXT: [[TMP9:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN2]], 0
		; CHECK-MVE-NEXT: [[TMP10:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; CHECK-MVE-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
		; CHECK-MVE-NEXT: [[TMP12:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> [[TMP11]], <12 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>
		; CHECK-MVE-NEXT: [[TMP13:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; CHECK-MVE-NEXT: [[TMP14:%.*]] = shufflevector <4 x i32> [[TMP9]], <4 x i32> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
		; CHECK-MVE-NEXT: [[TMP15:%.*]] = shufflevector <8 x i32> [[TMP13]], <8 x i32> [[TMP14]], <12 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @load_factor2_wide3(		; CHECK-NONE-LABEL: @load_factor2_wide3(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <24 x i32>, <24 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <24 x i32>, <24 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: [[V0:%.*]] = shufflevector <24 x i32> [[INTERLEAVED_VEC]], <24 x i32> undef, <12 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22>		; CHECK-NONE-NEXT: [[V0:%.*]] = shufflevector <24 x i32> [[INTERLEAVED_VEC]], <24 x i32> undef, <12 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22>
; CHECK-NONE-NEXT: [[V1:%.*]] = shufflevector <24 x i32> [[INTERLEAVED_VEC]], <24 x i32> undef, <12 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23>		; CHECK-NONE-NEXT: [[V1:%.*]] = shufflevector <24 x i32> [[INTERLEAVED_VEC]], <24 x i32> undef, <12 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15, i32 17, i32 19, i32 21, i32 23>
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
[... 61 lines not shown ...]
; CHECK-NEON-NEXT: [[TMP12:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN1]], 0		; CHECK-NEON-NEXT: [[TMP12:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN1]], 0
; CHECK-NEON-NEXT: [[TMP13:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP13:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP14:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP14:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP15:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP15:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP11]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP16:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP16:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @load_factor4_wide(		; CHECK-MVE-LABEL: @load_factor4_wide(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <32 x i32>, <32 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: [[TMP1:%.*]] = bitcast <32 x i32>* [[PTR:%.*]] to i32*
; CHECK-MVE-NEXT: [[V0:%.*]] = shufflevector <32 x i32> [[INTERLEAVED_VEC]], <32 x i32> undef, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28>		; CHECK-MVE-NEXT: [[VLDN:%.*]] = call { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } @llvm.arm.mve.vld4q.v4i32.p0i32(i32* [[TMP1]])
; CHECK-MVE-NEXT: [[V1:%.*]] = shufflevector <32 x i32> [[INTERLEAVED_VEC]], <32 x i32> undef, <8 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29>		; CHECK-MVE-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 3
; CHECK-MVE-NEXT: [[V2:%.*]] = shufflevector <32 x i32> [[INTERLEAVED_VEC]], <32 x i32> undef, <8 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>		; CHECK-MVE-NEXT: [[TMP3:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 2
; CHECK-MVE-NEXT: [[V3:%.*]] = shufflevector <32 x i32> [[INTERLEAVED_VEC]], <32 x i32> undef, <8 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>		; CHECK-MVE-NEXT: [[TMP4:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 1
		; CHECK-MVE-NEXT: [[TMP5:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN]], 0
		; CHECK-MVE-NEXT: [[TMP6:%.*]] = getelementptr i32, i32* [[TMP1]], i32 16
		; CHECK-MVE-NEXT: [[VLDN1:%.*]] = call { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } @llvm.arm.mve.vld4q.v4i32.p0i32(i32* [[TMP6]])
		; CHECK-MVE-NEXT: [[TMP7:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN1]], 3
		; CHECK-MVE-NEXT: [[TMP8:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN1]], 2
		; CHECK-MVE-NEXT: [[TMP9:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN1]], 1
		; CHECK-MVE-NEXT: [[TMP10:%.*]] = extractvalue { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } [[VLDN1]], 0
		; CHECK-MVE-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP7]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; CHECK-MVE-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; CHECK-MVE-NEXT: [[TMP13:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; CHECK-MVE-NEXT: [[TMP14:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @load_factor4_wide(		; CHECK-NONE-LABEL: @load_factor4_wide(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <32 x i32>, <32 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <32 x i32>, <32 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: [[V0:%.*]] = shufflevector <32 x i32> [[INTERLEAVED_VEC]], <32 x i32> undef, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28>		; CHECK-NONE-NEXT: [[V0:%.*]] = shufflevector <32 x i32> [[INTERLEAVED_VEC]], <32 x i32> undef, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28>
; CHECK-NONE-NEXT: [[V1:%.*]] = shufflevector <32 x i32> [[INTERLEAVED_VEC]], <32 x i32> undef, <8 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29>		; CHECK-NONE-NEXT: [[V1:%.*]] = shufflevector <32 x i32> [[INTERLEAVED_VEC]], <32 x i32> undef, <8 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29>
; CHECK-NONE-NEXT: [[V2:%.*]] = shufflevector <32 x i32> [[INTERLEAVED_VEC]], <32 x i32> undef, <8 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>		; CHECK-NONE-NEXT: [[V2:%.*]] = shufflevector <32 x i32> [[INTERLEAVED_VEC]], <32 x i32> undef, <8 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>
; CHECK-NONE-NEXT: [[V3:%.*]] = shufflevector <32 x i32> [[INTERLEAVED_VEC]], <32 x i32> undef, <8 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>		; CHECK-NONE-NEXT: [[V3:%.*]] = shufflevector <32 x i32> [[INTERLEAVED_VEC]], <32 x i32> undef, <8 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
%interleaved.vec = load <32 x i32>, <32 x i32>* %ptr, align 4		%interleaved.vec = load <32 x i32>, <32 x i32>* %ptr, align 4
%v0 = shufflevector <32 x i32> %interleaved.vec, <32 x i32> undef, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28>		%v0 = shufflevector <32 x i32> %interleaved.vec, <32 x i32> undef, <8 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 24, i32 28>
%v1 = shufflevector <32 x i32> %interleaved.vec, <32 x i32> undef, <8 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29>		%v1 = shufflevector <32 x i32> %interleaved.vec, <32 x i32> undef, <8 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29>
%v2 = shufflevector <32 x i32> %interleaved.vec, <32 x i32> undef, <8 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>		%v2 = shufflevector <32 x i32> %interleaved.vec, <32 x i32> undef, <8 x i32> <i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 26, i32 30>
%v3 = shufflevector <32 x i32> %interleaved.vec, <32 x i32> undef, <8 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>		%v3 = shufflevector <32 x i32> %interleaved.vec, <32 x i32> undef, <8 x i32> <i32 3, i32 7, i32 11, i32 15, i32 19, i32 23, i32 27, i32 31>
ret void		ret void
}		}

define void @store_factor2_wide(<16 x i32>* %ptr, <8 x i32> %v0, <8 x i32> %v1) {		define void @store_factor2_wide(<16 x i32>* %ptr, <8 x i32> %v0, <8 x i32> %v1) {
; CHECK-NEON-LABEL: @store_factor2_wide(		; CHECK-NEON-LABEL: @store_factor2_wide(
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <16 x i32>* [[PTR:%.*]] to i32*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <16 x i32>* [[PTR:%.*]] to i32*
; CHECK-NEON-NEXT: [[TMP2:%.*]] = bitcast i32* [[TMP1]] to i8*		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[V0]], <8 x i32> [[V1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[V0]], <8 x i32> [[V1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP1]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst2.p0i8.v4i32(i8* [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst2.p0i8.v4i32(i8* [[TMP4]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], i32 4)
; CHECK-NEON-NEXT: [[TMP5:%.*]] = getelementptr i32, i32* [[TMP1]], i32 8		; CHECK-NEON-NEXT: [[TMP5:%.*]] = getelementptr i32, i32* [[TMP1]], i32 8
; CHECK-NEON-NEXT: [[TMP6:%.*]] = bitcast i32* [[TMP5]] to i8*		; CHECK-NEON-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[V0]], <8 x i32> [[V1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[V0]], <8 x i32> [[V1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[V0]], <8 x i32> [[V1]], <4 x i32> <i32 12, i32 13, i32 14, i32 15>
; CHECK-NEON-NEXT: [[TMP8:%.*]] = shufflevector <8 x i32> [[V0]], <8 x i32> [[V1]], <4 x i32> <i32 12, i32 13, i32 14, i32 15>		; CHECK-NEON-NEXT: [[TMP8:%.*]] = bitcast i32* [[TMP5]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst2.p0i8.v4i32(i8* [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> [[TMP8]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst2.p0i8.v4i32(i8* [[TMP8]], <4 x i32> [[TMP6]], <4 x i32> [[TMP7]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_factor2_wide(		; CHECK-MVE-LABEL: @store_factor2_wide(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>		; CHECK-MVE-NEXT: [[TMP1:%.*]] = bitcast <16 x i32>* [[PTR:%.*]] to i32*
; CHECK-MVE-NEXT: store <16 x i32> [[INTERLEAVED_VEC]], <16 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; CHECK-MVE-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> [[V0]], <8 x i32> [[V1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst2q.p0i32.v4i32(i32* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], i32 0)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst2q.p0i32.v4i32(i32* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], i32 1)
		; CHECK-MVE-NEXT: [[TMP4:%.*]] = getelementptr i32, i32* [[TMP1]], i32 8
		; CHECK-MVE-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[V0]], <8 x i32> [[V1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; CHECK-MVE-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[V0]], <8 x i32> [[V1]], <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst2q.p0i32.v4i32(i32* [[TMP4]], <4 x i32> [[TMP5]], <4 x i32> [[TMP6]], i32 0)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst2q.p0i32.v4i32(i32* [[TMP4]], <4 x i32> [[TMP5]], <4 x i32> [[TMP6]], i32 1)
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_factor2_wide(		; CHECK-NONE-LABEL: @store_factor2_wide(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
; CHECK-NONE-NEXT: store <16 x i32> [[INTERLEAVED_VEC]], <16 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: store <16 x i32> [[INTERLEAVED_VEC]], <16 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;
%interleaved.vec = shufflevector <8 x i32> %v0, <8 x i32> %v1, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>		%interleaved.vec = shufflevector <8 x i32> %v0, <8 x i32> %v1, <16 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11, i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
store <16 x i32> %interleaved.vec, <16 x i32>* %ptr, align 4		store <16 x i32> %interleaved.vec, <16 x i32>* %ptr, align 4
ret void		ret void
}		}

define void @store_factor3_wide(<24 x i32>* %ptr, <8 x i32> %v0, <8 x i32> %v1, <8 x i32> %v2) {		define void @store_factor3_wide(<24 x i32>* %ptr, <8 x i32> %v0, <8 x i32> %v1, <8 x i32> %v2) {
; CHECK-NEON-LABEL: @store_factor3_wide(		; CHECK-NEON-LABEL: @store_factor3_wide(
; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <8 x i32> [[V2:%.*]], <8 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <8 x i32> [[V2:%.*]], <8 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <24 x i32>* [[PTR:%.*]] to i32*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <24 x i32>* [[PTR:%.*]] to i32*
; CHECK-NEON-NEXT: [[TMP2:%.*]] = bitcast i32* [[TMP1]] to i8*		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>
; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>		; CHECK-NEON-NEXT: [[TMP5:%.*]] = bitcast i32* [[TMP1]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> [[TMP5]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP5]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], i32 4)
; CHECK-NEON-NEXT: [[TMP6:%.*]] = getelementptr i32, i32* [[TMP1]], i32 12		; CHECK-NEON-NEXT: [[TMP6:%.*]] = getelementptr i32, i32* [[TMP1]], i32 12
; CHECK-NEON-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to i8*		; CHECK-NEON-NEXT: [[TMP7:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP8:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP8:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 12, i32 13, i32 14, i32 15>
; CHECK-NEON-NEXT: [[TMP9:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 12, i32 13, i32 14, i32 15>		; CHECK-NEON-NEXT: [[TMP9:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 20, i32 21, i32 22, i32 23>
; CHECK-NEON-NEXT: [[TMP10:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 20, i32 21, i32 22, i32 23>		; CHECK-NEON-NEXT: [[TMP10:%.*]] = bitcast i32* [[TMP6]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP7]], <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> [[TMP10]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* [[TMP10]], <4 x i32> [[TMP7]], <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_factor3_wide(		; CHECK-MVE-LABEL: @store_factor3_wide(
; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <8 x i32> [[V2:%.*]], <8 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <8 x i32> [[V2:%.*]], <8 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <24 x i32> <i32 0, i32 8, i32 16, i32 1, i32 9, i32 17, i32 2, i32 10, i32 18, i32 3, i32 11, i32 19, i32 4, i32 12, i32 20, i32 5, i32 13, i32 21, i32 6, i32 14, i32 22, i32 7, i32 15, i32 23>		; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <24 x i32> <i32 0, i32 8, i32 16, i32 1, i32 9, i32 17, i32 2, i32 10, i32 18, i32 3, i32 11, i32 19, i32 4, i32 12, i32 20, i32 5, i32 13, i32 21, i32 6, i32 14, i32 22, i32 7, i32 15, i32 23>
; CHECK-MVE-NEXT: store <24 x i32> [[INTERLEAVED_VEC]], <24 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: store <24 x i32> [[INTERLEAVED_VEC]], <24 x i32>* [[PTR:%.*]], align 4
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
Show All 12 Lines	;
ret void		ret void
}		}

define void @store_factor4_wide(<32 x i32>* %ptr, <8 x i32> %v0, <8 x i32> %v1, <8 x i32> %v2, <8 x i32> %v3) {		define void @store_factor4_wide(<32 x i32>* %ptr, <8 x i32> %v0, <8 x i32> %v1, <8 x i32> %v2, <8 x i32> %v3) {
; CHECK-NEON-LABEL: @store_factor4_wide(		; CHECK-NEON-LABEL: @store_factor4_wide(
; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEON-NEXT: [[S0:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <8 x i32> [[V2:%.*]], <8 x i32> [[V3:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NEON-NEXT: [[S1:%.*]] = shufflevector <8 x i32> [[V2:%.*]], <8 x i32> [[V3:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <32 x i32>* [[PTR:%.*]] to i32*		; CHECK-NEON-NEXT: [[TMP1:%.*]] = bitcast <32 x i32>* [[PTR:%.*]] to i32*
; CHECK-NEON-NEXT: [[TMP2:%.*]] = bitcast i32* [[TMP1]] to i8*		; CHECK-NEON-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEON-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>
; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>		; CHECK-NEON-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>
; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>		; CHECK-NEON-NEXT: [[TMP5:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 24, i32 25, i32 26, i32 27>
; CHECK-NEON-NEXT: [[TMP6:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 24, i32 25, i32 26, i32 27>		; CHECK-NEON-NEXT: [[TMP6:%.*]] = bitcast i32* [[TMP1]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v4i32(i8* [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> [[TMP5]], <4 x i32> [[TMP6]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v4i32(i8* [[TMP6]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> [[TMP5]], i32 4)
; CHECK-NEON-NEXT: [[TMP7:%.*]] = getelementptr i32, i32* [[TMP1]], i32 16		; CHECK-NEON-NEXT: [[TMP7:%.*]] = getelementptr i32, i32* [[TMP1]], i32 16
; CHECK-NEON-NEXT: [[TMP8:%.*]] = bitcast i32* [[TMP7]] to i8*		; CHECK-NEON-NEXT: [[TMP8:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP9:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP9:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 12, i32 13, i32 14, i32 15>
; CHECK-NEON-NEXT: [[TMP10:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 12, i32 13, i32 14, i32 15>		; CHECK-NEON-NEXT: [[TMP10:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 20, i32 21, i32 22, i32 23>
; CHECK-NEON-NEXT: [[TMP11:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 20, i32 21, i32 22, i32 23>		; CHECK-NEON-NEXT: [[TMP11:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 28, i32 29, i32 30, i32 31>
; CHECK-NEON-NEXT: [[TMP12:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 28, i32 29, i32 30, i32 31>		; CHECK-NEON-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP7]] to i8*
; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v4i32(i8* [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], <4 x i32> [[TMP12]], i32 4)		; CHECK-NEON-NEXT: call void @llvm.arm.neon.vst4.p0i8.v4i32(i8* [[TMP12]], <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> [[TMP10]], <4 x i32> [[TMP11]], i32 4)
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @store_factor4_wide(		; CHECK-MVE-LABEL: @store_factor4_wide(
; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-MVE-NEXT: [[S0:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <8 x i32> [[V2:%.*]], <8 x i32> [[V3:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-MVE-NEXT: [[S1:%.*]] = shufflevector <8 x i32> [[V2:%.*]], <8 x i32> [[V3:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <32 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27, i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>		; CHECK-MVE-NEXT: [[TMP1:%.*]] = bitcast <32 x i32>* [[PTR:%.*]] to i32*
; CHECK-MVE-NEXT: store <32 x i32> [[INTERLEAVED_VEC]], <32 x i32>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; CHECK-MVE-NEXT: [[TMP3:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 8, i32 9, i32 10, i32 11>
		; CHECK-MVE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 16, i32 17, i32 18, i32 19>
		; CHECK-MVE-NEXT: [[TMP5:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 24, i32 25, i32 26, i32 27>
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> [[TMP5]], i32 0)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> [[TMP5]], i32 1)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> [[TMP5]], i32 2)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> [[TMP4]], <4 x i32> [[TMP5]], i32 3)
		; CHECK-MVE-NEXT: [[TMP6:%.*]] = getelementptr i32, i32* [[TMP1]], i32 16
		; CHECK-MVE-NEXT: [[TMP7:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 4, i32 5, i32 6, i32 7>
		; CHECK-MVE-NEXT: [[TMP8:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 12, i32 13, i32 14, i32 15>
		; CHECK-MVE-NEXT: [[TMP9:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 20, i32 21, i32 22, i32 23>
		; CHECK-MVE-NEXT: [[TMP10:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <4 x i32> <i32 28, i32 29, i32 30, i32 31>
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> [[TMP10]], i32 0)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> [[TMP10]], i32 1)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> [[TMP10]], i32 2)
		; CHECK-MVE-NEXT: call void @llvm.arm.mve.vst4q.p0i32.v4i32(i32* [[TMP6]], <4 x i32> [[TMP7]], <4 x i32> [[TMP8]], <4 x i32> [[TMP9]], <4 x i32> [[TMP10]], i32 3)
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @store_factor4_wide(		; CHECK-NONE-LABEL: @store_factor4_wide(
; CHECK-NONE-NEXT: [[S0:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NONE-NEXT: [[S0:%.*]] = shufflevector <8 x i32> [[V0:%.*]], <8 x i32> [[V1:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NONE-NEXT: [[S1:%.*]] = shufflevector <8 x i32> [[V2:%.*]], <8 x i32> [[V3:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>		; CHECK-NONE-NEXT: [[S1:%.*]] = shufflevector <8 x i32> [[V2:%.*]], <8 x i32> [[V3:%.*]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <32 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27, i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <16 x i32> [[S0]], <16 x i32> [[S1]], <32 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27, i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>
; CHECK-NONE-NEXT: store <32 x i32> [[INTERLEAVED_VEC]], <32 x i32>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: store <32 x i32> [[INTERLEAVED_VEC]], <32 x i32>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines
; CHECK-NEON-NEXT: [[TMP10:%.*]] = inttoptr <4 x i32> [[TMP9]] to <4 x i32*>		; CHECK-NEON-NEXT: [[TMP10:%.*]] = inttoptr <4 x i32> [[TMP9]] to <4 x i32*>
; CHECK-NEON-NEXT: [[TMP11:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN1]], 0		; CHECK-NEON-NEXT: [[TMP11:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN1]], 0
; CHECK-NEON-NEXT: [[TMP12:%.*]] = inttoptr <4 x i32> [[TMP11]] to <4 x i32*>		; CHECK-NEON-NEXT: [[TMP12:%.*]] = inttoptr <4 x i32> [[TMP11]] to <4 x i32*>
; CHECK-NEON-NEXT: [[TMP13:%.*]] = shufflevector <4 x i32*> [[TMP4]], <4 x i32*> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP13:%.*]] = shufflevector <4 x i32*> [[TMP4]], <4 x i32*> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: [[TMP14:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		; CHECK-NEON-NEXT: [[TMP14:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-NEON-NEXT: ret void		; CHECK-NEON-NEXT: ret void
;		;
; CHECK-MVE-LABEL: @load_factor2_wide_pointer(		; CHECK-MVE-LABEL: @load_factor2_wide_pointer(
; CHECK-MVE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x i32*>, <16 x i32*>* [[PTR:%.*]], align 4		; CHECK-MVE-NEXT: [[TMP1:%.*]] = bitcast <16 x i32*>* [[PTR:%.*]] to i32*
; CHECK-MVE-NEXT: [[V0:%.*]] = shufflevector <16 x i32*> [[INTERLEAVED_VEC]], <16 x i32*> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>		; CHECK-MVE-NEXT: [[VLDN:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vld2q.v4i32.p0i32(i32* [[TMP1]])
; CHECK-MVE-NEXT: [[V1:%.*]] = shufflevector <16 x i32*> [[INTERLEAVED_VEC]], <16 x i32*> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		; CHECK-MVE-NEXT: [[TMP2:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN]], 1
		; CHECK-MVE-NEXT: [[TMP3:%.*]] = inttoptr <4 x i32> [[TMP2]] to <4 x i32*>
		; CHECK-MVE-NEXT: [[TMP4:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN]], 0
		; CHECK-MVE-NEXT: [[TMP5:%.*]] = inttoptr <4 x i32> [[TMP4]] to <4 x i32*>
		; CHECK-MVE-NEXT: [[TMP6:%.*]] = getelementptr i32, i32* [[TMP1]], i32 8
		; CHECK-MVE-NEXT: [[VLDN1:%.*]] = call { <4 x i32>, <4 x i32> } @llvm.arm.mve.vld2q.v4i32.p0i32(i32* [[TMP6]])
		; CHECK-MVE-NEXT: [[TMP7:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN1]], 1
		; CHECK-MVE-NEXT: [[TMP8:%.*]] = inttoptr <4 x i32> [[TMP7]] to <4 x i32*>
		; CHECK-MVE-NEXT: [[TMP9:%.*]] = extractvalue { <4 x i32>, <4 x i32> } [[VLDN1]], 0
		; CHECK-MVE-NEXT: [[TMP10:%.*]] = inttoptr <4 x i32> [[TMP9]] to <4 x i32*>
		; CHECK-MVE-NEXT: [[TMP11:%.*]] = shufflevector <4 x i32*> [[TMP3]], <4 x i32*> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
		; CHECK-MVE-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32*> [[TMP5]], <4 x i32*> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; CHECK-MVE-NEXT: ret void		; CHECK-MVE-NEXT: ret void
;		;
; CHECK-NONE-LABEL: @load_factor2_wide_pointer(		; CHECK-NONE-LABEL: @load_factor2_wide_pointer(
; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x i32*>, <16 x i32*>* [[PTR:%.*]], align 4		; CHECK-NONE-NEXT: [[INTERLEAVED_VEC:%.*]] = load <16 x i32*>, <16 x i32*>* [[PTR:%.*]], align 4
; CHECK-NONE-NEXT: [[V0:%.*]] = shufflevector <16 x i32*> [[INTERLEAVED_VEC]], <16 x i32*> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>		; CHECK-NONE-NEXT: [[V0:%.*]] = shufflevector <16 x i32*> [[INTERLEAVED_VEC]], <16 x i32*> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
; CHECK-NONE-NEXT: [[V1:%.*]] = shufflevector <16 x i32*> [[INTERLEAVED_VEC]], <16 x i32*> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		; CHECK-NONE-NEXT: [[V1:%.*]] = shufflevector <16 x i32*> [[INTERLEAVED_VEC]], <16 x i32*> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
; CHECK-NONE-NEXT: ret void		; CHECK-NONE-NEXT: ret void
;		;

llvm/test/Transforms/LoopVectorize/ARM/mve-interleaved-cost.ll

Show All 14 Lines	entry:
br label %for.body		br label %for.body

; VF_2-LABEL: Checking a loop in "i8_factor_2"		; VF_2-LABEL: Checking a loop in "i8_factor_2"
; VF_2: Found an estimated cost of 20 for VF 2 For instruction: %tmp2 = load i8, i8* %tmp0, align 1		; VF_2: Found an estimated cost of 20 for VF 2 For instruction: %tmp2 = load i8, i8* %tmp0, align 1
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load i8, i8* %tmp1, align 1		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load i8, i8* %tmp1, align 1
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store i8 0, i8* %tmp0, align 1		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store i8 0, i8* %tmp0, align 1
; VF_2-NEXT: Found an estimated cost of 12 for VF 2 For instruction: store i8 0, i8* %tmp1, align 1		; VF_2-NEXT: Found an estimated cost of 12 for VF 2 For instruction: store i8 0, i8* %tmp1, align 1
; VF_4-LABEL: Checking a loop in "i8_factor_2"		; VF_4-LABEL: Checking a loop in "i8_factor_2"
; VF_4: Found an estimated cost of 72 for VF 4 For instruction: %tmp2 = load i8, i8* %tmp0, align 1		; VF_4: Found an estimated cost of 4 for VF 4 For instruction: %tmp2 = load i8, i8* %tmp0, align 1
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load i8, i8* %tmp1, align 1		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load i8, i8* %tmp1, align 1
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i8 0, i8* %tmp0, align 1		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i8 0, i8* %tmp0, align 1
; VF_4-NEXT: Found an estimated cost of 40 for VF 4 For instruction: store i8 0, i8* %tmp1, align 1		; VF_4-NEXT: Found an estimated cost of 4 for VF 4 For instruction: store i8 0, i8* %tmp1, align 1
; VF_8-LABEL: Checking a loop in "i8_factor_2"		; VF_8-LABEL: Checking a loop in "i8_factor_2"
; VF_8: Found an estimated cost of 2 for VF 8 For instruction: %tmp2 = load i8, i8* %tmp0, align 1		; VF_8: Found an estimated cost of 4 for VF 8 For instruction: %tmp2 = load i8, i8* %tmp0, align 1
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i8, i8* %tmp1, align 1		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i8, i8* %tmp1, align 1
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i8 0, i8* %tmp0, align 1		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i8 0, i8* %tmp0, align 1
; VF_8-NEXT: Found an estimated cost of 2 for VF 8 For instruction: store i8 0, i8* %tmp1, align 1		; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store i8 0, i8* %tmp1, align 1
; VF_16-LABEL: Checking a loop in "i8_factor_2"		; VF_16-LABEL: Checking a loop in "i8_factor_2"
; VF_16: Found an estimated cost of 2 for VF 16 For instruction: %tmp2 = load i8, i8* %tmp0, align 1		; VF_16: Found an estimated cost of 4 for VF 16 For instruction: %tmp2 = load i8, i8* %tmp0, align 1
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i8, i8* %tmp1, align 1		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i8, i8* %tmp1, align 1
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp0, align 1		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp0, align 1
; VF_16-NEXT: Found an estimated cost of 2 for VF 16 For instruction: store i8 0, i8* %tmp1, align 1		; VF_16-NEXT: Found an estimated cost of 4 for VF 16 For instruction: store i8 0, i8* %tmp1, align 1
for.body:		for.body:
%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]		%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
%tmp0 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 0		%tmp0 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 0
%tmp1 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 1		%tmp1 = getelementptr inbounds %i8.2, %i8.2* %data, i64 %i, i32 1
%tmp2 = load i8, i8* %tmp0, align 1		%tmp2 = load i8, i8* %tmp0, align 1
%tmp3 = load i8, i8* %tmp1, align 1		%tmp3 = load i8, i8* %tmp1, align 1
store i8 0, i8* %tmp0, align 1		store i8 0, i8* %tmp0, align 1
store i8 0, i8* %tmp1, align 1		store i8 0, i8* %tmp1, align 1
Show All 11 Lines	entry:
br label %for.body		br label %for.body

; VF_2-LABEL: Checking a loop in "i16_factor_2"		; VF_2-LABEL: Checking a loop in "i16_factor_2"
; VF_2: Found an estimated cost of 20 for VF 2 For instruction: %tmp2 = load i16, i16* %tmp0, align 2		; VF_2: Found an estimated cost of 20 for VF 2 For instruction: %tmp2 = load i16, i16* %tmp0, align 2
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load i16, i16* %tmp1, align 2		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load i16, i16* %tmp1, align 2
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store i16 0, i16* %tmp0, align 2		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store i16 0, i16* %tmp0, align 2
; VF_2-NEXT: Found an estimated cost of 12 for VF 2 For instruction: store i16 0, i16* %tmp1, align 2		; VF_2-NEXT: Found an estimated cost of 12 for VF 2 For instruction: store i16 0, i16* %tmp1, align 2
; VF_4-LABEL: Checking a loop in "i16_factor_2"		; VF_4-LABEL: Checking a loop in "i16_factor_2"
; VF_4: Found an estimated cost of 2 for VF 4 For instruction: %tmp2 = load i16, i16* %tmp0, align 2		; VF_4: Found an estimated cost of 4 for VF 4 For instruction: %tmp2 = load i16, i16* %tmp0, align 2
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load i16, i16* %tmp1, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load i16, i16* %tmp1, align 2
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i16 0, i16* %tmp0, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i16 0, i16* %tmp0, align 2
; VF_4-NEXT: Found an estimated cost of 2 for VF 4 For instruction: store i16 0, i16* %tmp1, align 2		; VF_4-NEXT: Found an estimated cost of 4 for VF 4 For instruction: store i16 0, i16* %tmp1, align 2
; VF_8-LABEL: Checking a loop in "i16_factor_2"		; VF_8-LABEL: Checking a loop in "i16_factor_2"
; VF_8: Found an estimated cost of 2 for VF 8 For instruction: %tmp2 = load i16, i16* %tmp0, align 2		; VF_8: Found an estimated cost of 4 for VF 8 For instruction: %tmp2 = load i16, i16* %tmp0, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i16, i16* %tmp1, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i16, i16* %tmp1, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i16 0, i16* %tmp0, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i16 0, i16* %tmp0, align 2
; VF_8-NEXT: Found an estimated cost of 2 for VF 8 For instruction: store i16 0, i16* %tmp1, align 2		; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store i16 0, i16* %tmp1, align 2
; VF_16-LABEL: Checking a loop in "i16_factor_2"		; VF_16-LABEL: Checking a loop in "i16_factor_2"
; VF_16: Found an estimated cost of 4 for VF 16 For instruction: %tmp2 = load i16, i16* %tmp0, align 2		; VF_16: Found an estimated cost of 8 for VF 16 For instruction: %tmp2 = load i16, i16* %tmp0, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i16, i16* %tmp1, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i16, i16* %tmp1, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp0, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp0, align 2
; VF_16-NEXT: Found an estimated cost of 4 for VF 16 For instruction: store i16 0, i16* %tmp1, align 2		; VF_16-NEXT: Found an estimated cost of 8 for VF 16 For instruction: store i16 0, i16* %tmp1, align 2
for.body:		for.body:
%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]		%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
%tmp0 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 0		%tmp0 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 0
%tmp1 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 1		%tmp1 = getelementptr inbounds %i16.2, %i16.2* %data, i64 %i, i32 1
%tmp2 = load i16, i16* %tmp0, align 2		%tmp2 = load i16, i16* %tmp0, align 2
%tmp3 = load i16, i16* %tmp1, align 2		%tmp3 = load i16, i16* %tmp1, align 2
store i16 0, i16* %tmp0, align 2		store i16 0, i16* %tmp0, align 2
store i16 0, i16* %tmp1, align 2		store i16 0, i16* %tmp1, align 2
%i.next = add nuw nsw i64 %i, 1		%i.next = add nuw nsw i64 %i, 1
%cond = icmp slt i64 %i.next, %n		%cond = icmp slt i64 %i.next, %n
br i1 %cond, label %for.body, label %for.end		br i1 %cond, label %for.body, label %for.end

for.end:		for.end:
ret void		ret void
}		}

%i32.2 = type {i32, i32}		%i32.2 = type {i32, i32}
define void @i32_factor_2(%i32.2* %data, i64 %n) #0 {		define void @i32_factor_2(%i32.2* %data, i64 %n) #0 {
entry:		entry:
br label %for.body		br label %for.body

; VF_2-LABEL: Checking a loop in "i32_factor_2"		; VF_2-LABEL: Checking a loop in "i32_factor_2"
; VF_2: Found an estimated cost of 2 for VF 2 For instruction: %tmp2 = load i32, i32* %tmp0, align 4		; VF_2: Found an estimated cost of 20 for VF 2 For instruction: %tmp2 = load i32, i32* %tmp0, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load i32, i32* %tmp1, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load i32, i32* %tmp1, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store i32 0, i32* %tmp0, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store i32 0, i32* %tmp0, align 4
; VF_2-NEXT: Found an estimated cost of 2 for VF 2 For instruction: store i32 0, i32* %tmp1, align 4		; VF_2-NEXT: Found an estimated cost of 12 for VF 2 For instruction: store i32 0, i32* %tmp1, align 4
; VF_4-LABEL: Checking a loop in "i32_factor_2"		; VF_4-LABEL: Checking a loop in "i32_factor_2"
; VF_4: Found an estimated cost of 2 for VF 4 For instruction: %tmp2 = load i32, i32* %tmp0, align 4		; VF_4: Found an estimated cost of 4 for VF 4 For instruction: %tmp2 = load i32, i32* %tmp0, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load i32, i32* %tmp1, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load i32, i32* %tmp1, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i32 0, i32* %tmp0, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i32 0, i32* %tmp0, align 4
; VF_4-NEXT: Found an estimated cost of 2 for VF 4 For instruction: store i32 0, i32* %tmp1, align 4		; VF_4-NEXT: Found an estimated cost of 4 for VF 4 For instruction: store i32 0, i32* %tmp1, align 4
; VF_8-LABEL: Checking a loop in "i32_factor_2"		; VF_8-LABEL: Checking a loop in "i32_factor_2"
; VF_8: Found an estimated cost of 4 for VF 8 For instruction: %tmp2 = load i32, i32* %tmp0, align 4		; VF_8: Found an estimated cost of 8 for VF 8 For instruction: %tmp2 = load i32, i32* %tmp0, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i32, i32* %tmp1, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load i32, i32* %tmp1, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i32 0, i32* %tmp0, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i32 0, i32* %tmp0, align 4
; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store i32 0, i32* %tmp1, align 4		; VF_8-NEXT: Found an estimated cost of 8 for VF 8 For instruction: store i32 0, i32* %tmp1, align 4
; VF_16-LABEL: Checking a loop in "i32_factor_2"		; VF_16-LABEL: Checking a loop in "i32_factor_2"
; VF_16: Found an estimated cost of 8 for VF 16 For instruction: %tmp2 = load i32, i32* %tmp0, align 4		; VF_16: Found an estimated cost of 16 for VF 16 For instruction: %tmp2 = load i32, i32* %tmp0, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i32, i32* %tmp1, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load i32, i32* %tmp1, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp0, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp0, align 4
; VF_16-NEXT: Found an estimated cost of 8 for VF 16 For instruction: store i32 0, i32* %tmp1, align 4		; VF_16-NEXT: Found an estimated cost of 16 for VF 16 For instruction: store i32 0, i32* %tmp1, align 4
for.body:		for.body:
%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]		%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
%tmp0 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 0		%tmp0 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 0
%tmp1 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 1		%tmp1 = getelementptr inbounds %i32.2, %i32.2* %data, i64 %i, i32 1
%tmp2 = load i32, i32* %tmp0, align 4		%tmp2 = load i32, i32* %tmp0, align 4
%tmp3 = load i32, i32* %tmp1, align 4		%tmp3 = load i32, i32* %tmp1, align 4
store i32 0, i32* %tmp0, align 4		store i32 0, i32* %tmp0, align 4
store i32 0, i32* %tmp1, align 4		store i32 0, i32* %tmp1, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store half 0xH0000, half* %tmp0, align 2		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store half 0xH0000, half* %tmp0, align 2
; VF_2-NEXT: Found an estimated cost of 12 for VF 2 For instruction: store half 0xH0000, half* %tmp1, align 2		; VF_2-NEXT: Found an estimated cost of 12 for VF 2 For instruction: store half 0xH0000, half* %tmp1, align 2
; VF_4-LABEL: Checking a loop in "f16_factor_2"		; VF_4-LABEL: Checking a loop in "f16_factor_2"
; VF_4: Found an estimated cost of 72 for VF 4 For instruction: %tmp2 = load half, half* %tmp0, align 2		; VF_4: Found an estimated cost of 72 for VF 4 For instruction: %tmp2 = load half, half* %tmp0, align 2
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load half, half* %tmp1, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load half, half* %tmp1, align 2
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store half 0xH0000, half* %tmp0, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store half 0xH0000, half* %tmp0, align 2
; VF_4-NEXT: Found an estimated cost of 40 for VF 4 For instruction: store half 0xH0000, half* %tmp1, align 2		; VF_4-NEXT: Found an estimated cost of 40 for VF 4 For instruction: store half 0xH0000, half* %tmp1, align 2
; VF_8-LABEL: Checking a loop in "f16_factor_2"		; VF_8-LABEL: Checking a loop in "f16_factor_2"
; VF_8: Found an estimated cost of 272 for VF 8 For instruction: %tmp2 = load half, half* %tmp0, align 2		; VF_8: Found an estimated cost of 4 for VF 8 For instruction: %tmp2 = load half, half* %tmp0, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load half, half* %tmp1, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load half, half* %tmp1, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp0, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp0, align 2
; VF_8-NEXT: Found an estimated cost of 144 for VF 8 For instruction: store half 0xH0000, half* %tmp1, align 2		; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store half 0xH0000, half* %tmp1, align 2
; VF_16-LABEL: Checking a loop in "f16_factor_2"		; VF_16-LABEL: Checking a loop in "f16_factor_2"
; VF_16: Found an estimated cost of 1056 for VF 16 For instruction: %tmp2 = load half, half* %tmp0, align 2		; VF_16: Found an estimated cost of 8 for VF 16 For instruction: %tmp2 = load half, half* %tmp0, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load half, half* %tmp1, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load half, half* %tmp1, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp0, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp0, align 2
; VF_16-NEXT: Found an estimated cost of 544 for VF 16 For instruction: store half 0xH0000, half* %tmp1, align 2		; VF_16-NEXT: Found an estimated cost of 8 for VF 16 For instruction: store half 0xH0000, half* %tmp1, align 2
for.body:		for.body:
%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]		%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
%tmp0 = getelementptr inbounds %f16.2, %f16.2* %data, i64 %i, i32 0		%tmp0 = getelementptr inbounds %f16.2, %f16.2* %data, i64 %i, i32 0
%tmp1 = getelementptr inbounds %f16.2, %f16.2* %data, i64 %i, i32 1		%tmp1 = getelementptr inbounds %f16.2, %f16.2* %data, i64 %i, i32 1
%tmp2 = load half, half* %tmp0, align 2		%tmp2 = load half, half* %tmp0, align 2
%tmp3 = load half, half* %tmp1, align 2		%tmp3 = load half, half* %tmp1, align 2
store half 0.0, half* %tmp0, align 2		store half 0.0, half* %tmp0, align 2
store half 0.0, half* %tmp1, align 2		store half 0.0, half* %tmp1, align 2
%i.next = add nuw nsw i64 %i, 1		%i.next = add nuw nsw i64 %i, 1
%cond = icmp slt i64 %i.next, %n		%cond = icmp slt i64 %i.next, %n
br i1 %cond, label %for.body, label %for.end		br i1 %cond, label %for.body, label %for.end

for.end:		for.end:
ret void		ret void
}		}

%f32.2 = type {float, float}		%f32.2 = type {float, float}
define void @f32_factor_2(%f32.2* %data, i64 %n) #0 {		define void @f32_factor_2(%f32.2* %data, i64 %n) #0 {
entry:		entry:
br label %for.body		br label %for.body

; VF_2-LABEL: Checking a loop in "f32_factor_2"		; VF_2-LABEL: Checking a loop in "f32_factor_2"
; VF_2: Found an estimated cost of 2 for VF 2 For instruction: %tmp2 = load float, float* %tmp0, align 4		; VF_2: Found an estimated cost of 20 for VF 2 For instruction: %tmp2 = load float, float* %tmp0, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load float, float* %tmp1, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load float, float* %tmp1, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store float 0.000000e+00, float* %tmp0, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store float 0.000000e+00, float* %tmp0, align 4
; VF_2-NEXT: Found an estimated cost of 2 for VF 2 For instruction: store float 0.000000e+00, float* %tmp1, align 4		; VF_2-NEXT: Found an estimated cost of 12 for VF 2 For instruction: store float 0.000000e+00, float* %tmp1, align 4
; VF_4-LABEL: Checking a loop in "f32_factor_2"		; VF_4-LABEL: Checking a loop in "f32_factor_2"
; VF_4: Found an estimated cost of 2 for VF 4 For instruction: %tmp2 = load float, float* %tmp0, align 4		; VF_4: Found an estimated cost of 4 for VF 4 For instruction: %tmp2 = load float, float* %tmp0, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load float, float* %tmp1, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load float, float* %tmp1, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store float 0.000000e+00, float* %tmp0, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store float 0.000000e+00, float* %tmp0, align 4
; VF_4-NEXT: Found an estimated cost of 2 for VF 4 For instruction: store float 0.000000e+00, float* %tmp1, align 4		; VF_4-NEXT: Found an estimated cost of 4 for VF 4 For instruction: store float 0.000000e+00, float* %tmp1, align 4
; VF_8-LABEL: Checking a loop in "f32_factor_2"		; VF_8-LABEL: Checking a loop in "f32_factor_2"
; VF_8: Found an estimated cost of 4 for VF 8 For instruction: %tmp2 = load float, float* %tmp0, align 4		; VF_8: Found an estimated cost of 8 for VF 8 For instruction: %tmp2 = load float, float* %tmp0, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load float, float* %tmp1, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load float, float* %tmp1, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store float 0.000000e+00, float* %tmp0, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store float 0.000000e+00, float* %tmp0, align 4
; VF_8-NEXT: Found an estimated cost of 4 for VF 8 For instruction: store float 0.000000e+00, float* %tmp1, align 4		; VF_8-NEXT: Found an estimated cost of 8 for VF 8 For instruction: store float 0.000000e+00, float* %tmp1, align 4
; VF_16-LABEL: Checking a loop in "f32_factor_2"		; VF_16-LABEL: Checking a loop in "f32_factor_2"
; VF_16: Found an estimated cost of 8 for VF 16 For instruction: %tmp2 = load float, float* %tmp0, align 4		; VF_16: Found an estimated cost of 16 for VF 16 For instruction: %tmp2 = load float, float* %tmp0, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load float, float* %tmp1, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp3 = load float, float* %tmp1, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp0, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp0, align 4
; VF_16-NEXT: Found an estimated cost of 8 for VF 16 For instruction: store float 0.000000e+00, float* %tmp1, align 4		; VF_16-NEXT: Found an estimated cost of 16 for VF 16 For instruction: store float 0.000000e+00, float* %tmp1, align 4
for.body:		for.body:
%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]		%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
%tmp0 = getelementptr inbounds %f32.2, %f32.2* %data, i64 %i, i32 0		%tmp0 = getelementptr inbounds %f32.2, %f32.2* %data, i64 %i, i32 0
%tmp1 = getelementptr inbounds %f32.2, %f32.2* %data, i64 %i, i32 1		%tmp1 = getelementptr inbounds %f32.2, %f32.2* %data, i64 %i, i32 1
%tmp2 = load float, float* %tmp0, align 4		%tmp2 = load float, float* %tmp0, align 4
%tmp3 = load float, float* %tmp1, align 4		%tmp3 = load float, float* %tmp1, align 4
store float 0.0, float* %tmp0, align 4		store float 0.0, float* %tmp0, align 4
store float 0.0, float* %tmp1, align 4		store float 0.0, float* %tmp1, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp5 = load i8, i8* %tmp1, align 1		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp5 = load i8, i8* %tmp1, align 1
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp6 = load i8, i8* %tmp2, align 1		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp6 = load i8, i8* %tmp2, align 1
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp7 = load i8, i8* %tmp3, align 1		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp7 = load i8, i8* %tmp3, align 1
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i8 0, i8* %tmp0, align 1		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i8 0, i8* %tmp0, align 1
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i8 0, i8* %tmp1, align 1		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i8 0, i8* %tmp1, align 1
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i8 0, i8* %tmp2, align 1		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i8 0, i8* %tmp2, align 1
; VF_8-NEXT: Found an estimated cost of 288 for VF 8 For instruction: store i8 0, i8* %tmp3, align 1		; VF_8-NEXT: Found an estimated cost of 288 for VF 8 For instruction: store i8 0, i8* %tmp3, align 1
; VF_16-LABEL: Checking a loop in "i8_factor_4"		; VF_16-LABEL: Checking a loop in "i8_factor_4"
; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load i8, i8* %tmp0, align 1		; VF_16: Found an estimated cost of 8 for VF 16 For instruction: %tmp4 = load i8, i8* %tmp0, align 1
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i8, i8* %tmp1, align 1		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i8, i8* %tmp1, align 1
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load i8, i8* %tmp2, align 1		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load i8, i8* %tmp2, align 1
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load i8, i8* %tmp3, align 1		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load i8, i8* %tmp3, align 1
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp0, align 1		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp0, align 1
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp1, align 1		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp1, align 1
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp2, align 1		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i8 0, i8* %tmp2, align 1
; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store i8 0, i8* %tmp3, align 1		; VF_16-NEXT: Found an estimated cost of 8 for VF 16 For instruction: store i8 0, i8* %tmp3, align 1
for.body:		for.body:
%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]		%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
%tmp0 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 0		%tmp0 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 0
%tmp1 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 1		%tmp1 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 1
%tmp2 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 2		%tmp2 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 2
%tmp3 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 3		%tmp3 = getelementptr inbounds %i8.4, %i8.4* %data, i64 %i, i32 3
%tmp4 = load i8, i8* %tmp0, align 1		%tmp4 = load i8, i8* %tmp0, align 1
%tmp5 = load i8, i8* %tmp1, align 1		%tmp5 = load i8, i8* %tmp1, align 1
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp5 = load i16, i16* %tmp1, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp5 = load i16, i16* %tmp1, align 2
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp6 = load i16, i16* %tmp2, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp6 = load i16, i16* %tmp2, align 2
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp7 = load i16, i16* %tmp3, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp7 = load i16, i16* %tmp3, align 2
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i16 0, i16* %tmp0, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i16 0, i16* %tmp0, align 2
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i16 0, i16* %tmp1, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i16 0, i16* %tmp1, align 2
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i16 0, i16* %tmp2, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i16 0, i16* %tmp2, align 2
; VF_4-NEXT: Found an estimated cost of 80 for VF 4 For instruction: store i16 0, i16* %tmp3, align 2		; VF_4-NEXT: Found an estimated cost of 80 for VF 4 For instruction: store i16 0, i16* %tmp3, align 2
; VF_8-LABEL: Checking a loop in "i16_factor_4"		; VF_8-LABEL: Checking a loop in "i16_factor_4"
; VF_8: Found an estimated cost of 544 for VF 8 For instruction: %tmp4 = load i16, i16* %tmp0, align 2		; VF_8: Found an estimated cost of 8 for VF 8 For instruction: %tmp4 = load i16, i16* %tmp0, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp5 = load i16, i16* %tmp1, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp5 = load i16, i16* %tmp1, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp6 = load i16, i16* %tmp2, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp6 = load i16, i16* %tmp2, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp7 = load i16, i16* %tmp3, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp7 = load i16, i16* %tmp3, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i16 0, i16* %tmp0, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i16 0, i16* %tmp0, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i16 0, i16* %tmp1, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i16 0, i16* %tmp1, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i16 0, i16* %tmp2, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i16 0, i16* %tmp2, align 2
; VF_8-NEXT: Found an estimated cost of 288 for VF 8 For instruction: store i16 0, i16* %tmp3, align 2		; VF_8-NEXT: Found an estimated cost of 8 for VF 8 For instruction: store i16 0, i16* %tmp3, align 2
; VF_16-LABEL: Checking a loop in "i16_factor_4"		; VF_16-LABEL: Checking a loop in "i16_factor_4"
; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load i16, i16* %tmp0, align 2		; VF_16: Found an estimated cost of 16 for VF 16 For instruction: %tmp4 = load i16, i16* %tmp0, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i16, i16* %tmp1, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i16, i16* %tmp1, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load i16, i16* %tmp2, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load i16, i16* %tmp2, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load i16, i16* %tmp3, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load i16, i16* %tmp3, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp0, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp0, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp1, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp1, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp2, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i16 0, i16* %tmp2, align 2
; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store i16 0, i16* %tmp3, align 2		; VF_16-NEXT: Found an estimated cost of 16 for VF 16 For instruction: store i16 0, i16* %tmp3, align 2
for.body:		for.body:
%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]		%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
%tmp0 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 0		%tmp0 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 0
%tmp1 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 1		%tmp1 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 1
%tmp2 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 2		%tmp2 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 2
%tmp3 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 3		%tmp3 = getelementptr inbounds %i16.4, %i16.4* %data, i64 %i, i32 3
%tmp4 = load i16, i16* %tmp0, align 2		%tmp4 = load i16, i16* %tmp0, align 2
%tmp5 = load i16, i16* %tmp1, align 2		%tmp5 = load i16, i16* %tmp1, align 2
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp5 = load i32, i32* %tmp1, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp5 = load i32, i32* %tmp1, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp6 = load i32, i32* %tmp2, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp6 = load i32, i32* %tmp2, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp7 = load i32, i32* %tmp3, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp7 = load i32, i32* %tmp3, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store i32 0, i32* %tmp0, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store i32 0, i32* %tmp0, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store i32 0, i32* %tmp1, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store i32 0, i32* %tmp1, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store i32 0, i32* %tmp2, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store i32 0, i32* %tmp2, align 4
; VF_2-NEXT: Found an estimated cost of 24 for VF 2 For instruction: store i32 0, i32* %tmp3, align 4		; VF_2-NEXT: Found an estimated cost of 24 for VF 2 For instruction: store i32 0, i32* %tmp3, align 4
; VF_4-LABEL: Checking a loop in "i32_factor_4"		; VF_4-LABEL: Checking a loop in "i32_factor_4"
; VF_4: Found an estimated cost of 144 for VF 4 For instruction: %tmp4 = load i32, i32* %tmp0, align 4		; VF_4: Found an estimated cost of 8 for VF 4 For instruction: %tmp4 = load i32, i32* %tmp0, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp5 = load i32, i32* %tmp1, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp5 = load i32, i32* %tmp1, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp6 = load i32, i32* %tmp2, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp6 = load i32, i32* %tmp2, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp7 = load i32, i32* %tmp3, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp7 = load i32, i32* %tmp3, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i32 0, i32* %tmp0, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i32 0, i32* %tmp0, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i32 0, i32* %tmp1, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i32 0, i32* %tmp1, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i32 0, i32* %tmp2, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store i32 0, i32* %tmp2, align 4
; VF_4-NEXT: Found an estimated cost of 80 for VF 4 For instruction: store i32 0, i32* %tmp3, align 4		; VF_4-NEXT: Found an estimated cost of 8 for VF 4 For instruction: store i32 0, i32* %tmp3, align 4
; VF_8-LABEL: Checking a loop in "i32_factor_4"		; VF_8-LABEL: Checking a loop in "i32_factor_4"
; VF_8: Found an estimated cost of 544 for VF 8 For instruction: %tmp4 = load i32, i32* %tmp0, align 4		; VF_8: Found an estimated cost of 16 for VF 8 For instruction: %tmp4 = load i32, i32* %tmp0, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp5 = load i32, i32* %tmp1, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp5 = load i32, i32* %tmp1, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp6 = load i32, i32* %tmp2, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp6 = load i32, i32* %tmp2, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp7 = load i32, i32* %tmp3, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp7 = load i32, i32* %tmp3, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i32 0, i32* %tmp0, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i32 0, i32* %tmp0, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i32 0, i32* %tmp1, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i32 0, i32* %tmp1, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i32 0, i32* %tmp2, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store i32 0, i32* %tmp2, align 4
; VF_8-NEXT: Found an estimated cost of 288 for VF 8 For instruction: store i32 0, i32* %tmp3, align 4		; VF_8-NEXT: Found an estimated cost of 16 for VF 8 For instruction: store i32 0, i32* %tmp3, align 4
; VF_16-LABEL: Checking a loop in "i32_factor_4"		; VF_16-LABEL: Checking a loop in "i32_factor_4"
; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load i32, i32* %tmp0, align 4		; VF_16: Found an estimated cost of 32 for VF 16 For instruction: %tmp4 = load i32, i32* %tmp0, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i32, i32* %tmp1, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load i32, i32* %tmp1, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load i32, i32* %tmp2, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load i32, i32* %tmp2, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load i32, i32* %tmp3, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load i32, i32* %tmp3, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp0, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp0, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp1, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp1, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp2, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store i32 0, i32* %tmp2, align 4
; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store i32 0, i32* %tmp3, align 4		; VF_16-NEXT: Found an estimated cost of 32 for VF 16 For instruction: store i32 0, i32* %tmp3, align 4
for.body:		for.body:
%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]		%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
%tmp0 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 0		%tmp0 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 0
%tmp1 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 1		%tmp1 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 1
%tmp2 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 2		%tmp2 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 2
%tmp3 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 3		%tmp3 = getelementptr inbounds %i32.4, %i32.4* %data, i64 %i, i32 3
%tmp4 = load i32, i32* %tmp0, align 4		%tmp4 = load i32, i32* %tmp0, align 4
%tmp5 = load i32, i32* %tmp1, align 4		%tmp5 = load i32, i32* %tmp1, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp5 = load half, half* %tmp1, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp5 = load half, half* %tmp1, align 2
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp6 = load half, half* %tmp2, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp6 = load half, half* %tmp2, align 2
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp7 = load half, half* %tmp3, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp7 = load half, half* %tmp3, align 2
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store half 0xH0000, half* %tmp0, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store half 0xH0000, half* %tmp0, align 2
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store half 0xH0000, half* %tmp1, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store half 0xH0000, half* %tmp1, align 2
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store half 0xH0000, half* %tmp2, align 2		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store half 0xH0000, half* %tmp2, align 2
; VF_4-NEXT: Found an estimated cost of 80 for VF 4 For instruction: store half 0xH0000, half* %tmp3, align 2		; VF_4-NEXT: Found an estimated cost of 80 for VF 4 For instruction: store half 0xH0000, half* %tmp3, align 2
; VF_8-LABEL: Checking a loop in "f16_factor_4"		; VF_8-LABEL: Checking a loop in "f16_factor_4"
; VF_8: Found an estimated cost of 544 for VF 8 For instruction: %tmp4 = load half, half* %tmp0, align 2		; VF_8: Found an estimated cost of 8 for VF 8 For instruction: %tmp4 = load half, half* %tmp0, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp5 = load half, half* %tmp1, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp5 = load half, half* %tmp1, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp6 = load half, half* %tmp2, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp6 = load half, half* %tmp2, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp7 = load half, half* %tmp3, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp7 = load half, half* %tmp3, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp0, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp0, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp1, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp1, align 2
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp2, align 2		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp2, align 2
; VF_8-NEXT: Found an estimated cost of 288 for VF 8 For instruction: store half 0xH0000, half* %tmp3, align 2		; VF_8-NEXT: Found an estimated cost of 8 for VF 8 For instruction: store half 0xH0000, half* %tmp3, align 2
; VF_16-LABEL: Checking a loop in "f16_factor_4"		; VF_16-LABEL: Checking a loop in "f16_factor_4"
; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load half, half* %tmp0, align 2		; VF_16: Found an estimated cost of 16 for VF 16 For instruction: %tmp4 = load half, half* %tmp0, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load half, half* %tmp1, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load half, half* %tmp1, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load half, half* %tmp2, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load half, half* %tmp2, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load half, half* %tmp3, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load half, half* %tmp3, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp0, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp0, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp1, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp1, align 2
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp2, align 2		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store half 0xH0000, half* %tmp2, align 2
; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store half 0xH0000, half* %tmp3, align 2		; VF_16-NEXT: Found an estimated cost of 16 for VF 16 For instruction: store half 0xH0000, half* %tmp3, align 2
for.body:		for.body:
%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]		%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
%tmp0 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 0		%tmp0 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 0
%tmp1 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 1		%tmp1 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 1
%tmp2 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 2		%tmp2 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 2
%tmp3 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 3		%tmp3 = getelementptr inbounds %f16.4, %f16.4* %data, i64 %i, i32 3
%tmp4 = load half, half* %tmp0, align 2		%tmp4 = load half, half* %tmp0, align 2
%tmp5 = load half, half* %tmp1, align 2		%tmp5 = load half, half* %tmp1, align 2
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp5 = load float, float* %tmp1, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp5 = load float, float* %tmp1, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp6 = load float, float* %tmp2, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp6 = load float, float* %tmp2, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp7 = load float, float* %tmp3, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp7 = load float, float* %tmp3, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store float 0.000000e+00, float* %tmp0, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store float 0.000000e+00, float* %tmp0, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store float 0.000000e+00, float* %tmp1, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store float 0.000000e+00, float* %tmp1, align 4
; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store float 0.000000e+00, float* %tmp2, align 4		; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: store float 0.000000e+00, float* %tmp2, align 4
; VF_2-NEXT: Found an estimated cost of 24 for VF 2 For instruction: store float 0.000000e+00, float* %tmp3, align 4		; VF_2-NEXT: Found an estimated cost of 24 for VF 2 For instruction: store float 0.000000e+00, float* %tmp3, align 4
; VF_4-LABEL: Checking a loop in "f32_factor_4"		; VF_4-LABEL: Checking a loop in "f32_factor_4"
; VF_4: Found an estimated cost of 144 for VF 4 For instruction: %tmp4 = load float, float* %tmp0, align 4		; VF_4: Found an estimated cost of 8 for VF 4 For instruction: %tmp4 = load float, float* %tmp0, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp5 = load float, float* %tmp1, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp5 = load float, float* %tmp1, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp6 = load float, float* %tmp2, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp6 = load float, float* %tmp2, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp7 = load float, float* %tmp3, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp7 = load float, float* %tmp3, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store float 0.000000e+00, float* %tmp0, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store float 0.000000e+00, float* %tmp0, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store float 0.000000e+00, float* %tmp1, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store float 0.000000e+00, float* %tmp1, align 4
; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store float 0.000000e+00, float* %tmp2, align 4		; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store float 0.000000e+00, float* %tmp2, align 4
; VF_4-NEXT: Found an estimated cost of 80 for VF 4 For instruction: store float 0.000000e+00, float* %tmp3, align 4		; VF_4-NEXT: Found an estimated cost of 8 for VF 4 For instruction: store float 0.000000e+00, float* %tmp3, align 4
; VF_8-LABEL: Checking a loop in "f32_factor_4"		; VF_8-LABEL: Checking a loop in "f32_factor_4"
; VF_8: Found an estimated cost of 544 for VF 8 For instruction: %tmp4 = load float, float* %tmp0, align 4		; VF_8: Found an estimated cost of 16 for VF 8 For instruction: %tmp4 = load float, float* %tmp0, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp5 = load float, float* %tmp1, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp5 = load float, float* %tmp1, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp6 = load float, float* %tmp2, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp6 = load float, float* %tmp2, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp7 = load float, float* %tmp3, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp7 = load float, float* %tmp3, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store float 0.000000e+00, float* %tmp0, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store float 0.000000e+00, float* %tmp0, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store float 0.000000e+00, float* %tmp1, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store float 0.000000e+00, float* %tmp1, align 4
; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store float 0.000000e+00, float* %tmp2, align 4		; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store float 0.000000e+00, float* %tmp2, align 4
; VF_8-NEXT: Found an estimated cost of 288 for VF 8 For instruction: store float 0.000000e+00, float* %tmp3, align 4		; VF_8-NEXT: Found an estimated cost of 16 for VF 8 For instruction: store float 0.000000e+00, float* %tmp3, align 4
; VF_16-LABEL: Checking a loop in "f32_factor_4"		; VF_16-LABEL: Checking a loop in "f32_factor_4"
; VF_16: Found an estimated cost of 2112 for VF 16 For instruction: %tmp4 = load float, float* %tmp0, align 4		; VF_16: Found an estimated cost of 32 for VF 16 For instruction: %tmp4 = load float, float* %tmp0, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load float, float* %tmp1, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp5 = load float, float* %tmp1, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load float, float* %tmp2, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp6 = load float, float* %tmp2, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load float, float* %tmp3, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: %tmp7 = load float, float* %tmp3, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp0, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp0, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp1, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp1, align 4
; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp2, align 4		; VF_16-NEXT: Found an estimated cost of 0 for VF 16 For instruction: store float 0.000000e+00, float* %tmp2, align 4
; VF_16-NEXT: Found an estimated cost of 1088 for VF 16 For instruction: store float 0.000000e+00, float* %tmp3, align 4		; VF_16-NEXT: Found an estimated cost of 32 for VF 16 For instruction: store float 0.000000e+00, float* %tmp3, align 4
for.body:		for.body:
%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]		%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
%tmp0 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 0		%tmp0 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 0
%tmp1 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 1		%tmp1 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 1
%tmp2 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 2		%tmp2 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 2
%tmp3 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 3		%tmp3 = getelementptr inbounds %f32.4, %f32.4* %data, i64 %i, i32 3
%tmp4 = load float, float* %tmp0, align 4		%tmp4 = load float, float* %tmp0, align 4
%tmp5 = load float, float* %tmp1, align 4		%tmp5 = load float, float* %tmp1, align 4
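
The updated factor-4 check lines in this test (the right-hand column) fit a simple pattern: the cost lands on the first load and the last store of each group, and it scales with the number of 128-bit vectors each member occupies. The following is a minimal sketch of that pattern, not the code from this patch. The function name, the constant names, and the cost factor of 2 are assumptions chosen only because they reproduce the numbers shown; cases where the per-member vector is narrower than 128 bits (such as f16 at VF 4 or f32 at VF 2, whose costs are unchanged in the diff) are deliberately left out.

```python
# Hypothetical model of the new interleaved-access costs in the check lines.
# Assumption: MVE vectors are 128 bits wide and each access is weighted by 2.
MVE_VECTOR_BITS = 128
ASSUMED_COST_FACTOR = 2


def interleaved_cost(factor, vf, elem_bits):
    """Return the modelled cost of one interleaved load or store group.

    factor    -- interleave factor (the "n" in vldn/vstn)
    vf        -- vectorization factor reported in the check line
    elem_bits -- element size in bits (32 for i32/float, 16 for half)
    """
    sub_vec_bits = vf * elem_bits
    if sub_vec_bits % MVE_VECTOR_BITS != 0:
        # The sketch only covers members that fill whole 128-bit vectors.
        raise ValueError("member vector is not a multiple of 128 bits")
    num_accesses = sub_vec_bits // MVE_VECTOR_BITS
    return factor * ASSUMED_COST_FACTOR * num_accesses


if __name__ == "__main__":
    for name, bits in (("i32", 32), ("f16", 16), ("f32", 32)):
        for vf in (4, 8, 16):
            try:
                print(name, "VF", vf, "cost", interleaved_cost(4, vf, bits))
            except ValueError as err:
                print(name, "VF", vf, "not modelled:", err)
```

Under these assumptions the sketch gives 8, 16 and 32 for i32 and float at VF 4, 8 and 16, and 8 and 16 for half at VF 8 and 16, matching the new column above.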

Revision Contents

Diff 230109

llvm/lib/Target/ARM/ARMISelLowering.h

llvm/lib/Target/ARM/ARMISelLowering.cpp

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

llvm/test/CodeGen/Thumb2/mve-vld2.ll

llvm/test/CodeGen/Thumb2/mve-vld4.ll

llvm/test/CodeGen/Thumb2/mve-vst2.ll

llvm/test/CodeGen/Thumb2/mve-vst4.ll

llvm/test/Transforms/InterleavedAccess/ARM/interleaved-accesses.ll

llvm/test/Transforms/LoopVectorize/ARM/mve-interleaved-cost.ll
