This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

llvm/trunk/
  include/llvm/IR/
    IntrinsicsARM.td
  lib/
    IR/
      AutoUpgrade.cpp
    Target/ARM/
      ARMISelLowering.cpp
  test/
    Analysis/
      BasicAA/
        cs-cs.ll
        intrinsics.ll
      TypeBasedAliasAnalysis/
        intrinsics.ll
    CodeGen/
      ARM/
        2010-05-20-NEONSpillCrash.ll
        2010-05-21-BuildVector.ll
        2010-06-11-vmovdrr-bitcast.ll
        2010-06-29-PartialRedefFastAlloc.ll
        2011-08-12-vmovqqqq-pseudo.ll
        2012-01-24-RegSequenceLiveRange.ll
        2012-05-10-PreferVMOVtoVDUP32.ll
        2012-08-27-CopyPhysRegCrash.ll
        2013-10-11-select-stalls.ll
        2014-01-09-pseudo_expand_implicit_reg.ll
        arm-interleaved-accesses.ll
        coalesce-subregs.ll
        dagcombine-concatvector.ll
        neon_spill.ll
        out-of-registers.ll
        reg_sequence.ll
        spill-q.ll
        vcge.ll
        vector-DAGCombine.ll
        vld-vst-upgrade.ll
        vld1.ll
        vld2.ll
        vld3.ll
        vld4.ll
        vlddup.ll
        vldlane.ll
        vmov.ll
        vmul.ll
        vst1.ll
        vst2.ll
        vst3.ll
        vst4.ll
        vstlane.ll
      Thumb2/
        crash.ll
        machine-licm.ll
        thumb2-spill-q.ll
        v8_IT_1.ll
    Transforms/
      InstCombine/
        neon-intrinsics.ll
      LoopStrengthReduce/ARM/
        ivchain-ARM.ll
Differential D12985

[ARM] Take into account address spaces in interleaved access vectorization
Closed · Public

Authored by jketema on Sep 18 2015, 2:31 PM.

Details

Reviewers

rengolin
sbaranga

Commits

rGab99b59e8ca2: [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|…
rL248887: [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|…

Summary

The vector being loaded might not be in address space 0. In that case, creating the vldN/vstN call triggers an assert in CallInst::Create, because the argument type and the parameter type are in different address spaces. Resolve this by adding an appropriate address space cast.

Diff Detail

Repository: rL LLVM

Event Timeline

jketema updated this revision to Diff 35137. Sep 18 2015, 2:31 PM

jketema retitled this revision to [ARM] Take into account address spaces in interleaved access vectorization.

jketema updated this object.

jketema added reviewers: sbaranga, rengolin.

jketema added a subscriber: llvm-commits.

Herald added subscribers: rengolin, aemerson. Sep 18 2015, 2:31 PM

sbaranga added inline comments. Sep 21 2015, 2:40 AM

lib/Target/ARM/ARMISelLowering.cpp
11708 (On Diff #35137): This is interesting. Ideally, we wouldn't want to introduce address space casts when they aren't necessary (I can imagine some users using the address space qualifier to help with alias analysis). We should be able to get a variant of the vldN intrinsic that operates on the correct address space.

jketema added inline comments. Sep 21 2015, 9:30 AM

lib/Target/ARM/ARMISelLowering.cpp
11708 (On Diff #35137): Maybe. I would need some pointers on how to change the code in that case, because this would require changes to `IntrinsicsARM.td`, which I'm not very familiar with. I'm also wondering: is alias analysis information still being used during instruction lowering or later?

sbaranga added inline comments. Sep 22 2015, 2:37 AM

lib/Target/ARM/ARMISelLowering.cpp
11708 (On Diff #35137): I've looked into this, and changing IntrinsicsARM.td should be fairly simple: all that's needed is to replace the llvm_ptr_ty argument with llvm_anyptr_ty in definitions like int_arm_neon_vld2. However, it turns out that this changes the intrinsic name mangling (which might be a problem for backward compatibility) and also requires clang changes. This is, of course, a bit complicated.
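To illustrate the mangling change mentioned here: once the pointer parameter is overloaded, its type becomes part of the intrinsic name. A minimal sketch, assuming a `p<address space><pointee>` suffix scheme under which an i8 pointer in address space 0 is spelled `p0i8`; `mangledI8Ptr` and `vldName` are hypothetical helpers for illustration, not LLVM API:

```cpp
#include <string>

// Hypothetical helper (not LLVM API): suffix for an overloaded i8 pointer
// parameter in the given address space, e.g. 0 -> "p0i8".
std::string mangledI8Ptr(unsigned addrSpace) {
  return "p" + std::to_string(addrSpace) + "i8";
}

// Hypothetical helper: name of a vldN intrinsic once the pointer parameter
// is overloaded. vecSuffix is the mangled vector type, e.g. "v2i32".
std::string vldName(unsigned n, const std::string &vecSuffix,
                    unsigned addrSpace) {
  return "llvm.arm.neon.vld" + std::to_string(n) + "." + vecSuffix + "." +
         mangledI8Ptr(addrSpace);
}
```

Under these assumptions, `vldName(3, "v2i32", 0)` gives `llvm.arm.neon.vld3.v2i32.p0i8`, and a pointer in address space 1 would get its own `p1i8` variant instead of needing a cast.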

jketema added inline comments. Sep 22 2015, 4:42 AM

lib/Target/ARM/ARMISelLowering.cpp
11708 (On Diff #35137): Thanks for the pointer regarding llvm_anyptr_ty. Maybe map, say, `__builtin_neon_vld3_v` to `@llvm.arm.neon.vld3.v2i32.p0i8` instead of `@llvm.arm.neon.vld3.v2i32`, and keep that as the only exposed version? Or are there objections to that solution too?

sbaranga added inline comments. Sep 22 2015, 4:49 AM

lib/Target/ARM/ARMISelLowering.cpp
11708 (On Diff #35137): Yes, we would have to do that and also add auto-upgrade support for the old forms of the intrinsics. Perhaps an email should be sent to llvm-dev as well.

jketema added inline comments. Sep 22 2015, 6:16 AM

lib/Target/ARM/ARMISelLowering.cpp
11708 (On Diff #35137): I'm not sure what you're saying. Sticking with `vld3`: do you mean you want to expose both `@llvm.arm.neon.vld3.v2i32.p0i8` and `@llvm.arm.neon.vld3.v2i32`? Or do you want more exposed? I'm also not quite sure what you mean by auto-upgrade support.

sbaranga added inline comments. Sep 23 2015, 2:07 AM

lib/Target/ARM/ARMISelLowering.cpp
11708 (On Diff #35137): I think we should make the change to IntrinsicsARM.td (replace llvm_ptr_ty with llvm_anyptr_ty for the vldN intrinsics). At that point we will only have the new-style intrinsics in LLVM IR (like @llvm.arm.neon.vld3.v2i32.p0i8), and the old ones (like @llvm.arm.neon.vld3.v2i32) will not be supported anymore. We therefore need to add a case in lib/IR/AutoUpgrade.cpp for the old-style vldN intrinsics, so that we can convert old-style intrinsics to the new ones and not break backwards compatibility. Regarding the clang changes (__builtin_neon_vld*..), yes, these should be emitting @llvm.arm.neon.vld3.v2i32.p0i8.
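The load side of such an upgrade amounts to recognizing an old-style name and appending the pointer suffix. A rough sketch using `std::regex` in place of `llvm::Regex`, with the same pattern as the AutoUpgrade.cpp change in this revision; `upgradeOldVldName` is a hypothetical name, and the input is assumed to have its `llvm.` prefix already stripped:

```cpp
#include <regex>
#include <string>

// Sketch only: returns the new-style name for an old-style vld intrinsic,
// or an empty string if `name` is not an old-style vld name.
std::string upgradeOldVldName(const std::string &name) {
  // Matches vld1..vld4 and vld2lane..vld4lane followed by exactly one
  // vector suffix; a name that already carries ".p0i8" does not match.
  static const std::regex oldVld(
      R"(^arm\.neon\.vld([1234]|[234]lane)\.v[a-z0-9]*$)");
  if (!std::regex_match(name, oldVld))
    return std::string();
  return "llvm." + name + ".p0i8";
}
```

Because the pattern allows only a single type suffix, already-upgraded names are left alone, so running the upgrade twice is harmless in this model.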

jketema added inline comments. Sep 23 2015, 2:21 AM

lib/Target/ARM/ARMISelLowering.cpp
11708 (On Diff #35137): Got it, thanks for the explanation. I was not aware of lib/IR/AutoUpgrade.cpp. I'll update this patch to take your comments into account and whip up an associated patch for clang. Once that's done I'll also send a message to llvm-dev to ask for more feedback.

This now upgrades the vld[234] and vst[234] intrinsics to take into account an address space, as discussed.

Open issues as far as I'm concerned:

Should vld1 and vst1 also be upgraded for uniformity? Similarly for the vld[234]lane and vst[234]lane instructions?
Should all the tests referring to vld[234] and vst[234] be updated to use the new intrinsic names, or are they allowed to depend on the auto-upgrade code?

Thanks, Jeroen!

In D12985#252574, @jketema wrote:

> This now upgrades the vld[234] and vst[234] intrinsics to take into account an address space, as discussed.
>
> Open issues as far as I'm concerned:
>
> Should vld1 and vst1 also be upgraded for uniformity? Similarly for the vld[234]lane and vst[234]lane instructions?

They seem to have the same issues, so I would say yes.

> Should all the tests referring to vld[234] and vst[234] be updated to use the new intrinsic names, or are they allowed to depend on the auto-upgrade code?

I think they should use the new names, and have different tests to check that the old ones get auto-upgraded.

This extends the patch to cover all of vld[1234], vld[234]lane, vst[1234], and vst[234]lane. All tests now use the updated intrinsic names, and new tests have been added to test auto-upgrading.

Should I still email llvm-dev about this? If so, what exactly should I ask about?

Forgot to say in my previous comment: the related clang patch is http://reviews.llvm.org/D13127
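For the store intrinsics, the pointer is the first overloaded parameter, so its suffix comes before the vector suffix in the new name (for example `llvm.arm.neon.vst1.p0i8.v8i16` in the updated tests). The sketch below models how the parameter count of an old declaration identifies which vstN it refers to; all helper names are hypothetical and this is not the code from the patch:

```cpp
#include <cstddef>
#include <string>

// Non-lane vstN takes: pointer, N vectors, alignment => N = params - 2.
int vstFactor(std::size_t numParams) {
  return static_cast<int>(numParams) - 2;
}

// vstNlane takes: pointer, N vectors, lane, alignment => N = params - 3.
int vstLaneFactor(std::size_t numParams) {
  return static_cast<int>(numParams) - 3;
}

// New-style name: pointer suffix first, then the vector suffix.
// The address space is fixed at 0 here, as for upgraded old declarations.
std::string newVstName(int n, bool lane, const std::string &vecSuffix) {
  return "llvm.arm.neon.vst" + std::to_string(n) + (lane ? "lane" : "") +
         ".p0i8." + vecSuffix;
}
```

A three-parameter old-style store such as `vst1.v8i16(i8*, <8 x i16>, i32)` maps to N = 1, which is consistent with the `fArgs.size() - 3` index into a table starting at vst1 in the AutoUpgrade.cpp change.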

Herald added subscribers: qcolombet, MatzeB. Sep 28 2015, 8:03 AM

In D12985#254767, @jketema wrote:

> This extends the patch to cover all of vld[1234], vld[234]lane, vst[1234], and vst[234]lane. All tests now use the updated intrinsic names, and new tests have been added to test auto-upgrading.
>
> Should I still email llvm-dev about this? If so, what exactly should I ask about?
>
> Forgot to say in my previous comment: the related clang patch is http://reviews.llvm.org/D13127

Both patches look good!

The purpose of the email to llvm-dev would be to announce the interface change and to ask if this would be a problem for anyone (it shouldn't be since you've added the auto-upgrade).

LGTM

This revision is now accepted and ready to land. Sep 30 2015, 2:37 AM

Closed by commit rL248887: [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|… (authored by jketema). Sep 30 2015, 3:58 AM

This revision was automatically updated to reflect the committed changes.

Committed. Thanks for all the feedback!

Thanks for all the work! :)

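Revision Contents

Path	Size
llvm/trunk/include/llvm/IR/IntrinsicsARM.td	44 lines
llvm/trunk/lib/IR/AutoUpgrade.cpp	55 lines
llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp	13 lines
llvm/trunk/test/Analysis/BasicAA/cs-cs.ll	34 lines
llvm/trunk/test/Analysis/BasicAA/intrinsics.ll	24 lines
llvm/trunk/test/Analysis/TypeBasedAliasAnalysis/intrinsics.ll	14 lines
llvm/trunk/test/CodeGen/ARM/2010-05-20-NEONSpillCrash.ll	24 lines
llvm/trunk/test/CodeGen/ARM/2010-05-21-BuildVector.ll	4 lines
llvm/trunk/test/CodeGen/ARM/2010-06-11-vmovdrr-bitcast.ll	4 lines
llvm/trunk/test/CodeGen/ARM/2010-06-29-PartialRedefFastAlloc.ll	4 lines
llvm/trunk/test/CodeGen/ARM/2011-08-12-vmovqqqq-pseudo.ll	4 lines
llvm/trunk/test/CodeGen/ARM/2012-01-24-RegSequenceLiveRange.ll	10 lines
llvm/trunk/test/CodeGen/ARM/2012-05-10-PreferVMOVtoVDUP32.ll	4 lines
llvm/trunk/test/CodeGen/ARM/2012-08-27-CopyPhysRegCrash.ll	14 lines
llvm/trunk/test/CodeGen/ARM/2013-10-11-select-stalls.ll	6 lines
llvm/trunk/test/CodeGen/ARM/2014-01-09-pseudo_expand_implicit_reg.ll	4 lines
llvm/trunk/test/CodeGen/ARM/arm-interleaved-accesses.ll	21 lines
llvm/trunk/test/CodeGen/ARM/coalesce-subregs.ll	38 lines
llvm/trunk/test/CodeGen/ARM/dagcombine-concatvector.ll	4 lines
llvm/trunk/test/CodeGen/ARM/neon_spill.ll	6 lines
llvm/trunk/test/CodeGen/ARM/out-of-registers.ll	8 lines
llvm/trunk/test/CodeGen/ARM/reg_sequence.ll	64 lines
llvm/trunk/test/CodeGen/ARM/spill-q.ll	28 lines
llvm/trunk/test/CodeGen/ARM/vcge.ll	4 lines
llvm/trunk/test/CodeGen/ARM/vector-DAGCombine.ll	4 lines
llvm/trunk/test/CodeGen/ARM/vld-vst-upgrade.ll	139 lines
llvm/trunk/test/CodeGen/ARM/vld1.ll	54 lines
llvm/trunk/test/CodeGen/ARM/vld2.ll	42 lines
llvm/trunk/test/CodeGen/ARM/vld3.ll	44 lines
llvm/trunk/test/CodeGen/ARM/vld4.ll	44 lines
llvm/trunk/test/CodeGen/ARM/vlddup.ll	30 lines
llvm/trunk/test/CodeGen/ARM/vldlane.ll	98 lines
llvm/trunk/test/CodeGen/ARM/vmov.ll	4 lines
llvm/trunk/test/CodeGen/ARM/vmul.ll	14 lines
llvm/trunk/test/CodeGen/ARM/vst1.ll	50 lines
llvm/trunk/test/CodeGen/ARM/vst2.ll	46 lines
llvm/trunk/test/CodeGen/ARM/vst3.ll	42 lines
llvm/trunk/test/CodeGen/ARM/vst4.ll	44 lines
llvm/trunk/test/CodeGen/ARM/vstlane.ll	92 lines
llvm/trunk/test/CodeGen/Thumb2/crash.ll	14 lines
llvm/trunk/test/CodeGen/Thumb2/machine-licm.ll	8 lines
llvm/trunk/test/CodeGen/Thumb2/thumb2-spill-q.ll	28 lines
llvm/trunk/test/CodeGen/Thumb2/v8_IT_1.ll	4 lines
llvm/trunk/test/Transforms/InstCombine/neon-intrinsics.ll	12 lines
llvm/trunk/test/Transforms/LoopStrengthReduce/ARM/ivchain-ARM.ll	50 lines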

Diff 36081

llvm/trunk/include/llvm/IR/IntrinsicsARM.td

[... 399 lines not shown ...]
 def int_arm_neon_vrinta : Neon_1Arg_Intrinsic;
 def int_arm_neon_vrintz : Neon_1Arg_Intrinsic;
 def int_arm_neon_vrintm : Neon_1Arg_Intrinsic;
 def int_arm_neon_vrintp : Neon_1Arg_Intrinsic;

 // De-interleaving vector loads from N-element structures.
 // Source operands are the address and alignment.
 def int_arm_neon_vld1 : Intrinsic<[llvm_anyvector_ty],
-                                  [llvm_ptr_ty, llvm_i32_ty],
+                                  [llvm_anyptr_ty, llvm_i32_ty],
                                   [IntrReadArgMem]>;
 def int_arm_neon_vld2 : Intrinsic<[llvm_anyvector_ty, LLVMMatchType<0>],
-                                  [llvm_ptr_ty, llvm_i32_ty],
+                                  [llvm_anyptr_ty, llvm_i32_ty],
                                   [IntrReadArgMem]>;
 def int_arm_neon_vld3 : Intrinsic<[llvm_anyvector_ty, LLVMMatchType<0>,
                                    LLVMMatchType<0>],
-                                  [llvm_ptr_ty, llvm_i32_ty],
+                                  [llvm_anyptr_ty, llvm_i32_ty],
                                   [IntrReadArgMem]>;
 def int_arm_neon_vld4 : Intrinsic<[llvm_anyvector_ty, LLVMMatchType<0>,
                                    LLVMMatchType<0>, LLVMMatchType<0>],
-                                  [llvm_ptr_ty, llvm_i32_ty],
+                                  [llvm_anyptr_ty, llvm_i32_ty],
                                   [IntrReadArgMem]>;

 // Vector load N-element structure to one lane.
 // Source operands are: the address, the N input vectors (since only one
 // lane is assigned), the lane number, and the alignment.
 def int_arm_neon_vld2lane : Intrinsic<[llvm_anyvector_ty, LLVMMatchType<0>],
-                                      [llvm_ptr_ty, LLVMMatchType<0>,
+                                      [llvm_anyptr_ty, LLVMMatchType<0>,
                                        LLVMMatchType<0>, llvm_i32_ty,
                                        llvm_i32_ty], [IntrReadArgMem]>;
 def int_arm_neon_vld3lane : Intrinsic<[llvm_anyvector_ty, LLVMMatchType<0>,
                                        LLVMMatchType<0>],
-                                      [llvm_ptr_ty, LLVMMatchType<0>,
+                                      [llvm_anyptr_ty, LLVMMatchType<0>,
                                        LLVMMatchType<0>, LLVMMatchType<0>,
                                        llvm_i32_ty, llvm_i32_ty],
                                       [IntrReadArgMem]>;
 def int_arm_neon_vld4lane : Intrinsic<[llvm_anyvector_ty, LLVMMatchType<0>,
                                        LLVMMatchType<0>, LLVMMatchType<0>],
-                                      [llvm_ptr_ty, LLVMMatchType<0>,
+                                      [llvm_anyptr_ty, LLVMMatchType<0>,
                                        LLVMMatchType<0>, LLVMMatchType<0>,
                                        LLVMMatchType<0>, llvm_i32_ty,
                                        llvm_i32_ty], [IntrReadArgMem]>;

 // Interleaving vector stores from N-element structures.
 // Source operands are: the address, the N vectors, and the alignment.
 def int_arm_neon_vst1 : Intrinsic<[],
-                                  [llvm_ptr_ty, llvm_anyvector_ty,
+                                  [llvm_anyptr_ty, llvm_anyvector_ty,
                                    llvm_i32_ty], [IntrReadWriteArgMem]>;
 def int_arm_neon_vst2 : Intrinsic<[],
-                                  [llvm_ptr_ty, llvm_anyvector_ty,
-                                   LLVMMatchType<0>, llvm_i32_ty],
+                                  [llvm_anyptr_ty, llvm_anyvector_ty,
+                                   LLVMMatchType<1>, llvm_i32_ty],
                                   [IntrReadWriteArgMem]>;
 def int_arm_neon_vst3 : Intrinsic<[],
-                                  [llvm_ptr_ty, llvm_anyvector_ty,
-                                   LLVMMatchType<0>, LLVMMatchType<0>,
+                                  [llvm_anyptr_ty, llvm_anyvector_ty,
+                                   LLVMMatchType<1>, LLVMMatchType<1>,
                                    llvm_i32_ty], [IntrReadWriteArgMem]>;
 def int_arm_neon_vst4 : Intrinsic<[],
-                                  [llvm_ptr_ty, llvm_anyvector_ty,
-                                   LLVMMatchType<0>, LLVMMatchType<0>,
-                                   LLVMMatchType<0>, llvm_i32_ty],
+                                  [llvm_anyptr_ty, llvm_anyvector_ty,
+                                   LLVMMatchType<1>, LLVMMatchType<1>,
+                                   LLVMMatchType<1>, llvm_i32_ty],
                                   [IntrReadWriteArgMem]>;

 // Vector store N-element structure from one lane.
 // Source operands are: the address, the N vectors, the lane number, and
 // the alignment.
 def int_arm_neon_vst2lane : Intrinsic<[],
-                                      [llvm_ptr_ty, llvm_anyvector_ty,
-                                       LLVMMatchType<0>, llvm_i32_ty,
+                                      [llvm_anyptr_ty, llvm_anyvector_ty,
+                                       LLVMMatchType<1>, llvm_i32_ty,
                                        llvm_i32_ty], [IntrReadWriteArgMem]>;
 def int_arm_neon_vst3lane : Intrinsic<[],
-                                      [llvm_ptr_ty, llvm_anyvector_ty,
-                                       LLVMMatchType<0>, LLVMMatchType<0>,
+                                      [llvm_anyptr_ty, llvm_anyvector_ty,
+                                       LLVMMatchType<1>, LLVMMatchType<1>,
                                        llvm_i32_ty, llvm_i32_ty],
                                       [IntrReadWriteArgMem]>;
 def int_arm_neon_vst4lane : Intrinsic<[],
-                                      [llvm_ptr_ty, llvm_anyvector_ty,
-                                       LLVMMatchType<0>, LLVMMatchType<0>,
-                                       LLVMMatchType<0>, llvm_i32_ty,
+                                      [llvm_anyptr_ty, llvm_anyvector_ty,
+                                       LLVMMatchType<1>, LLVMMatchType<1>,
+                                       LLVMMatchType<1>, llvm_i32_ty,
                                        llvm_i32_ty], [IntrReadWriteArgMem]>;

 // Vector bitwise select.
 def int_arm_neon_vbsl : Intrinsic<[llvm_anyvector_ty],
                                   [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],
                                   [IntrNoMem]>;

[... 35 lines not shown ...]

llvm/trunk/lib/IR/AutoUpgrade.cpp

[... 21 lines not shown ...]
 #include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/Module.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/Regex.h"
 #include <cstring>
 using namespace llvm;

 // Upgrade the declarations of the SSE4.1 functions whose arguments have
 // changed their type from v4f32 to v2i64.
 static bool UpgradeSSE41Function(Function* F, Intrinsic::ID IID,
                                  Function *&NewFn) {
   // Check whether this is an old version of the function, which received
[... 49 lines not shown; enclosing context: if (Name.startswith("arm.neon.vclz")) { ]
                                "llvm.ctlz." + Name.substr(14), F->getParent());
       return true;
     }
     if (Name.startswith("arm.neon.vcnt")) {
       NewFn = Intrinsic::getDeclaration(F->getParent(), Intrinsic::ctpop,
                                         F->arg_begin()->getType());
       return true;
     }
+    Regex vldRegex("^arm\\.neon\\.vld([1234]|[234]lane)\\.v[a-z0-9]*$");
+    if (vldRegex.match(Name)) {
+      auto fArgs = F->getFunctionType()->params();
+      SmallVector<Type *, 4> Tys(fArgs.begin(), fArgs.end());
+      // Can't use Intrinsic::getDeclaration here as the return types might
+      // then only be structurally equal.
+      FunctionType* fType = FunctionType::get(F->getReturnType(), Tys, false);
+      NewFn = Function::Create(fType, F->getLinkage(),
+                               "llvm." + Name + ".p0i8", F->getParent());
+      return true;
+    }
+    Regex vstRegex("^arm\\.neon\\.vst([1234]|[234]lane)\\.v[a-z0-9]*$");
+    if (vstRegex.match(Name)) {
+      static Intrinsic::ID StoreInts[] = {Intrinsic::arm_neon_vst1,
+                                          Intrinsic::arm_neon_vst2,
+                                          Intrinsic::arm_neon_vst3,
+                                          Intrinsic::arm_neon_vst4};
+
+      static Intrinsic::ID StoreLaneInts[] = {Intrinsic::arm_neon_vst2lane,
+                                              Intrinsic::arm_neon_vst3lane,
+                                              Intrinsic::arm_neon_vst4lane};
+
+      auto fArgs = F->getFunctionType()->params();
+      Type *Tys[] = {fArgs[0], fArgs[1]};
+      if (Name.find("lane") == StringRef::npos)
+        NewFn = Intrinsic::getDeclaration(F->getParent(),
+                                          StoreInts[fArgs.size() - 3], Tys);
+      else
+        NewFn = Intrinsic::getDeclaration(F->getParent(),
+                                          StoreLaneInts[fArgs.size() - 5], Tys);
+      return true;
+    }
     break;
   }

   case 'c': {
     if (Name.startswith("ctlz.") && F->arg_size() == 1) {
       F->setName(Name + ".old");
       NewFn = Intrinsic::getDeclaration(F->getParent(), Intrinsic::ctlz,
                                         F->arg_begin()->getType());
       return true;
     }
     if (Name.startswith("cttz.") && F->arg_size() == 1) {
[... 541 lines not shown; enclosing context: void llvm::UpgradeIntrinsicCall(CallInst *CI, Function *NewFn) { ]
   std::string Name = CI->getName();
   if (!Name.empty())
     CI->setName(Name + ".old");

   switch (NewFn->getIntrinsicID()) {
   default:
     llvm_unreachable("Unknown function for CallInst upgrade.");

+  case Intrinsic::arm_neon_vld1:
+  case Intrinsic::arm_neon_vld2:
+  case Intrinsic::arm_neon_vld3:
+  case Intrinsic::arm_neon_vld4:
+  case Intrinsic::arm_neon_vld2lane:
+  case Intrinsic::arm_neon_vld3lane:
+  case Intrinsic::arm_neon_vld4lane:
+  case Intrinsic::arm_neon_vst1:
+  case Intrinsic::arm_neon_vst2:
+  case Intrinsic::arm_neon_vst3:
+  case Intrinsic::arm_neon_vst4:
+  case Intrinsic::arm_neon_vst2lane:
+  case Intrinsic::arm_neon_vst3lane:
+  case Intrinsic::arm_neon_vst4lane: {
+    SmallVector<Value *, 4> Args(CI->arg_operands().begin(),
+                                 CI->arg_operands().end());
+    CI->replaceAllUsesWith(Builder.CreateCall(NewFn, Args));
+    CI->eraseFromParent();
+    return;
+  }
+
   case Intrinsic::ctlz:
   case Intrinsic::cttz:
     assert(CI->getNumArgOperands() == 1 &&
            "Mismatch between function args and call args");
     CI->replaceAllUsesWith(Builder.CreateCall(
         NewFn, {CI->getArgOperand(0), Builder.getFalse()}, Name));
     CI->eraseFromParent();
     return;
[... 176 lines not shown ...]

llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp

[... 11,796 lines not shown; enclosing context: bool ARMTargetLowering::lowerInterleavedLoad( ]
   if (EltTy->isPointerTy())
     VecTy =
         VectorType::get(DL.getIntPtrType(EltTy), VecTy->getVectorNumElements());

   static const Intrinsic::ID LoadInts[3] = {Intrinsic::arm_neon_vld2,
                                             Intrinsic::arm_neon_vld3,
                                             Intrinsic::arm_neon_vld4};

-  Function *VldnFunc =
-      Intrinsic::getDeclaration(LI->getModule(), LoadInts[Factor - 2], VecTy);
-
   IRBuilder<> Builder(LI);
   SmallVector<Value *, 2> Ops;

   Type *Int8Ptr = Builder.getInt8PtrTy(LI->getPointerAddressSpace());
   Ops.push_back(Builder.CreateBitCast(LI->getPointerOperand(), Int8Ptr));
   Ops.push_back(Builder.getInt32(LI->getAlignment()));

+  Type *Tys[] = { VecTy, Int8Ptr };
+  Function *VldnFunc =
+      Intrinsic::getDeclaration(LI->getModule(), LoadInts[Factor - 2], Tys);
   CallInst *VldN = Builder.CreateCall(VldnFunc, Ops, "vldN");

   // Replace uses of each shufflevector with the corresponding vector loaded
   // by ldN.
   for (unsigned i = 0; i < Shuffles.size(); i++) {
     ShuffleVectorInst *SV = Shuffles[i];
     unsigned Index = Indices[i];

[... 75 lines not shown; enclosing context: if (EltTy->isPointerTy()) { ]
     Op1 = Builder.CreatePtrToInt(Op1, IntVecTy);

     SubVecTy = VectorType::get(IntTy, NumSubElts);
   }

   static Intrinsic::ID StoreInts[3] = {Intrinsic::arm_neon_vst2,
                                        Intrinsic::arm_neon_vst3,
                                        Intrinsic::arm_neon_vst4};
-  Function *VstNFunc = Intrinsic::getDeclaration(
-      SI->getModule(), StoreInts[Factor - 2], SubVecTy);
-
   SmallVector<Value *, 6> Ops;

   Type *Int8Ptr = Builder.getInt8PtrTy(SI->getPointerAddressSpace());
   Ops.push_back(Builder.CreateBitCast(SI->getPointerOperand(), Int8Ptr));

+  Type *Tys[] = { Int8Ptr, SubVecTy };
+  Function *VstNFunc = Intrinsic::getDeclaration(
+      SI->getModule(), StoreInts[Factor - 2], Tys);
+
   // Split the shufflevector operands into sub vectors for the new vstN call.
   for (unsigned i = 0; i < Factor; i++)
     Ops.push_back(Builder.CreateShuffleVector(
         Op0, Op1, getSequentialMask(Builder, NumSubElts * i, NumSubElts)));

   Ops.push_back(Builder.getInt32(SI->getAlignment()));
   Builder.CreateCall(VstNFunc, Ops);
   return true;
[... 78 lines not shown ...]

llvm/trunk/test/Analysis/BasicAA/cs-cs.ll

 ; RUN: opt < %s -basicaa -aa-eval -print-all-alias-modref-info -disable-output 2>&1 | FileCheck %s
 target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:32-f32:32:32-f64:32:32-v64:32:64-v128:32:128-a0:0:32-n32"
 target triple = "arm-apple-ios"

-declare <8 x i16> @llvm.arm.neon.vld1.v8i16(i8*, i32) nounwind readonly
-declare void @llvm.arm.neon.vst1.v8i16(i8*, <8 x i16>, i32) nounwind
+declare <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8*, i32) nounwind readonly
+declare void @llvm.arm.neon.vst1.p0i8.v8i16(i8*, <8 x i16>, i32) nounwind

 declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1) nounwind
 declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1) nounwind

 declare void @a_readonly_func(i8 *) noinline nounwind readonly

 define <8 x i16> @test1(i8* %p, <8 x i16> %y) {
 entry:
   %q = getelementptr i8, i8* %p, i64 16
-  %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) nounwind
-  call void @llvm.arm.neon.vst1.v8i16(i8* %q, <8 x i16> %y, i32 16)
-  %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) nounwind
+  %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) nounwind
+  call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %q, <8 x i16> %y, i32 16)
+  %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) nounwind
   %c = add <8 x i16> %a, %b
   ret <8 x i16> %c

 ; CHECK-LABEL: Function: test1:

 ; CHECK: NoAlias: i8* %p, i8* %q
-; CHECK: Just Ref: Ptr: i8* %p <-> %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) #4
-; CHECK: NoModRef: Ptr: i8* %q <-> %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) #4
-; CHECK: NoModRef: Ptr: i8* %p <-> call void @llvm.arm.neon.vst1.v8i16(i8* %q, <8 x i16> %y, i32 16)
-; CHECK: Both ModRef: Ptr: i8* %q <-> call void @llvm.arm.neon.vst1.v8i16(i8* %q, <8 x i16> %y, i32 16)
-; CHECK: Just Ref: Ptr: i8* %p <-> %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) #4
-; CHECK: NoModRef: Ptr: i8* %q <-> %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) #4
-; CHECK: NoModRef: %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) #4 <-> call void @llvm.arm.neon.vst1.v8i16(i8* %q, <8 x i16> %y, i32 16)
-; CHECK: NoModRef: %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) #4 <-> %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) #4
-; CHECK: NoModRef: call void @llvm.arm.neon.vst1.v8i16(i8* %q, <8 x i16> %y, i32 16) <-> %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) #4
-; CHECK: NoModRef: call void @llvm.arm.neon.vst1.v8i16(i8* %q, <8 x i16> %y, i32 16) <-> %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) #4
-; CHECK: NoModRef: %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) #4 <-> %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) #4
-; CHECK: NoModRef: %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) #4 <-> call void @llvm.arm.neon.vst1.v8i16(i8* %q, <8 x i16> %y, i32 16)
+; CHECK: Just Ref: Ptr: i8* %p <-> %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) #4
+; CHECK: NoModRef: Ptr: i8* %q <-> %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) #4
+; CHECK: NoModRef: Ptr: i8* %p <-> call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %q, <8 x i16> %y, i32 16)
+; CHECK: Both ModRef: Ptr: i8* %q <-> call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %q, <8 x i16> %y, i32 16)
+; CHECK: Just Ref: Ptr: i8* %p <-> %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) #4
+; CHECK: NoModRef: Ptr: i8* %q <-> %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) #4
+; CHECK: NoModRef: %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) #4 <-> call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %q, <8 x i16> %y, i32 16)
+; CHECK: NoModRef: %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) #4 <-> %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) #4
+; CHECK: NoModRef: call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %q, <8 x i16> %y, i32 16) <-> %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) #4
+; CHECK: NoModRef: call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %q, <8 x i16> %y, i32 16) <-> %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) #4
+; CHECK: NoModRef: %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) #4 <-> %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) #4
+; CHECK: NoModRef: %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) #4 <-> call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %q, <8 x i16> %y, i32 16)
 }

 define void @test2(i8* %P, i8* %Q) nounwind ssp {
   tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %P, i8* %Q, i64 12, i32 1, i1 false)
   tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %P, i8* %Q, i64 12, i32 1, i1 false)
   ret void

 ; CHECK-LABEL: Function: test2:
[... 197 lines not shown ...]

llvm/trunk/test/Analysis/BasicAA/intrinsics.ll

 ; RUN: opt -basicaa -gvn -S < %s | FileCheck %s

 target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:32-f32:32:32-f64:32:32-v64:32:64-v128:32:128-a0:0:32-n32"

 ; BasicAA should prove that these calls don't interfere, since they are
 ; IntrArgReadMem and have noalias pointers.

 ; CHECK: define <8 x i16> @test0(i8* noalias %p, i8* noalias %q, <8 x i16> %y) {
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) [[ATTR:#[0-9]+]]
-; CHECK-NEXT: call void @llvm.arm.neon.vst1.v8i16(i8* %q, <8 x i16> %y, i32 16)
+; CHECK-NEXT: %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) [[ATTR:#[0-9]+]]
+; CHECK-NEXT: call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %q, <8 x i16> %y, i32 16)
 ; CHECK-NEXT: %c = add <8 x i16> %a, %a
 define <8 x i16> @test0(i8* noalias %p, i8* noalias %q, <8 x i16> %y) {
 entry:
-  %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) nounwind
-  call void @llvm.arm.neon.vst1.v8i16(i8* %q, <8 x i16> %y, i32 16)
-  %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) nounwind
+  %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) nounwind
+  call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %q, <8 x i16> %y, i32 16)
+  %b = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) nounwind
   %c = add <8 x i16> %a, %b
   ret <8 x i16> %c
 }

 ; CHECK: define <8 x i16> @test1(i8* %p, <8 x i16> %y) {
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: %q = getelementptr i8, i8* %p, i64 16
-; CHECK-NEXT: %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) [[ATTR]]
-; CHECK-NEXT: call void @llvm.arm.neon.vst1.v8i16(i8* %q, <8 x i16> %y, i32 16)
+; CHECK-NEXT: %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) [[ATTR]]
+; CHECK-NEXT: call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %q, <8 x i16> %y, i32 16)
 ; CHECK-NEXT: %c = add <8 x i16> %a, %a
 define <8 x i16> @test1(i8* %p, <8 x i16> %y) {
 entry:
   %q = getelementptr i8, i8* %p, i64 16
-  %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) nounwind
-  call void @llvm.arm.neon.vst1.v8i16(i8* %q, <8 x i16> %y, i32 16)
+  %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) nounwind
+  call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %q, <8 x i16> %y, i32 16)
	%b = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) nounwind			%b = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) nounwind
	%c = add <8 x i16> %a, %b			%c = add <8 x i16> %a, %b
	ret <8 x i16> %c			ret <8 x i16> %c
	}			}

	declare <8 x i16> @llvm.arm.neon.vld1.v8i16(i8*, i32) nounwind readonly			declare <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8*, i32) nounwind readonly
	declare void @llvm.arm.neon.vst1.v8i16(i8*, <8 x i16>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v8i16(i8*, <8 x i16>, i32) nounwind

	; CHECK: attributes #0 = { nounwind readonly argmemonly }			; CHECK: attributes #0 = { nounwind readonly argmemonly }
	; CHECK: attributes #1 = { nounwind argmemonly }			; CHECK: attributes #1 = { nounwind argmemonly }
	; CHECK: attributes [[ATTR]] = { nounwind }			; CHECK: attributes [[ATTR]] = { nounwind }

llvm/trunk/test/Analysis/TypeBasedAliasAnalysis/intrinsics.ll

	; RUN: opt -tbaa -basicaa -gvn -S < %s | FileCheck %s			; RUN: opt -tbaa -basicaa -gvn -S < %s | FileCheck %s

	target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:32-f32:32:32-f64:32:32-v64:32:64-v128:32:128-a0:0:32-n32"			target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:32-f32:32:32-f64:32:32-v64:32:64-v128:32:128-a0:0:32-n32"

	; TBAA should prove that these calls don't interfere, since they are			; TBAA should prove that these calls don't interfere, since they are
	; IntrArgReadMem and have TBAA metadata.			; IntrArgReadMem and have TBAA metadata.

	; CHECK: define <8 x i16> @test0(i8* %p, i8* %q, <8 x i16> %y) {			; CHECK: define <8 x i16> @test0(i8* %p, i8* %q, <8 x i16> %y) {
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) [[NUW:#[0-9]+]]			; CHECK-NEXT: %a = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) [[NUW:#[0-9]+]]
	; CHECK-NEXT: call void @llvm.arm.neon.vst1.v8i16(i8* %q, <8 x i16> %y, i32 16)			; CHECK-NEXT: call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %q, <8 x i16> %y, i32 16)
	; CHECK-NEXT: %c = add <8 x i16> %a, %a			; CHECK-NEXT: %c = add <8 x i16> %a, %a
	define <8 x i16> @test0(i8* %p, i8* %q, <8 x i16> %y) {			define <8 x i16> @test0(i8* %p, i8* %q, <8 x i16> %y) {
	entry:			entry:
	%a = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) nounwind, !tbaa !2			%a = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) nounwind, !tbaa !2
	call void @llvm.arm.neon.vst1.v8i16(i8* %q, <8 x i16> %y, i32 16), !tbaa !1			call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %q, <8 x i16> %y, i32 16), !tbaa !1
	%b = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %p, i32 16) nounwind, !tbaa !2			%b = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %p, i32 16) nounwind, !tbaa !2
	%c = add <8 x i16> %a, %b			%c = add <8 x i16> %a, %b
	ret <8 x i16> %c			ret <8 x i16> %c
	}			}

	declare <8 x i16> @llvm.arm.neon.vld1.v8i16(i8*, i32) nounwind readonly			declare <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8*, i32) nounwind readonly
	declare void @llvm.arm.neon.vst1.v8i16(i8*, <8 x i16>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v8i16(i8*, <8 x i16>, i32) nounwind

	; CHECK: attributes #0 = { nounwind readonly argmemonly }			; CHECK: attributes #0 = { nounwind readonly argmemonly }
	; CHECK: attributes #1 = { nounwind argmemonly }			; CHECK: attributes #1 = { nounwind argmemonly }
	; CHECK: attributes [[NUW]] = { nounwind }			; CHECK: attributes [[NUW]] = { nounwind }

	!0 = !{!"tbaa root", null}			!0 = !{!"tbaa root", null}
	!1 = !{!3, !3, i64 0}			!1 = !{!3, !3, i64 0}
	!2 = !{!4, !4, i64 0}			!2 = !{!4, !4, i64 0}
	!3 = !{!"A", !0}			!3 = !{!"A", !0}
	!4 = !{!"B", !0}			!4 = !{!"B", !0}

llvm/trunk/test/CodeGen/ARM/2010-05-20-NEONSpillCrash.ll

	; RUN: llc -mtriple=arm-eabi -mattr=+neon -O0 -optimize-regalloc -regalloc=basic %s -o /dev/null			; RUN: llc -mtriple=arm-eabi -mattr=+neon -O0 -optimize-regalloc -regalloc=basic %s -o /dev/null

	; This test would crash the rewriter when trying to handle a spill after one of			; This test would crash the rewriter when trying to handle a spill after one of
	; the @llvm.arm.neon.vld3.v8i8 defined three parts of a register.			; the @llvm.arm.neon.vld3.v8i8.p0i8 defined three parts of a register.

	%struct.__neon_int8x8x3_t = type { <8 x i8>, <8 x i8>, <8 x i8> }			%struct.__neon_int8x8x3_t = type { <8 x i8>, <8 x i8>, <8 x i8> }

	declare %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8(i8*, i32) nounwind readonly			declare %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8.p0i8(i8*, i32) nounwind readonly

	declare void @llvm.arm.neon.vst3.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, i32) nounwind			declare void @llvm.arm.neon.vst3.p0i8.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, i32) nounwind

	define <8 x i8> @t3(i8* %A1, i8* %A2, i8* %A3, i8* %A4, i8* %A5, i8* %A6, i8* %A7, i8* %A8, i8* %B) nounwind {			define <8 x i8> @t3(i8* %A1, i8* %A2, i8* %A3, i8* %A4, i8* %A5, i8* %A6, i8* %A7, i8* %A8, i8* %B) nounwind {
	%tmp1b = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8(i8* %A2, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=2]			%tmp1b = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8.p0i8(i8* %A2, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=2]
	%tmp2b = extractvalue %struct.__neon_int8x8x3_t %tmp1b, 0 ; <<8 x i8>> [#uses=1]			%tmp2b = extractvalue %struct.__neon_int8x8x3_t %tmp1b, 0 ; <<8 x i8>> [#uses=1]
	%tmp4b = extractvalue %struct.__neon_int8x8x3_t %tmp1b, 1 ; <<8 x i8>> [#uses=1]			%tmp4b = extractvalue %struct.__neon_int8x8x3_t %tmp1b, 1 ; <<8 x i8>> [#uses=1]
	%tmp1d = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8(i8* %A4, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=2]			%tmp1d = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8.p0i8(i8* %A4, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=2]
	%tmp2d = extractvalue %struct.__neon_int8x8x3_t %tmp1d, 0 ; <<8 x i8>> [#uses=1]			%tmp2d = extractvalue %struct.__neon_int8x8x3_t %tmp1d, 0 ; <<8 x i8>> [#uses=1]
	%tmp4d = extractvalue %struct.__neon_int8x8x3_t %tmp1d, 1 ; <<8 x i8>> [#uses=1]			%tmp4d = extractvalue %struct.__neon_int8x8x3_t %tmp1d, 1 ; <<8 x i8>> [#uses=1]
	%tmp1e = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8(i8* %A5, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=1]			%tmp1e = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8.p0i8(i8* %A5, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=1]
	%tmp2e = extractvalue %struct.__neon_int8x8x3_t %tmp1e, 0 ; <<8 x i8>> [#uses=1]			%tmp2e = extractvalue %struct.__neon_int8x8x3_t %tmp1e, 0 ; <<8 x i8>> [#uses=1]
	%tmp1f = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8(i8* %A6, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=1]			%tmp1f = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8.p0i8(i8* %A6, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=1]
	%tmp2f = extractvalue %struct.__neon_int8x8x3_t %tmp1f, 0 ; <<8 x i8>> [#uses=1]			%tmp2f = extractvalue %struct.__neon_int8x8x3_t %tmp1f, 0 ; <<8 x i8>> [#uses=1]
	%tmp1g = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8(i8* %A7, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=2]			%tmp1g = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8.p0i8(i8* %A7, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=2]
	%tmp2g = extractvalue %struct.__neon_int8x8x3_t %tmp1g, 0 ; <<8 x i8>> [#uses=1]			%tmp2g = extractvalue %struct.__neon_int8x8x3_t %tmp1g, 0 ; <<8 x i8>> [#uses=1]
	%tmp4g = extractvalue %struct.__neon_int8x8x3_t %tmp1g, 1 ; <<8 x i8>> [#uses=1]			%tmp4g = extractvalue %struct.__neon_int8x8x3_t %tmp1g, 1 ; <<8 x i8>> [#uses=1]
	%tmp1h = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8(i8* %A8, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=2]			%tmp1h = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8.p0i8(i8* %A8, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=2]
	%tmp2h = extractvalue %struct.__neon_int8x8x3_t %tmp1h, 0 ; <<8 x i8>> [#uses=1]			%tmp2h = extractvalue %struct.__neon_int8x8x3_t %tmp1h, 0 ; <<8 x i8>> [#uses=1]
	%tmp3h = extractvalue %struct.__neon_int8x8x3_t %tmp1h, 2 ; <<8 x i8>> [#uses=1]			%tmp3h = extractvalue %struct.__neon_int8x8x3_t %tmp1h, 2 ; <<8 x i8>> [#uses=1]
	%tmp2bd = add <8 x i8> %tmp2b, %tmp2d ; <<8 x i8>> [#uses=1]			%tmp2bd = add <8 x i8> %tmp2b, %tmp2d ; <<8 x i8>> [#uses=1]
	%tmp4bd = add <8 x i8> %tmp4b, %tmp4d ; <<8 x i8>> [#uses=1]			%tmp4bd = add <8 x i8> %tmp4b, %tmp4d ; <<8 x i8>> [#uses=1]
	%tmp2abcd = mul <8 x i8> undef, %tmp2bd ; <<8 x i8>> [#uses=1]			%tmp2abcd = mul <8 x i8> undef, %tmp2bd ; <<8 x i8>> [#uses=1]
	%tmp4abcd = mul <8 x i8> undef, %tmp4bd ; <<8 x i8>> [#uses=2]			%tmp4abcd = mul <8 x i8> undef, %tmp4bd ; <<8 x i8>> [#uses=2]
	call void @llvm.arm.neon.vst3.v8i8(i8* %A1, <8 x i8> %tmp4abcd, <8 x i8> zeroinitializer, <8 x i8> %tmp2abcd, i32 1)			call void @llvm.arm.neon.vst3.p0i8.v8i8(i8* %A1, <8 x i8> %tmp4abcd, <8 x i8> zeroinitializer, <8 x i8> %tmp2abcd, i32 1)
	%tmp2ef = sub <8 x i8> %tmp2e, %tmp2f ; <<8 x i8>> [#uses=1]			%tmp2ef = sub <8 x i8> %tmp2e, %tmp2f ; <<8 x i8>> [#uses=1]
	%tmp2gh = sub <8 x i8> %tmp2g, %tmp2h ; <<8 x i8>> [#uses=1]			%tmp2gh = sub <8 x i8> %tmp2g, %tmp2h ; <<8 x i8>> [#uses=1]
	%tmp3gh = sub <8 x i8> zeroinitializer, %tmp3h ; <<8 x i8>> [#uses=1]			%tmp3gh = sub <8 x i8> zeroinitializer, %tmp3h ; <<8 x i8>> [#uses=1]
	%tmp4ef = sub <8 x i8> zeroinitializer, %tmp4g ; <<8 x i8>> [#uses=1]			%tmp4ef = sub <8 x i8> zeroinitializer, %tmp4g ; <<8 x i8>> [#uses=1]
	%tmp2efgh = mul <8 x i8> %tmp2ef, %tmp2gh ; <<8 x i8>> [#uses=1]			%tmp2efgh = mul <8 x i8> %tmp2ef, %tmp2gh ; <<8 x i8>> [#uses=1]
	%tmp3efgh = mul <8 x i8> undef, %tmp3gh ; <<8 x i8>> [#uses=1]			%tmp3efgh = mul <8 x i8> undef, %tmp3gh ; <<8 x i8>> [#uses=1]
	%tmp4efgh = mul <8 x i8> %tmp4ef, undef ; <<8 x i8>> [#uses=2]			%tmp4efgh = mul <8 x i8> %tmp4ef, undef ; <<8 x i8>> [#uses=2]
	call void @llvm.arm.neon.vst3.v8i8(i8* %A2, <8 x i8> %tmp4efgh, <8 x i8> %tmp3efgh, <8 x i8> %tmp2efgh, i32 1)			call void @llvm.arm.neon.vst3.p0i8.v8i8(i8* %A2, <8 x i8> %tmp4efgh, <8 x i8> %tmp3efgh, <8 x i8> %tmp2efgh, i32 1)
	%tmp4 = sub <8 x i8> %tmp4efgh, %tmp4abcd ; <<8 x i8>> [#uses=1]			%tmp4 = sub <8 x i8> %tmp4efgh, %tmp4abcd ; <<8 x i8>> [#uses=1]
	tail call void @llvm.arm.neon.vst3.v8i8(i8* %B, <8 x i8> zeroinitializer, <8 x i8> undef, <8 x i8> undef, i32 1)			tail call void @llvm.arm.neon.vst3.p0i8.v8i8(i8* %B, <8 x i8> zeroinitializer, <8 x i8> undef, <8 x i8> undef, i32 1)
	ret <8 x i8> %tmp4			ret <8 x i8> %tmp4
	}			}

llvm/trunk/test/CodeGen/ARM/2010-05-21-BuildVector.ll

;CHECK: vldr s
%16 = ashr i32 %15, 30		%16 = ashr i32 %15, 30
%.sum14 = add i32 %16, 4		%.sum14 = add i32 %16, 4
%17 = getelementptr inbounds float, float* %table, i32 %.sum14		%17 = getelementptr inbounds float, float* %table, i32 %.sum14
;CHECK: vldr s		;CHECK: vldr s
%18 = load float, float* %17, align 4		%18 = load float, float* %17, align 4
%tmp5 = insertelement <4 x float> %tmp7, float %18, i32 3		%tmp5 = insertelement <4 x float> %tmp7, float %18, i32 3
%19 = fmul <4 x float> %tmp5, %2		%19 = fmul <4 x float> %tmp5, %2
%20 = bitcast float* %fltp to i8*		%20 = bitcast float* %fltp to i8*
tail call void @llvm.arm.neon.vst1.v4f32(i8* %20, <4 x float> %19, i32 1)		tail call void @llvm.arm.neon.vst1.p0i8.v4f32(i8* %20, <4 x float> %19, i32 1)
ret void		ret void
}		}

declare void @llvm.arm.neon.vst1.v4f32(i8*, <4 x float>, i32) nounwind		declare void @llvm.arm.neon.vst1.p0i8.v4f32(i8*, <4 x float>, i32) nounwind

llvm/trunk/test/CodeGen/ARM/2010-06-11-vmovdrr-bitcast.ll

	; RUN: llc -mtriple=arm-eabi -mattr=+neon %s -o /dev/null			; RUN: llc -mtriple=arm-eabi -mattr=+neon %s -o /dev/null
	; rdar://8084742			; rdar://8084742

	%struct.__int8x8x2_t = type { [2 x <8 x i8>] }			%struct.__int8x8x2_t = type { [2 x <8 x i8>] }

	define void @foo(%struct.__int8x8x2_t* nocapture %a, i8* %b) nounwind {			define void @foo(%struct.__int8x8x2_t* nocapture %a, i8* %b) nounwind {
	entry:			entry:
	%0 = bitcast %struct.__int8x8x2_t* %a to i128* ; <i128*> [#uses=1]			%0 = bitcast %struct.__int8x8x2_t* %a to i128* ; <i128*> [#uses=1]
	%srcval = load i128, i128* %0, align 8 ; <i128> [#uses=2]			%srcval = load i128, i128* %0, align 8 ; <i128> [#uses=2]
	%tmp6 = trunc i128 %srcval to i64 ; <i64> [#uses=1]			%tmp6 = trunc i128 %srcval to i64 ; <i64> [#uses=1]
	%tmp8 = lshr i128 %srcval, 64 ; <i128> [#uses=1]			%tmp8 = lshr i128 %srcval, 64 ; <i128> [#uses=1]
	%tmp9 = trunc i128 %tmp8 to i64 ; <i64> [#uses=1]			%tmp9 = trunc i128 %tmp8 to i64 ; <i64> [#uses=1]
	%tmp16.i = bitcast i64 %tmp6 to <8 x i8> ; <<8 x i8>> [#uses=1]			%tmp16.i = bitcast i64 %tmp6 to <8 x i8> ; <<8 x i8>> [#uses=1]
	%tmp20.i = bitcast i64 %tmp9 to <8 x i8> ; <<8 x i8>> [#uses=1]			%tmp20.i = bitcast i64 %tmp9 to <8 x i8> ; <<8 x i8>> [#uses=1]
	tail call void @llvm.arm.neon.vst2.v8i8(i8* %b, <8 x i8> %tmp16.i, <8 x i8> %tmp20.i, i32 1) nounwind			tail call void @llvm.arm.neon.vst2.p0i8.v8i8(i8* %b, <8 x i8> %tmp16.i, <8 x i8> %tmp20.i, i32 1) nounwind
	ret void			ret void
	}			}

	declare void @llvm.arm.neon.vst2.v8i8(i8*, <8 x i8>, <8 x i8>, i32) nounwind			declare void @llvm.arm.neon.vst2.p0i8.v8i8(i8*, <8 x i8>, <8 x i8>, i32) nounwind

llvm/trunk/test/CodeGen/ARM/2010-06-29-PartialRedefFastAlloc.ll

	; redef, it cannot also get %Q0.			; redef, it cannot also get %Q0.

	; CHECK: vld1.64 {d16, d17}, [r{{.}}]			; CHECK: vld1.64 {d16, d17}, [r{{.}}]
	; CHECK-NOT: vld1.64 {d16, d17}			; CHECK-NOT: vld1.64 {d16, d17}
	; CHECK: vmov.f64			; CHECK: vmov.f64

	define i32 @test(i8* %arg) nounwind {			define i32 @test(i8* %arg) nounwind {
	entry:			entry:
	%0 = call <2 x i64> @llvm.arm.neon.vld1.v2i64(i8* %arg, i32 1)			%0 = call <2 x i64> @llvm.arm.neon.vld1.v2i64.p0i8(i8* %arg, i32 1)
	%1 = shufflevector <2 x i64> undef, <2 x i64> %0, <2 x i32> <i32 1, i32 2>			%1 = shufflevector <2 x i64> undef, <2 x i64> %0, <2 x i32> <i32 1, i32 2>
	store <2 x i64> %1, <2 x i64>* undef, align 16			store <2 x i64> %1, <2 x i64>* undef, align 16
	ret i32 undef			ret i32 undef
	}			}

	declare <2 x i64> @llvm.arm.neon.vld1.v2i64(i8*, i32) nounwind readonly			declare <2 x i64> @llvm.arm.neon.vld1.v2i64.p0i8(i8*, i32) nounwind readonly

llvm/trunk/test/CodeGen/ARM/2011-08-12-vmovqqqq-pseudo.ll

	; RUN: llc %s -mtriple=thumbv7-apple-darwin -verify-machineinstrs -mcpu=cortex-a9 -O0 -o -			; RUN: llc %s -mtriple=thumbv7-apple-darwin -verify-machineinstrs -mcpu=cortex-a9 -O0 -o -
	; Make sure that the VMOVQQQQ pseudo instruction is handled properly			; Make sure that the VMOVQQQQ pseudo instruction is handled properly
	; by codegen.			; by codegen.

	define void @test_vmovqqqq_pseudo() nounwind ssp {			define void @test_vmovqqqq_pseudo() nounwind ssp {
	entry:			entry:
	%vld3_lane = call { <8 x i16>, <8 x i16>, <8 x i16> } @llvm.arm.neon.vld3lane.v8i16(i8* undef, <8 x i16> undef, <8 x i16> undef, <8 x i16> zeroinitializer, i32 7, i32 2)			%vld3_lane = call { <8 x i16>, <8 x i16>, <8 x i16> } @llvm.arm.neon.vld3lane.v8i16.p0i8(i8* undef, <8 x i16> undef, <8 x i16> undef, <8 x i16> zeroinitializer, i32 7, i32 2)
	store { <8 x i16>, <8 x i16>, <8 x i16> } %vld3_lane, { <8 x i16>, <8 x i16>, <8 x i16> }* undef			store { <8 x i16>, <8 x i16>, <8 x i16> } %vld3_lane, { <8 x i16>, <8 x i16>, <8 x i16> }* undef
	ret void			ret void
	}			}

	declare { <8 x i16>, <8 x i16>, <8 x i16> } @llvm.arm.neon.vld3lane.v8i16(i8*, <8 x i16>, <8 x i16>, <8 x i16>, i32, i32) nounwind readonly			declare { <8 x i16>, <8 x i16>, <8 x i16> } @llvm.arm.neon.vld3lane.v8i16.p0i8(i8*, <8 x i16>, <8 x i16>, <8 x i16>, i32, i32) nounwind readonly

llvm/trunk/test/CodeGen/ARM/2012-01-24-RegSequenceLiveRange.ll

cond.end295: ; preds = %entry
%shuffle.i38.i.i1036 = shufflevector <2 x i64> zeroinitializer, <2 x i64> undef, <1 x i32> zeroinitializer		%shuffle.i38.i.i1036 = shufflevector <2 x i64> zeroinitializer, <2 x i64> undef, <1 x i32> zeroinitializer
%shuffle.i37.i.i1037 = shufflevector <1 x i64> %shuffle.i39.i.i1035, <1 x i64> %shuffle.i38.i.i1036, <2 x i32> <i32 0, i32 1>		%shuffle.i37.i.i1037 = shufflevector <1 x i64> %shuffle.i39.i.i1035, <1 x i64> %shuffle.i38.i.i1036, <2 x i32> <i32 0, i32 1>
%0 = bitcast <2 x i64> %shuffle.i37.i.i1037 to <4 x float>		%0 = bitcast <2 x i64> %shuffle.i37.i.i1037 to <4 x float>
%1 = bitcast <4 x float> undef to <2 x i64>		%1 = bitcast <4 x float> undef to <2 x i64>
%shuffle.i36.i.i = shufflevector <2 x i64> %1, <2 x i64> undef, <1 x i32> zeroinitializer		%shuffle.i36.i.i = shufflevector <2 x i64> %1, <2 x i64> undef, <1 x i32> zeroinitializer
%shuffle.i35.i.i = shufflevector <2 x i64> undef, <2 x i64> undef, <1 x i32> zeroinitializer		%shuffle.i35.i.i = shufflevector <2 x i64> undef, <2 x i64> undef, <1 x i32> zeroinitializer
%shuffle.i34.i.i = shufflevector <1 x i64> %shuffle.i36.i.i, <1 x i64> %shuffle.i35.i.i, <2 x i32> <i32 0, i32 1>		%shuffle.i34.i.i = shufflevector <1 x i64> %shuffle.i36.i.i, <1 x i64> %shuffle.i35.i.i, <2 x i32> <i32 0, i32 1>
%2 = bitcast <2 x i64> %shuffle.i34.i.i to <4 x float>		%2 = bitcast <2 x i64> %shuffle.i34.i.i to <4 x float>
tail call void @llvm.arm.neon.vst1.v4f32(i8* undef, <4 x float> %0, i32 4) nounwind		tail call void @llvm.arm.neon.vst1.p0i8.v4f32(i8* undef, <4 x float> %0, i32 4) nounwind
tail call void @llvm.arm.neon.vst1.v4f32(i8* undef, <4 x float> %2, i32 4) nounwind		tail call void @llvm.arm.neon.vst1.p0i8.v4f32(i8* undef, <4 x float> %2, i32 4) nounwind
unreachable		unreachable

for.end: ; preds = %entry		for.end: ; preds = %entry
ret void		ret void
}		}

; Check that pseudo-expansion preserves <undef> flags.		; Check that pseudo-expansion preserves <undef> flags.
define void @foo3(i8* %p) nounwind ssp {		define void @foo3(i8* %p) nounwind ssp {
entry:		entry:
tail call void @llvm.arm.neon.vst2.v4f32(i8* %p, <4 x float> undef, <4 x float> undef, i32 4)		tail call void @llvm.arm.neon.vst2.p0i8.v4f32(i8* %p, <4 x float> undef, <4 x float> undef, i32 4)
ret void		ret void
}		}

declare arm_aapcs_vfpcc void @bar(i8*, float, float, float)		declare arm_aapcs_vfpcc void @bar(i8*, float, float, float)
declare void @llvm.arm.neon.vst1.v4f32(i8*, <4 x float>, i32) nounwind		declare void @llvm.arm.neon.vst1.p0i8.v4f32(i8*, <4 x float>, i32) nounwind
declare void @llvm.arm.neon.vst2.v4f32(i8*, <4 x float>, <4 x float>, i32) nounwind		declare void @llvm.arm.neon.vst2.p0i8.v4f32(i8*, <4 x float>, <4 x float>, i32) nounwind

llvm/trunk/test/CodeGen/ARM/2012-05-10-PreferVMOVtoVDUP32.ll

	; RUN: llc -mtriple=arm-eabi -mcpu=swift %s -o - | FileCheck %s			; RUN: llc -mtriple=arm-eabi -mcpu=swift %s -o - | FileCheck %s
	; <rdar://problem/10451892>			; <rdar://problem/10451892>

	define void @f(i32 %x, i32* %p) nounwind ssp {			define void @f(i32 %x, i32* %p) nounwind ssp {
	entry:			entry:
	; CHECK-NOT: vdup.32			; CHECK-NOT: vdup.32
	%vecinit.i = insertelement <2 x i32> undef, i32 %x, i32 0			%vecinit.i = insertelement <2 x i32> undef, i32 %x, i32 0
	%vecinit1.i = insertelement <2 x i32> %vecinit.i, i32 %x, i32 1			%vecinit1.i = insertelement <2 x i32> %vecinit.i, i32 %x, i32 1
	%0 = bitcast i32* %p to i8*			%0 = bitcast i32* %p to i8*
	tail call void @llvm.arm.neon.vst1.v2i32(i8* %0, <2 x i32> %vecinit1.i, i32 4)			tail call void @llvm.arm.neon.vst1.p0i8.v2i32(i8* %0, <2 x i32> %vecinit1.i, i32 4)
	ret void			ret void
	}			}

	declare void @llvm.arm.neon.vst1.v2i32(i8*, <2 x i32>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v2i32(i8*, <2 x i32>, i32) nounwind

llvm/trunk/test/CodeGen/ARM/2012-08-27-CopyPhysRegCrash.ll

; RUN: llc < %s -mcpu=cortex-a8 -march=thumb		; RUN: llc < %s -mcpu=cortex-a8 -march=thumb
; Test that this doesn't crash.		; Test that this doesn't crash.
; <rdar://problem/12183003>		; <rdar://problem/12183003>

target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:32:64-v128:32:128-a0:0:32-n32-S32"		target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:32:64-v128:32:128-a0:0:32-n32-S32"
target triple = "thumbv7-apple-ios5.1.0"		target triple = "thumbv7-apple-ios5.1.0"

declare { <16 x i8>, <16 x i8>, <16 x i8> } @llvm.arm.neon.vld3.v16i8(i8*, i32) nounwind readonly		declare { <16 x i8>, <16 x i8>, <16 x i8> } @llvm.arm.neon.vld3.v16i8.p0i8(i8*, i32) nounwind readonly

declare void @llvm.arm.neon.vst1.v16i8(i8*, <16 x i8>, i32) nounwind		declare void @llvm.arm.neon.vst1.p0i8.v16i8(i8*, <16 x i8>, i32) nounwind

define void @findEdges(i8*) nounwind ssp {		define void @findEdges(i8*) nounwind ssp {
%2 = icmp sgt i32 undef, 0		%2 = icmp sgt i32 undef, 0
br i1 %2, label %5, label %3		br i1 %2, label %5, label %3

; <label>:3 ; preds = %5, %1		; <label>:3 ; preds = %5, %1
%4 = phi i8* [ %0, %1 ], [ %19, %5 ]		%4 = phi i8* [ %0, %1 ], [ %19, %5 ]
ret void		ret void

; <label>:5 ; preds = %5, %1		; <label>:5 ; preds = %5, %1
%6 = phi i8* [ %19, %5 ], [ %0, %1 ]		%6 = phi i8* [ %19, %5 ], [ %0, %1 ]
%7 = tail call { <16 x i8>, <16 x i8>, <16 x i8> } @llvm.arm.neon.vld3.v16i8(i8* null, i32 1)		%7 = tail call { <16 x i8>, <16 x i8>, <16 x i8> } @llvm.arm.neon.vld3.v16i8.p0i8(i8* null, i32 1)
%8 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8> } %7, 0		%8 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8> } %7, 0
%9 = getelementptr inbounds i8, i8* null, i32 3		%9 = getelementptr inbounds i8, i8* null, i32 3
%10 = tail call { <16 x i8>, <16 x i8>, <16 x i8> } @llvm.arm.neon.vld3.v16i8(i8* %9, i32 1)		%10 = tail call { <16 x i8>, <16 x i8>, <16 x i8> } @llvm.arm.neon.vld3.v16i8.p0i8(i8* %9, i32 1)
%11 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8> } %10, 2		%11 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8> } %10, 2
%12 = tail call { <16 x i8>, <16 x i8>, <16 x i8> } @llvm.arm.neon.vld3.v16i8(i8* %6, i32 1)		%12 = tail call { <16 x i8>, <16 x i8>, <16 x i8> } @llvm.arm.neon.vld3.v16i8.p0i8(i8* %6, i32 1)
%13 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8> } %12, 0		%13 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8> } %12, 0
%14 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8> } %12, 1		%14 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8> } %12, 1
%15 = getelementptr inbounds i8, i8* %6, i32 3		%15 = getelementptr inbounds i8, i8* %6, i32 3
%16 = tail call { <16 x i8>, <16 x i8>, <16 x i8> } @llvm.arm.neon.vld3.v16i8(i8* %15, i32 1)		%16 = tail call { <16 x i8>, <16 x i8>, <16 x i8> } @llvm.arm.neon.vld3.v16i8.p0i8(i8* %15, i32 1)
%17 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8> } %16, 1		%17 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8> } %16, 1
%18 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8> } %16, 2		%18 = extractvalue { <16 x i8>, <16 x i8>, <16 x i8> } %16, 2
%19 = getelementptr inbounds i8, i8* %6, i32 48		%19 = getelementptr inbounds i8, i8* %6, i32 48
%20 = bitcast <16 x i8> %13 to <2 x i64>		%20 = bitcast <16 x i8> %13 to <2 x i64>
%21 = bitcast <16 x i8> %8 to <2 x i64>		%21 = bitcast <16 x i8> %8 to <2 x i64>
%22 = bitcast <16 x i8> %14 to <2 x i64>		%22 = bitcast <16 x i8> %14 to <2 x i64>
%23 = shufflevector <2 x i64> %22, <2 x i64> undef, <1 x i32> zeroinitializer		%23 = shufflevector <2 x i64> %22, <2 x i64> undef, <1 x i32> zeroinitializer
%24 = bitcast <1 x i64> %23 to <8 x i8>		%24 = bitcast <1 x i64> %23 to <8 x i8>
; <label>:5 ; preds = %5, %1
%91 = bitcast <4 x i16> %90 to <1 x i64>		%91 = bitcast <4 x i16> %90 to <1 x i64>
%92 = shufflevector <1 x i64> undef, <1 x i64> %91, <2 x i32> <i32 0, i32 1>		%92 = shufflevector <1 x i64> undef, <1 x i64> %91, <2 x i32> <i32 0, i32 1>
%93 = bitcast <2 x i64> %92 to <8 x i16>		%93 = bitcast <2 x i64> %92 to <8 x i16>
%94 = tail call <8 x i8> @llvm.arm.neon.vshiftn.v8i8(<8 x i16> %93, <8 x i16> <i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8>)		%94 = tail call <8 x i8> @llvm.arm.neon.vshiftn.v8i8(<8 x i16> %93, <8 x i16> <i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8, i16 -8>)
%95 = bitcast <8 x i8> %56 to <1 x i64>		%95 = bitcast <8 x i8> %56 to <1 x i64>
%96 = bitcast <8 x i8> %94 to <1 x i64>		%96 = bitcast <8 x i8> %94 to <1 x i64>
%97 = shufflevector <1 x i64> %95, <1 x i64> %96, <2 x i32> <i32 0, i32 1>		%97 = shufflevector <1 x i64> %95, <1 x i64> %96, <2 x i32> <i32 0, i32 1>
%98 = bitcast <2 x i64> %97 to <16 x i8>		%98 = bitcast <2 x i64> %97 to <16 x i8>
tail call void @llvm.arm.neon.vst1.v16i8(i8* null, <16 x i8> %98, i32 1)		tail call void @llvm.arm.neon.vst1.p0i8.v16i8(i8* null, <16 x i8> %98, i32 1)
%99 = icmp slt i32 undef, undef		%99 = icmp slt i32 undef, undef
br i1 %99, label %5, label %3		br i1 %99, label %5, label %3
}		}

declare <4 x i16> @llvm.arm.neon.vqshiftnu.v4i16(<4 x i32>, <4 x i32>) nounwind readnone		declare <4 x i16> @llvm.arm.neon.vqshiftnu.v4i16(<4 x i32>, <4 x i32>) nounwind readnone

declare <8 x i8> @llvm.arm.neon.vshiftn.v8i8(<8 x i16>, <8 x i16>) nounwind readnone		declare <8 x i8> @llvm.arm.neon.vshiftn.v8i8(<8 x i16>, <8 x i16>) nounwind readnone

declare <4 x i16> @llvm.arm.neon.vqrshiftnu.v4i16(<4 x i32>, <4 x i32>) nounwind readnone		declare <4 x i16> @llvm.arm.neon.vqrshiftnu.v4i16(<4 x i32>, <4 x i32>) nounwind readnone

declare <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16>, <4 x i16>) nounwind readnone		declare <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16>, <4 x i16>) nounwind readnone

declare <8 x i16> @llvm.arm.neon.vmaxu.v8i16(<8 x i16>, <8 x i16>) nounwind readnone		declare <8 x i16> @llvm.arm.neon.vmaxu.v8i16(<8 x i16>, <8 x i16>) nounwind readnone

declare <8 x i16> @llvm.arm.neon.vabs.v8i16(<8 x i16>) nounwind readnone		declare <8 x i16> @llvm.arm.neon.vabs.v8i16(<8 x i16>) nounwind readnone

llvm/trunk/test/CodeGen/ARM/2013-10-11-select-stalls.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: llc < %s -mtriple=thumbv7-apple-ios -disable-ifcvt-diamond -stats 2>&1 | FileCheck %s			; RUN: llc < %s -mtriple=thumbv7-apple-ios -disable-ifcvt-diamond -stats 2>&1 | FileCheck %s
	; Evaluate the two vld1.8 instructions in separate MBB's,			; Evaluate the two vld1.8 instructions in separate MBB's,
	; instead of stalling on one and conditionally overwriting its result.			; instead of stalling on one and conditionally overwriting its result.
	;			;
	; Update: After if-conversion the two vld1.8 instructions are in the same MBB			; Update: After if-conversion the two vld1.8 instructions are in the same MBB
	; again. So we disable this if-conversion to eliminate its influence to this			; again. So we disable this if-conversion to eliminate its influence to this
	; test.			; test.

	; CHECK-NOT: Number of pipeline stalls			; CHECK-NOT: Number of pipeline stalls
	define <16 x i8> @multiselect(i32 %avail, i8* %foo, i8* %bar) {			define <16 x i8> @multiselect(i32 %avail, i8* %foo, i8* %bar) {
	entry:			entry:
	%vld1 = call <16 x i8> @llvm.arm.neon.vld1.v16i8(i8* %foo, i32 1)			%vld1 = call <16 x i8> @llvm.arm.neon.vld1.v16i8.p0i8(i8* %foo, i32 1)
	%vld2 = call <16 x i8> @llvm.arm.neon.vld1.v16i8(i8* %bar, i32 1)			%vld2 = call <16 x i8> @llvm.arm.neon.vld1.v16i8.p0i8(i8* %bar, i32 1)
	%and = and i32 %avail, 3			%and = and i32 %avail, 3
	%tobool = icmp eq i32 %and, 0			%tobool = icmp eq i32 %and, 0
	%retv = select i1 %tobool, <16 x i8> %vld1, <16 x i8> %vld2			%retv = select i1 %tobool, <16 x i8> %vld1, <16 x i8> %vld2
	ret <16 x i8> %retv			ret <16 x i8> %retv
	}			}

	declare <16 x i8> @llvm.arm.neon.vld1.v16i8(i8* , i32 )			declare <16 x i8> @llvm.arm.neon.vld1.v16i8.p0i8(i8* , i32 )

llvm/trunk/test/CodeGen/ARM/2014-01-09-pseudo_expand_implicit_reg.ll

; CHECK: VST1d64Q %R{{[0-9]+}}<kill>, 8, %D{{[0-9]+}}, pred:14, pred:%noreg, %Q{{[0-9]+}}_Q{{[0-9]+}}<imp-use,kill>
%s3 = bitcast <8 x i8> %t3 to <1 x i64>		%s3 = bitcast <8 x i8> %t3 to <1 x i64>

%tmp0 = bitcast <1 x i64> %s2 to i64		%tmp0 = bitcast <1 x i64> %s2 to i64
%tmp1 = bitcast <1 x i64> %s3 to i64		%tmp1 = bitcast <1 x i64> %s3 to i64

%n0 = insertelement <2 x i64> undef, i64 %tmp0, i32 0		%n0 = insertelement <2 x i64> undef, i64 %tmp0, i32 0
%n1 = insertelement <2 x i64> %n0, i64 %tmp1, i32 1		%n1 = insertelement <2 x i64> %n0, i64 %tmp1, i32 1

call void @llvm.arm.neon.vst4.v1i64(i8* %m, <1 x i64> %s0, <1 x i64> %s1, <1 x i64> %s2, <1 x i64> %s3, i32 8)		call void @llvm.arm.neon.vst4.p0i8.v1i64(i8* %m, <1 x i64> %s0, <1 x i64> %s1, <1 x i64> %s2, <1 x i64> %s3, i32 8)

call void @bar(<2 x i64> %n1)		call void @bar(<2 x i64> %n1)

ret void		ret void
}		}

%struct.__neon_int8x8x4_t = type { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> }		%struct.__neon_int8x8x4_t = type { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> }
define <8 x i8> @vtbx4(<8 x i8>* %A, %struct.__neon_int8x8x4_t* %B, <8 x i8>* %C) nounwind {		define <8 x i8> @vtbx4(<8 x i8>* %A, %struct.__neon_int8x8x4_t* %B, <8 x i8>* %C) nounwind {
; CHECK: vtbx4:		; CHECK: vtbx4:
; CHECK: VTBX4 {{.*}}, pred:14, pred:%noreg, %Q{{[0-9]+}}_Q{{[0-9]+}}<imp-use>		; CHECK: VTBX4 {{.*}}, pred:14, pred:%noreg, %Q{{[0-9]+}}_Q{{[0-9]+}}<imp-use>
%tmp1 = load <8 x i8>, <8 x i8>* %A		%tmp1 = load <8 x i8>, <8 x i8>* %A
%tmp2 = load %struct.__neon_int8x8x4_t, %struct.__neon_int8x8x4_t* %B		%tmp2 = load %struct.__neon_int8x8x4_t, %struct.__neon_int8x8x4_t* %B
%tmp3 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 0		%tmp3 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 0
%tmp4 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 1		%tmp4 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 1
%tmp5 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 2		%tmp5 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 2
%tmp6 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 3		%tmp6 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 3
%tmp7 = load <8 x i8>, <8 x i8>* %C		%tmp7 = load <8 x i8>, <8 x i8>* %C
%tmp8 = call <8 x i8> @llvm.arm.neon.vtbx4(<8 x i8> %tmp1, <8 x i8> %tmp3, <8 x i8> %tmp4, <8 x i8> %tmp5, <8 x i8> %tmp6, <8 x i8> %tmp7)		%tmp8 = call <8 x i8> @llvm.arm.neon.vtbx4(<8 x i8> %tmp1, <8 x i8> %tmp3, <8 x i8> %tmp4, <8 x i8> %tmp5, <8 x i8> %tmp6, <8 x i8> %tmp7)
call void @bar2(%struct.__neon_int8x8x4_t %tmp2, <8 x i8> %tmp8)		call void @bar2(%struct.__neon_int8x8x4_t %tmp2, <8 x i8> %tmp8)
ret <8 x i8> %tmp8		ret <8 x i8> %tmp8
}		}

declare void @llvm.arm.neon.vst4.v1i64(i8*, <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64>, i32)		declare void @llvm.arm.neon.vst4.p0i8.v1i64(i8*, <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64>, i32)
declare <8 x i8> @llvm.arm.neon.vtbx4(<8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>) nounwind readnone		declare <8 x i8> @llvm.arm.neon.vtbx4(<8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>) nounwind readnone
declare void @bar2(%struct.__neon_int8x8x4_t, <8 x i8>)		declare void @bar2(%struct.__neon_int8x8x4_t, <8 x i8>)
declare void @bar(<2 x i64> %arg)		declare void @bar(<2 x i64> %arg)

llvm/trunk/test/CodeGen/ARM/arm-interleaved-accesses.ll

	define void @store_undef_mask_factor4(i32* %ptr, <4 x i32> %v0, <4 x i32> %v1, <4 x i32> %v2, <4 x i32> %v3) {			define void @store_undef_mask_factor4(i32* %ptr, <4 x i32> %v0, <4 x i32> %v1, <4 x i32> %v2, <4 x i32> %v3) {
	%base = bitcast i32* %ptr to <16 x i32>*			%base = bitcast i32* %ptr to <16 x i32>*
	%v0_v1 = shufflevector <4 x i32> %v0, <4 x i32> %v1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%v0_v1 = shufflevector <4 x i32> %v0, <4 x i32> %v1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	%v2_v3 = shufflevector <4 x i32> %v2, <4 x i32> %v3, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%v2_v3 = shufflevector <4 x i32> %v2, <4 x i32> %v3, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	%interleaved.vec = shufflevector <8 x i32> %v0_v1, <8 x i32> %v2_v3, <16 x i32> <i32 0, i32 4, i32 8, i32 undef, i32 undef, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>			%interleaved.vec = shufflevector <8 x i32> %v0_v1, <8 x i32> %v2_v3, <16 x i32> <i32 0, i32 4, i32 8, i32 undef, i32 undef, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>
	store <16 x i32> %interleaved.vec, <16 x i32>* %base, align 4			store <16 x i32> %interleaved.vec, <16 x i32>* %base, align 4
	ret void			ret void
	}			}

				; The following test cases check that address spaces are properly handled

				; CHECK-LABEL: load_address_space
				; CHECK: vld3.32
				define void @load_address_space(<4 x i32> addrspace(1)* %A, <2 x i32>* %B) {
				%tmp = load <4 x i32>, <4 x i32> addrspace(1)* %A
				%interleaved = shufflevector <4 x i32> %tmp, <4 x i32> undef, <2 x i32> <i32 0, i32 3>
				store <2 x i32> %interleaved, <2 x i32>* %B
				ret void
				}

				; CHECK-LABEL: store_address_space
				; CHECK: vst2.32
				define void @store_address_space(<2 x i32>* %A, <2 x i32>* %B, <4 x i32> addrspace(1)* %C) {
				%tmp0 = load <2 x i32>, <2 x i32>* %A
				%tmp1 = load <2 x i32>, <2 x i32>* %B
				%interleaved = shufflevector <2 x i32> %tmp0, <2 x i32> %tmp1, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
				store <4 x i32> %interleaved, <4 x i32> addrspace(1)* %C
				ret void
				}

llvm/trunk/test/CodeGen/ARM/coalesce-subregs.ll

; RUN: llc < %s -mcpu=cortex-a9 -verify-coalescing -verify-machineinstrs | FileCheck %s		; RUN: llc < %s -mcpu=cortex-a9 -verify-coalescing -verify-machineinstrs | FileCheck %s
target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:32:64-v128:32:128-a0:0:32-n32-S32"		target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:32:64-v128:32:128-a0:0:32-n32-S32"
target triple = "thumbv7-apple-ios0.0.0"		target triple = "thumbv7-apple-ios0.0.0"

; CHECK: f		; CHECK: f
; The vld2 and vst2 are not aligned wrt each other, the second Q loaded is the		; The vld2 and vst2 are not aligned wrt each other, the second Q loaded is the
; first one stored.		; first one stored.
; The coalescer must find a super-register larger than QQ to eliminate the copy		; The coalescer must find a super-register larger than QQ to eliminate the copy
; setting up the vst2 data.		; setting up the vst2 data.
; CHECK: vld2		; CHECK: vld2
; CHECK-NOT: vorr		; CHECK-NOT: vorr
; CHECK-NOT: vmov		; CHECK-NOT: vmov
; CHECK: vst2		; CHECK: vst2
define void @f(float* %p, i32 %c) nounwind ssp {		define void @f(float* %p, i32 %c) nounwind ssp {
entry:		entry:
%0 = bitcast float* %p to i8*		%0 = bitcast float* %p to i8*
%vld2 = tail call { <4 x float>, <4 x float> } @llvm.arm.neon.vld2.v4f32(i8* %0, i32 4)		%vld2 = tail call { <4 x float>, <4 x float> } @llvm.arm.neon.vld2.v4f32.p0i8(i8* %0, i32 4)
%vld221 = extractvalue { <4 x float>, <4 x float> } %vld2, 1		%vld221 = extractvalue { <4 x float>, <4 x float> } %vld2, 1
%add.ptr = getelementptr inbounds float, float* %p, i32 8		%add.ptr = getelementptr inbounds float, float* %p, i32 8
%1 = bitcast float* %add.ptr to i8*		%1 = bitcast float* %add.ptr to i8*
tail call void @llvm.arm.neon.vst2.v4f32(i8* %1, <4 x float> %vld221, <4 x float> undef, i32 4)		tail call void @llvm.arm.neon.vst2.p0i8.v4f32(i8* %1, <4 x float> %vld221, <4 x float> undef, i32 4)
ret void		ret void
}		}

; CHECK: f1		; CHECK: f1
; FIXME: This function still has copies.		; FIXME: This function still has copies.
define void @f1(float* %p, i32 %c) nounwind ssp {		define void @f1(float* %p, i32 %c) nounwind ssp {
entry:		entry:
%0 = bitcast float* %p to i8*		%0 = bitcast float* %p to i8*
%vld2 = tail call { <4 x float>, <4 x float> } @llvm.arm.neon.vld2.v4f32(i8* %0, i32 4)		%vld2 = tail call { <4 x float>, <4 x float> } @llvm.arm.neon.vld2.v4f32.p0i8(i8* %0, i32 4)
%vld221 = extractvalue { <4 x float>, <4 x float> } %vld2, 1		%vld221 = extractvalue { <4 x float>, <4 x float> } %vld2, 1
%add.ptr = getelementptr inbounds float, float* %p, i32 8		%add.ptr = getelementptr inbounds float, float* %p, i32 8
%1 = bitcast float* %add.ptr to i8*		%1 = bitcast float* %add.ptr to i8*
%vld22 = tail call { <4 x float>, <4 x float> } @llvm.arm.neon.vld2.v4f32(i8* %1, i32 4)		%vld22 = tail call { <4 x float>, <4 x float> } @llvm.arm.neon.vld2.v4f32.p0i8(i8* %1, i32 4)
%vld2215 = extractvalue { <4 x float>, <4 x float> } %vld22, 0		%vld2215 = extractvalue { <4 x float>, <4 x float> } %vld22, 0
tail call void @llvm.arm.neon.vst2.v4f32(i8* %1, <4 x float> %vld221, <4 x float> %vld2215, i32 4)		tail call void @llvm.arm.neon.vst2.p0i8.v4f32(i8* %1, <4 x float> %vld221, <4 x float> %vld2215, i32 4)
ret void		ret void
}		}

; CHECK: f2		; CHECK: f2
; FIXME: This function still has copies.		; FIXME: This function still has copies.
define void @f2(float* %p, i32 %c) nounwind ssp {		define void @f2(float* %p, i32 %c) nounwind ssp {
entry:		entry:
%0 = bitcast float* %p to i8*		%0 = bitcast float* %p to i8*
%vld2 = tail call { <4 x float>, <4 x float> } @llvm.arm.neon.vld2.v4f32(i8* %0, i32 4)		%vld2 = tail call { <4 x float>, <4 x float> } @llvm.arm.neon.vld2.v4f32.p0i8(i8* %0, i32 4)
%vld224 = extractvalue { <4 x float>, <4 x float> } %vld2, 1		%vld224 = extractvalue { <4 x float>, <4 x float> } %vld2, 1
br label %do.body		br label %do.body

do.body: ; preds = %do.body, %entry		do.body: ; preds = %do.body, %entry
%qq0.0.1.0 = phi <4 x float> [ %vld224, %entry ], [ %vld2216, %do.body ]		%qq0.0.1.0 = phi <4 x float> [ %vld224, %entry ], [ %vld2216, %do.body ]
%c.addr.0 = phi i32 [ %c, %entry ], [ %dec, %do.body ]		%c.addr.0 = phi i32 [ %c, %entry ], [ %dec, %do.body ]
%p.addr.0 = phi float* [ %p, %entry ], [ %add.ptr, %do.body ]		%p.addr.0 = phi float* [ %p, %entry ], [ %add.ptr, %do.body ]
%add.ptr = getelementptr inbounds float, float* %p.addr.0, i32 8		%add.ptr = getelementptr inbounds float, float* %p.addr.0, i32 8
%1 = bitcast float* %add.ptr to i8*		%1 = bitcast float* %add.ptr to i8*
%vld22 = tail call { <4 x float>, <4 x float> } @llvm.arm.neon.vld2.v4f32(i8* %1, i32 4)		%vld22 = tail call { <4 x float>, <4 x float> } @llvm.arm.neon.vld2.v4f32.p0i8(i8* %1, i32 4)
%vld2215 = extractvalue { <4 x float>, <4 x float> } %vld22, 0		%vld2215 = extractvalue { <4 x float>, <4 x float> } %vld22, 0
%vld2216 = extractvalue { <4 x float>, <4 x float> } %vld22, 1		%vld2216 = extractvalue { <4 x float>, <4 x float> } %vld22, 1
tail call void @llvm.arm.neon.vst2.v4f32(i8* %1, <4 x float> %qq0.0.1.0, <4 x float> %vld2215, i32 4)		tail call void @llvm.arm.neon.vst2.p0i8.v4f32(i8* %1, <4 x float> %qq0.0.1.0, <4 x float> %vld2215, i32 4)
%dec = add nsw i32 %c.addr.0, -1		%dec = add nsw i32 %c.addr.0, -1
%tobool = icmp eq i32 %dec, 0		%tobool = icmp eq i32 %dec, 0
br i1 %tobool, label %do.end, label %do.body		br i1 %tobool, label %do.end, label %do.body

do.end: ; preds = %do.body		do.end: ; preds = %do.body
ret void		ret void
}		}

declare { <4 x float>, <4 x float> } @llvm.arm.neon.vld2.v4f32(i8*, i32) nounwind readonly		declare { <4 x float>, <4 x float> } @llvm.arm.neon.vld2.v4f32.p0i8(i8*, i32) nounwind readonly
declare void @llvm.arm.neon.vst2.v4f32(i8*, <4 x float>, <4 x float>, i32) nounwind		declare void @llvm.arm.neon.vst2.p0i8.v4f32(i8*, <4 x float>, <4 x float>, i32) nounwind

; CHECK: f3		; CHECK: f3
; This function has lane insertions that span basic blocks.		; This function has lane insertions that span basic blocks.
; The trivial REG_SEQUENCE lowering can't handle that, but the coalescer can.		; The trivial REG_SEQUENCE lowering can't handle that, but the coalescer can.
;		;
; void f3(float *p, float *q) {		; void f3(float *p, float *q) {
; float32x2_t x;		; float32x2_t x;
; x[1] = p[3];		; x[1] = p[3];
if.else: ; preds = %entry		if.else: ; preds = %entry
%3 = load float, float* %arrayidx4, align 4		%3 = load float, float* %arrayidx4, align 4
%vecins5 = insertelement <2 x float> %vecins, float %3, i32 0		%vecins5 = insertelement <2 x float> %vecins, float %3, i32 0
br label %if.end		br label %if.end

if.end: ; preds = %if.else, %if.then		if.end: ; preds = %if.else, %if.then
%x.0 = phi <2 x float> [ %vecins3, %if.then ], [ %vecins5, %if.else ]		%x.0 = phi <2 x float> [ %vecins3, %if.then ], [ %vecins5, %if.else ]
%add.ptr = getelementptr inbounds float, float* %p, i32 4		%add.ptr = getelementptr inbounds float, float* %p, i32 4
%4 = bitcast float* %add.ptr to i8*		%4 = bitcast float* %add.ptr to i8*
tail call void @llvm.arm.neon.vst1.v2f32(i8* %4, <2 x float> %x.0, i32 4)		tail call void @llvm.arm.neon.vst1.p0i8.v2f32(i8* %4, <2 x float> %x.0, i32 4)
ret void		ret void
}		}

declare void @llvm.arm.neon.vst1.v2f32(i8*, <2 x float>, i32) nounwind		declare void @llvm.arm.neon.vst1.p0i8.v2f32(i8*, <2 x float>, i32) nounwind
declare <2 x float> @llvm.arm.neon.vld1.v2f32(i8*, i32) nounwind readonly		declare <2 x float> @llvm.arm.neon.vld1.v2f32.p0i8(i8*, i32) nounwind readonly

; CHECK: f4		; CHECK: f4
; This function inserts a lane into a fully defined vector.		; This function inserts a lane into a fully defined vector.
; The destination lane isn't read, so the subregs can coalesce.		; The destination lane isn't read, so the subregs can coalesce.
; CHECK-NOT: vmov		; CHECK-NOT: vmov
; CHECK-NOT: vorr		; CHECK-NOT: vorr
define void @f4(float* %p, float* %q) nounwind ssp {		define void @f4(float* %p, float* %q) nounwind ssp {
entry:		entry:
%0 = bitcast float* %p to i8*		%0 = bitcast float* %p to i8*
%vld1 = tail call <2 x float> @llvm.arm.neon.vld1.v2f32(i8* %0, i32 4)		%vld1 = tail call <2 x float> @llvm.arm.neon.vld1.v2f32.p0i8(i8* %0, i32 4)
%tobool = icmp eq float* %q, null		%tobool = icmp eq float* %q, null
br i1 %tobool, label %if.end, label %if.then		br i1 %tobool, label %if.end, label %if.then

if.then: ; preds = %entry		if.then: ; preds = %entry
%1 = load float, float* %q, align 4		%1 = load float, float* %q, align 4
%arrayidx1 = getelementptr inbounds float, float* %q, i32 1		%arrayidx1 = getelementptr inbounds float, float* %q, i32 1
%2 = load float, float* %arrayidx1, align 4		%2 = load float, float* %arrayidx1, align 4
%add = fadd float %1, %2		%add = fadd float %1, %2
%vecins = insertelement <2 x float> %vld1, float %add, i32 1		%vecins = insertelement <2 x float> %vld1, float %add, i32 1
br label %if.end		br label %if.end

if.end: ; preds = %entry, %if.then		if.end: ; preds = %entry, %if.then
%x.0 = phi <2 x float> [ %vecins, %if.then ], [ %vld1, %entry ]		%x.0 = phi <2 x float> [ %vecins, %if.then ], [ %vld1, %entry ]
tail call void @llvm.arm.neon.vst1.v2f32(i8* %0, <2 x float> %x.0, i32 4)		tail call void @llvm.arm.neon.vst1.p0i8.v2f32(i8* %0, <2 x float> %x.0, i32 4)
ret void		ret void
}		}

; CHECK: f5		; CHECK: f5
; Coalesce vector lanes through phis.		; Coalesce vector lanes through phis.
; CHECK: vmov.f32 {{.*}}, #1.0		; CHECK: vmov.f32 {{.*}}, #1.0
; CHECK-NOT: vmov		; CHECK-NOT: vmov
; CHECK-NOT: vorr		; CHECK-NOT: vorr
; CHECK: bx		; CHECK: bx
; We may leave the last insertelement in the if.end block.		; We may leave the last insertelement in the if.end block.
; It is inserting the %add value into a dead lane, but %add causes interference		; It is inserting the %add value into a dead lane, but %add causes interference
; in the entry block, and we don't do dead lane checks across basic blocks.		; in the entry block, and we don't do dead lane checks across basic blocks.
define void @f5(float* %p, float* %q) nounwind ssp {		define void @f5(float* %p, float* %q) nounwind ssp {
entry:		entry:
%0 = bitcast float* %p to i8*		%0 = bitcast float* %p to i8*
%vld1 = tail call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* %0, i32 4)		%vld1 = tail call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* %0, i32 4)
%vecext = extractelement <4 x float> %vld1, i32 0		%vecext = extractelement <4 x float> %vld1, i32 0
%vecext1 = extractelement <4 x float> %vld1, i32 1		%vecext1 = extractelement <4 x float> %vld1, i32 1
%vecext2 = extractelement <4 x float> %vld1, i32 2		%vecext2 = extractelement <4 x float> %vld1, i32 2
%vecext3 = extractelement <4 x float> %vld1, i32 3		%vecext3 = extractelement <4 x float> %vld1, i32 3
%add = fadd float %vecext3, 1.000000e+00		%add = fadd float %vecext3, 1.000000e+00
%tobool = icmp eq float* %q, null		%tobool = icmp eq float* %q, null
br i1 %tobool, label %if.end, label %if.then		br i1 %tobool, label %if.end, label %if.then

if.end: ; preds = %entry, %if.then		if.end: ; preds = %entry, %if.then
%a.0 = phi float [ %add4, %if.then ], [ %vecext, %entry ]		%a.0 = phi float [ %add4, %if.then ], [ %vecext, %entry ]
%b.0 = phi float [ %add6, %if.then ], [ %vecext1, %entry ]		%b.0 = phi float [ %add6, %if.then ], [ %vecext1, %entry ]
%c.0 = phi float [ %add8, %if.then ], [ %vecext2, %entry ]		%c.0 = phi float [ %add8, %if.then ], [ %vecext2, %entry ]
%vecinit = insertelement <4 x float> undef, float %a.0, i32 0		%vecinit = insertelement <4 x float> undef, float %a.0, i32 0
%vecinit9 = insertelement <4 x float> %vecinit, float %b.0, i32 1		%vecinit9 = insertelement <4 x float> %vecinit, float %b.0, i32 1
%vecinit10 = insertelement <4 x float> %vecinit9, float %c.0, i32 2		%vecinit10 = insertelement <4 x float> %vecinit9, float %c.0, i32 2
%vecinit11 = insertelement <4 x float> %vecinit10, float %add, i32 3		%vecinit11 = insertelement <4 x float> %vecinit10, float %add, i32 3
tail call void @llvm.arm.neon.vst1.v4f32(i8* %0, <4 x float> %vecinit11, i32 4)		tail call void @llvm.arm.neon.vst1.p0i8.v4f32(i8* %0, <4 x float> %vecinit11, i32 4)
ret void		ret void
}		}

declare <4 x float> @llvm.arm.neon.vld1.v4f32(i8*, i32) nounwind readonly		declare <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8*, i32) nounwind readonly

declare void @llvm.arm.neon.vst1.v4f32(i8*, <4 x float>, i32) nounwind		declare void @llvm.arm.neon.vst1.p0i8.v4f32(i8*, <4 x float>, i32) nounwind

; CHECK: pr13999		; CHECK: pr13999
define void @pr13999() nounwind readonly {		define void @pr13999() nounwind readonly {
entry:		entry:
br i1 true, label %outer_loop, label %loop.end		br i1 true, label %outer_loop, label %loop.end

outer_loop:		outer_loop:
%d = phi double [ 0.0, %entry ], [ %add, %after_inner_loop ]		%d = phi double [ 0.0, %entry ], [ %add, %after_inner_loop ]

llvm/trunk/test/CodeGen/ARM/dagcombine-concatvector.ll

	bb:			bb:
	%tmp = extractvalue [4 x i64] %vec.coerce, 0			%tmp = extractvalue [4 x i64] %vec.coerce, 0
	%tmp2 = bitcast i64 %tmp to <8 x i8>			%tmp2 = bitcast i64 %tmp to <8 x i8>
	%tmp3 = shufflevector <8 x i8> %tmp2, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%tmp3 = shufflevector <8 x i8> %tmp2, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%tmp4 = extractvalue [4 x i64] %vec.coerce, 1			%tmp4 = extractvalue [4 x i64] %vec.coerce, 1
	%tmp5 = bitcast i64 %tmp4 to <8 x i8>			%tmp5 = bitcast i64 %tmp4 to <8 x i8>
	%tmp6 = shufflevector <8 x i8> %tmp5, <8 x i8> undef, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%tmp6 = shufflevector <8 x i8> %tmp5, <8 x i8> undef, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	%tmp7 = shufflevector <16 x i8> %tmp6, <16 x i8> %tmp3, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			%tmp7 = shufflevector <16 x i8> %tmp6, <16 x i8> %tmp3, <16 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	tail call void @llvm.arm.neon.vst1.v16i8(i8* %arg, <16 x i8> %tmp7, i32 2)			tail call void @llvm.arm.neon.vst1.p0i8.v16i8(i8* %arg, <16 x i8> %tmp7, i32 2)
	ret void			ret void
	}			}

	declare void @llvm.arm.neon.vst1.v16i8(i8*, <16 x i8>, i32)			declare void @llvm.arm.neon.vst1.p0i8.v16i8(i8*, <16 x i8>, i32)

llvm/trunk/test/CodeGen/ARM/neon_spill.ll


	declare arm_aapcs_vfpcc %0** @func2()			declare arm_aapcs_vfpcc %0** @func2()

	declare arm_aapcs_vfpcc %2* @func3(%2*, %2*, i32)			declare arm_aapcs_vfpcc %2* @func3(%2*, %2*, i32)

	declare arm_aapcs_vfpcc %2** @func4()			declare arm_aapcs_vfpcc %2** @func4()

	define arm_aapcs_vfpcc void @foo(%3* nocapture) nounwind align 2 {			define arm_aapcs_vfpcc void @foo(%3* nocapture) nounwind align 2 {
	call void @llvm.arm.neon.vst4.v4i32(i8* undef, <4 x i32> <i32 0, i32 1065353216, i32 1073741824, i32 1077936128>, <4 x i32> <i32 1082130432, i32 1084227584, i32 1086324736, i32 1088421888>, <4 x i32> <i32 1090519040, i32 1091567616, i32 1092616192, i32 1093664768>, <4 x i32> <i32 1094713344, i32 1095761920, i32 1096810496, i32 1097859072>, i32 16) nounwind			call void @llvm.arm.neon.vst4.p0i8.v4i32(i8* undef, <4 x i32> <i32 0, i32 1065353216, i32 1073741824, i32 1077936128>, <4 x i32> <i32 1082130432, i32 1084227584, i32 1086324736, i32 1088421888>, <4 x i32> <i32 1090519040, i32 1091567616, i32 1092616192, i32 1093664768>, <4 x i32> <i32 1094713344, i32 1095761920, i32 1096810496, i32 1097859072>, i32 16) nounwind
	%2 = call arm_aapcs_vfpcc %0** @func2() nounwind			%2 = call arm_aapcs_vfpcc %0** @func2() nounwind
	%3 = load %0*, %0** %2, align 4			%3 = load %0*, %0** %2, align 4
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%4 = call arm_aapcs_vfpcc %2* @func3(%2* undef, %2* undef, i32 2956) nounwind			%4 = call arm_aapcs_vfpcc %2* @func3(%2* undef, %2* undef, i32 2956) nounwind
	call arm_aapcs_vfpcc void @func1(%0* %3, float* undef, float* undef, %2* undef)			call arm_aapcs_vfpcc void @func1(%0* %3, float* undef, float* undef, %2* undef)
	%5 = call arm_aapcs_vfpcc %0** @func2() nounwind			%5 = call arm_aapcs_vfpcc %0** @func2() nounwind
	store float 1.000000e+00, float* undef, align 4			store float 1.000000e+00, float* undef, align 4
	call arm_aapcs_vfpcc void @func1(%0* undef, float* undef, float* undef, %2* undef)			call arm_aapcs_vfpcc void @func1(%0* undef, float* undef, float* undef, %2* undef)
	store float 1.500000e+01, float* undef, align 4			store float 1.500000e+01, float* undef, align 4
	%6 = call arm_aapcs_vfpcc %2** @func4() nounwind			%6 = call arm_aapcs_vfpcc %2** @func4() nounwind
	%7 = call arm_aapcs_vfpcc %2* @func3(%2* undef, %2* undef, i32 2971) nounwind			%7 = call arm_aapcs_vfpcc %2* @func3(%2* undef, %2* undef, i32 2971) nounwind
	%8 = fadd float undef, -1.000000e+05			%8 = fadd float undef, -1.000000e+05
	store float %8, float* undef, align 16			store float %8, float* undef, align 16
	%9 = call arm_aapcs_vfpcc i32 @rand() nounwind			%9 = call arm_aapcs_vfpcc i32 @rand() nounwind
	%10 = fmul float undef, 2.000000e+05			%10 = fmul float undef, 2.000000e+05
	%11 = fadd float %10, -1.000000e+05			%11 = fadd float %10, -1.000000e+05
	store float %11, float* undef, align 4			store float %11, float* undef, align 4
	call void @llvm.arm.neon.vst4.v4i32(i8* undef, <4 x i32> <i32 0, i32 1065353216, i32 1073741824, i32 1077936128>, <4 x i32> <i32 1082130432, i32 1084227584, i32 1086324736, i32 1088421888>, <4 x i32> <i32 1090519040, i32 1091567616, i32 1092616192, i32 1093664768>, <4 x i32> <i32 1094713344, i32 1095761920, i32 1096810496, i32 1097859072>, i32 16) nounwind			call void @llvm.arm.neon.vst4.p0i8.v4i32(i8* undef, <4 x i32> <i32 0, i32 1065353216, i32 1073741824, i32 1077936128>, <4 x i32> <i32 1082130432, i32 1084227584, i32 1086324736, i32 1088421888>, <4 x i32> <i32 1090519040, i32 1091567616, i32 1092616192, i32 1093664768>, <4 x i32> <i32 1094713344, i32 1095761920, i32 1096810496, i32 1097859072>, i32 16) nounwind
	ret void			ret void
	}			}

	declare void @llvm.arm.neon.vst4.v4i32(i8*, <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32>, i32) nounwind			declare void @llvm.arm.neon.vst4.p0i8.v4i32(i8*, <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32>, i32) nounwind

	declare arm_aapcs_vfpcc i32 @rand()			declare arm_aapcs_vfpcc i32 @rand()

llvm/trunk/test/CodeGen/ARM/out-of-registers.ll

	; RUN: llc -O3 %s -o - | FileCheck %s			; RUN: llc -O3 %s -o - | FileCheck %s
	; ModuleID = 'fo.c'			; ModuleID = 'fo.c'
	target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:64:128-a0:0:32-n8:16:32-S64"			target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:64:128-a0:0:32-n8:16:32-S64"
	target triple = "thumbv7-none-linux-gnueabi"			target triple = "thumbv7-none-linux-gnueabi"

	; CHECK: vpush			; CHECK: vpush
	; CHECK: vpop			; CHECK: vpop

	define void @foo(float* nocapture %A) #0 {			define void @foo(float* nocapture %A) #0 {
	%1= bitcast float* %A to i8*			%1= bitcast float* %A to i8*
	%2 = tail call { <4 x float>, <4 x float>, <4 x float>, <4 x float> } @llvm.arm.neon.vld4.v4f32(i8* %1, i32 4)			%2 = tail call { <4 x float>, <4 x float>, <4 x float>, <4 x float> } @llvm.arm.neon.vld4.v4f32.p0i8(i8* %1, i32 4)
	%3 = extractvalue { <4 x float>, <4 x float>, <4 x float>, <4 x float> } %2, 0			%3 = extractvalue { <4 x float>, <4 x float>, <4 x float>, <4 x float> } %2, 0
	%divp_vec = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %3			%divp_vec = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %3
	%4 = extractvalue { <4 x float>, <4 x float>, <4 x float>, <4 x float> } %2, 1			%4 = extractvalue { <4 x float>, <4 x float>, <4 x float>, <4 x float> } %2, 1
	%div3p_vec = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %4			%div3p_vec = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %4
	%5 = extractvalue { <4 x float>, <4 x float>, <4 x float>, <4 x float> } %2, 2			%5 = extractvalue { <4 x float>, <4 x float>, <4 x float>, <4 x float> } %2, 2
	%div8p_vec = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %5			%div8p_vec = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %5
	%6 = extractvalue { <4 x float>, <4 x float>, <4 x float>, <4 x float> } %2, 3			%6 = extractvalue { <4 x float>, <4 x float>, <4 x float>, <4 x float> } %2, 3
	%div13p_vec = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %6			%div13p_vec = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %6
	tail call void @llvm.arm.neon.vst4.v4f32(i8* %1, <4 x float> %divp_vec, <4 x float> %div3p_vec, <4 x float> %div8p_vec, <4 x float> %div13p_vec, i32 4)			tail call void @llvm.arm.neon.vst4.p0i8.v4f32(i8* %1, <4 x float> %divp_vec, <4 x float> %div3p_vec, <4 x float> %div8p_vec, <4 x float> %div13p_vec, i32 4)
	ret void			ret void
	}			}

	; Function Attrs: nounwind			; Function Attrs: nounwind
	declare i32 @llvm.annotation.i32(i32, i8*, i8*, i32) #1			declare i32 @llvm.annotation.i32(i32, i8*, i8*, i32) #1

	; Function Attrs: nounwind readonly			; Function Attrs: nounwind readonly

	; Function Attrs: nounwind			; Function Attrs: nounwind
	declare void @llvm.arm.neon.vst4.v4f32(i8*, <4 x float>, <4 x float>, <4 x float>, <4 x float>, i32) #1			declare void @llvm.arm.neon.vst4.p0i8.v4f32(i8*, <4 x float>, <4 x float>, <4 x float>, <4 x float>, i32) #1
	declare { <4 x float>, <4 x float>, <4 x float>, <4 x float> } @llvm.arm.neon.vld4.v4f32(i8*, i32) #2			declare { <4 x float>, <4 x float>, <4 x float>, <4 x float> } @llvm.arm.neon.vld4.v4f32.p0i8(i8*, i32) #2

	; Function Attrs: nounwind			; Function Attrs: nounwind

	attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "stack-protector-buffer-size"="8" "unsafe-fp-math"="true" "use-soft-float"="false" }			attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "stack-protector-buffer-size"="8" "unsafe-fp-math"="true" "use-soft-float"="false" }
	attributes #1 = { nounwind }			attributes #1 = { nounwind }
	attributes #2 = { nounwind readonly }			attributes #2 = { nounwind readonly }

	!llvm.ident = !{!0}			!llvm.ident = !{!0}

	!0 = !{!"Snapdragon LLVM ARM Compiler 3.4"}			!0 = !{!"Snapdragon LLVM ARM Compiler 3.4"}
	!1 = !{!1}			!1 = !{!1}

llvm/trunk/test/CodeGen/ARM/reg_sequence.ll

; CHECK: vshrn.i32		; CHECK: vshrn.i32
; CHECK-NOT: vmov d		; CHECK-NOT: vmov d
; CHECK-NEXT: vst1.16		; CHECK-NEXT: vst1.16
%0 = getelementptr inbounds %struct.int32x4_t, %struct.int32x4_t* %vT0ptr, i32 0, i32 0 ; <<4 x i32>*> [#uses=1]		%0 = getelementptr inbounds %struct.int32x4_t, %struct.int32x4_t* %vT0ptr, i32 0, i32 0 ; <<4 x i32>*> [#uses=1]
%1 = load <4 x i32>, <4 x i32>* %0, align 16 ; <<4 x i32>> [#uses=1]		%1 = load <4 x i32>, <4 x i32>* %0, align 16 ; <<4 x i32>> [#uses=1]
%2 = getelementptr inbounds %struct.int32x4_t, %struct.int32x4_t* %vT1ptr, i32 0, i32 0 ; <<4 x i32>*> [#uses=1]		%2 = getelementptr inbounds %struct.int32x4_t, %struct.int32x4_t* %vT1ptr, i32 0, i32 0 ; <<4 x i32>*> [#uses=1]
%3 = load <4 x i32>, <4 x i32>* %2, align 16 ; <<4 x i32>> [#uses=1]		%3 = load <4 x i32>, <4 x i32>* %2, align 16 ; <<4 x i32>> [#uses=1]
%4 = bitcast i16* %i_ptr to i8* ; <i8*> [#uses=1]		%4 = bitcast i16* %i_ptr to i8* ; <i8*> [#uses=1]
%5 = tail call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %4, i32 1) ; <<8 x i16>> [#uses=1]		%5 = tail call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %4, i32 1) ; <<8 x i16>> [#uses=1]
%6 = bitcast <8 x i16> %5 to <2 x double> ; <<2 x double>> [#uses=2]		%6 = bitcast <8 x i16> %5 to <2 x double> ; <<2 x double>> [#uses=2]
%7 = extractelement <2 x double> %6, i32 0 ; <double> [#uses=1]		%7 = extractelement <2 x double> %6, i32 0 ; <double> [#uses=1]
%8 = bitcast double %7 to <4 x i16> ; <<4 x i16>> [#uses=1]		%8 = bitcast double %7 to <4 x i16> ; <<4 x i16>> [#uses=1]
%9 = sext <4 x i16> %8 to <4 x i32> ; <<4 x i32>> [#uses=1]		%9 = sext <4 x i16> %8 to <4 x i32> ; <<4 x i32>> [#uses=1]
%10 = extractelement <2 x double> %6, i32 1 ; <double> [#uses=1]		%10 = extractelement <2 x double> %6, i32 1 ; <double> [#uses=1]
%11 = bitcast double %10 to <4 x i16> ; <<4 x i16>> [#uses=1]		%11 = bitcast double %10 to <4 x i16> ; <<4 x i16>> [#uses=1]
%12 = sext <4 x i16> %11 to <4 x i32> ; <<4 x i32>> [#uses=1]		%12 = sext <4 x i16> %11 to <4 x i32> ; <<4 x i32>> [#uses=1]
%13 = mul <4 x i32> %1, %9 ; <<4 x i32>> [#uses=1]		%13 = mul <4 x i32> %1, %9 ; <<4 x i32>> [#uses=1]
%14 = mul <4 x i32> %3, %12 ; <<4 x i32>> [#uses=1]		%14 = mul <4 x i32> %3, %12 ; <<4 x i32>> [#uses=1]
%15 = lshr <4 x i32> %13, <i32 12, i32 12, i32 12, i32 12>		%15 = lshr <4 x i32> %13, <i32 12, i32 12, i32 12, i32 12>
%trunc_15 = trunc <4 x i32> %15 to <4 x i16>		%trunc_15 = trunc <4 x i32> %15 to <4 x i16>
%16 = lshr <4 x i32> %14, <i32 12, i32 12, i32 12, i32 12>		%16 = lshr <4 x i32> %14, <i32 12, i32 12, i32 12, i32 12>
%trunc_16 = trunc <4 x i32> %16 to <4 x i16>		%trunc_16 = trunc <4 x i32> %16 to <4 x i16>
%17 = shufflevector <4 x i16> %trunc_15, <4 x i16> %trunc_16, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7> ; <<8 x i16>> [#uses=1]		%17 = shufflevector <4 x i16> %trunc_15, <4 x i16> %trunc_16, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7> ; <<8 x i16>> [#uses=1]
%18 = bitcast i16* %o_ptr to i8* ; <i8*> [#uses=1]		%18 = bitcast i16* %o_ptr to i8* ; <i8*> [#uses=1]
tail call void @llvm.arm.neon.vst1.v8i16(i8* %18, <8 x i16> %17, i32 1)		tail call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %18, <8 x i16> %17, i32 1)
ret void		ret void
}		}

define void @t2(i16* %i_ptr, i16* %o_ptr, %struct.int16x8_t* nocapture %vT0ptr, %struct.int16x8_t* nocapture %vT1ptr) nounwind {		define void @t2(i16* %i_ptr, i16* %o_ptr, %struct.int16x8_t* nocapture %vT0ptr, %struct.int16x8_t* nocapture %vT1ptr) nounwind {
entry:		entry:
; CHECK-LABEL: t2:		; CHECK-LABEL: t2:
; CHECK: vld1.16		; CHECK: vld1.16
; CHECK-NOT: vmov		; CHECK-NOT: vmov
; CHECK: vmul.i16		; CHECK: vmul.i16
; CHECK: vld1.16		; CHECK: vld1.16
; CHECK: vmul.i16		; CHECK: vmul.i16
; CHECK-NOT: vmov		; CHECK-NOT: vmov
; CHECK: vst1.16		; CHECK: vst1.16
; CHECK: vst1.16		; CHECK: vst1.16
%0 = getelementptr inbounds %struct.int16x8_t, %struct.int16x8_t* %vT0ptr, i32 0, i32 0 ; <<8 x i16>*> [#uses=1]		%0 = getelementptr inbounds %struct.int16x8_t, %struct.int16x8_t* %vT0ptr, i32 0, i32 0 ; <<8 x i16>*> [#uses=1]
%1 = load <8 x i16>, <8 x i16>* %0, align 16 ; <<8 x i16>> [#uses=1]		%1 = load <8 x i16>, <8 x i16>* %0, align 16 ; <<8 x i16>> [#uses=1]
%2 = getelementptr inbounds %struct.int16x8_t, %struct.int16x8_t* %vT1ptr, i32 0, i32 0 ; <<8 x i16>*> [#uses=1]		%2 = getelementptr inbounds %struct.int16x8_t, %struct.int16x8_t* %vT1ptr, i32 0, i32 0 ; <<8 x i16>*> [#uses=1]
%3 = load <8 x i16>, <8 x i16>* %2, align 16 ; <<8 x i16>> [#uses=1]		%3 = load <8 x i16>, <8 x i16>* %2, align 16 ; <<8 x i16>> [#uses=1]
%4 = bitcast i16* %i_ptr to i8* ; <i8*> [#uses=1]		%4 = bitcast i16* %i_ptr to i8* ; <i8*> [#uses=1]
%5 = tail call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %4, i32 1) ; <<8 x i16>> [#uses=1]		%5 = tail call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %4, i32 1) ; <<8 x i16>> [#uses=1]
%6 = getelementptr inbounds i16, i16* %i_ptr, i32 8 ; <i16*> [#uses=1]		%6 = getelementptr inbounds i16, i16* %i_ptr, i32 8 ; <i16*> [#uses=1]
%7 = bitcast i16* %6 to i8* ; <i8*> [#uses=1]		%7 = bitcast i16* %6 to i8* ; <i8*> [#uses=1]
%8 = tail call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %7, i32 1) ; <<8 x i16>> [#uses=1]		%8 = tail call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %7, i32 1) ; <<8 x i16>> [#uses=1]
%9 = mul <8 x i16> %1, %5 ; <<8 x i16>> [#uses=1]		%9 = mul <8 x i16> %1, %5 ; <<8 x i16>> [#uses=1]
%10 = mul <8 x i16> %3, %8 ; <<8 x i16>> [#uses=1]		%10 = mul <8 x i16> %3, %8 ; <<8 x i16>> [#uses=1]
%11 = bitcast i16* %o_ptr to i8* ; <i8*> [#uses=1]		%11 = bitcast i16* %o_ptr to i8* ; <i8*> [#uses=1]
tail call void @llvm.arm.neon.vst1.v8i16(i8* %11, <8 x i16> %9, i32 1)		tail call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %11, <8 x i16> %9, i32 1)
%12 = getelementptr inbounds i16, i16* %o_ptr, i32 8 ; <i16*> [#uses=1]		%12 = getelementptr inbounds i16, i16* %o_ptr, i32 8 ; <i16*> [#uses=1]
%13 = bitcast i16* %12 to i8* ; <i8*> [#uses=1]		%13 = bitcast i16* %12 to i8* ; <i8*> [#uses=1]
tail call void @llvm.arm.neon.vst1.v8i16(i8* %13, <8 x i16> %10, i32 1)		tail call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %13, <8 x i16> %10, i32 1)
ret void		ret void
}		}

define <8 x i8> @t3(i8* %A, i8* %B) nounwind {		define <8 x i8> @t3(i8* %A, i8* %B) nounwind {
; CHECK-LABEL: t3:		; CHECK-LABEL: t3:
; CHECK: vld3.8		; CHECK: vld3.8
; CHECK: vmul.i8		; CHECK: vmul.i8
; CHECK: vmov r		; CHECK: vmov r
; CHECK-NOT: vmov d		; CHECK-NOT: vmov d
; CHECK: vst3.8		; CHECK: vst3.8
%tmp1 = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8(i8* %A, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=2]		%tmp1 = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8.p0i8(i8* %A, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=2]
%tmp2 = extractvalue %struct.__neon_int8x8x3_t %tmp1, 0 ; <<8 x i8>> [#uses=1]		%tmp2 = extractvalue %struct.__neon_int8x8x3_t %tmp1, 0 ; <<8 x i8>> [#uses=1]
%tmp3 = extractvalue %struct.__neon_int8x8x3_t %tmp1, 2 ; <<8 x i8>> [#uses=1]		%tmp3 = extractvalue %struct.__neon_int8x8x3_t %tmp1, 2 ; <<8 x i8>> [#uses=1]
%tmp4 = extractvalue %struct.__neon_int8x8x3_t %tmp1, 1 ; <<8 x i8>> [#uses=1]		%tmp4 = extractvalue %struct.__neon_int8x8x3_t %tmp1, 1 ; <<8 x i8>> [#uses=1]
%tmp5 = sub <8 x i8> %tmp3, %tmp4		%tmp5 = sub <8 x i8> %tmp3, %tmp4
%tmp6 = add <8 x i8> %tmp2, %tmp3 ; <<8 x i8>> [#uses=1]		%tmp6 = add <8 x i8> %tmp2, %tmp3 ; <<8 x i8>> [#uses=1]
%tmp7 = mul <8 x i8> %tmp4, %tmp2		%tmp7 = mul <8 x i8> %tmp4, %tmp2
tail call void @llvm.arm.neon.vst3.v8i8(i8* %B, <8 x i8> %tmp5, <8 x i8> %tmp6, <8 x i8> %tmp7, i32 1)		tail call void @llvm.arm.neon.vst3.p0i8.v8i8(i8* %B, <8 x i8> %tmp5, <8 x i8> %tmp6, <8 x i8> %tmp7, i32 1)
ret <8 x i8> %tmp4		ret <8 x i8> %tmp4
}		}

define void @t4(i32* %in, i32* %out) nounwind {		define void @t4(i32* %in, i32* %out) nounwind {
entry:		entry:
; CHECK-LABEL: t4:		; CHECK-LABEL: t4:
; CHECK: vld2.32		; CHECK: vld2.32
; CHECK-NOT: vmov		; CHECK-NOT: vmov
; CHECK: vld2.32		; CHECK: vld2.32
; CHECK-NOT: vmov		; CHECK-NOT: vmov
; CHECK: bne		; CHECK: bne
%tmp1 = bitcast i32* %in to i8* ; <i8*> [#uses=1]		%tmp1 = bitcast i32* %in to i8* ; <i8*> [#uses=1]
%tmp2 = tail call %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2.v4i32(i8* %tmp1, i32 1) ; <%struct.__neon_int32x4x2_t> [#uses=2]		%tmp2 = tail call %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2.v4i32.p0i8(i8* %tmp1, i32 1) ; <%struct.__neon_int32x4x2_t> [#uses=2]
%tmp3 = getelementptr inbounds i32, i32* %in, i32 8 ; <i32*> [#uses=1]		%tmp3 = getelementptr inbounds i32, i32* %in, i32 8 ; <i32*> [#uses=1]
%tmp4 = bitcast i32* %tmp3 to i8* ; <i8*> [#uses=1]		%tmp4 = bitcast i32* %tmp3 to i8* ; <i8*> [#uses=1]
%tmp5 = tail call %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2.v4i32(i8* %tmp4, i32 1) ; <%struct.__neon_int32x4x2_t> [#uses=2]		%tmp5 = tail call %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2.v4i32.p0i8(i8* %tmp4, i32 1) ; <%struct.__neon_int32x4x2_t> [#uses=2]
%tmp8 = bitcast i32* %out to i8* ; <i8*> [#uses=1]		%tmp8 = bitcast i32* %out to i8* ; <i8*> [#uses=1]
br i1 undef, label %return1, label %return2		br i1 undef, label %return1, label %return2

return1:		return1:
; CHECK: %return1		; CHECK: %return1
; CHECK-NOT: vmov		; CHECK-NOT: vmov
; CHECK-NEXT: vadd.i32		; CHECK-NEXT: vadd.i32
; CHECK-NEXT: vadd.i32		; CHECK-NEXT: vadd.i32
; CHECK-NEXT: vst2.32		; CHECK-NEXT: vst2.32
%tmp52 = extractvalue %struct.__neon_int32x4x2_t %tmp2, 0 ; <<4 x i32>> [#uses=1]		%tmp52 = extractvalue %struct.__neon_int32x4x2_t %tmp2, 0 ; <<4 x i32>> [#uses=1]
%tmp57 = extractvalue %struct.__neon_int32x4x2_t %tmp2, 1 ; <<4 x i32>> [#uses=1]		%tmp57 = extractvalue %struct.__neon_int32x4x2_t %tmp2, 1 ; <<4 x i32>> [#uses=1]
%tmp = extractvalue %struct.__neon_int32x4x2_t %tmp5, 0 ; <<4 x i32>> [#uses=1]		%tmp = extractvalue %struct.__neon_int32x4x2_t %tmp5, 0 ; <<4 x i32>> [#uses=1]
%tmp39 = extractvalue %struct.__neon_int32x4x2_t %tmp5, 1 ; <<4 x i32>> [#uses=1]		%tmp39 = extractvalue %struct.__neon_int32x4x2_t %tmp5, 1 ; <<4 x i32>> [#uses=1]
%tmp6 = add <4 x i32> %tmp52, %tmp ; <<4 x i32>> [#uses=1]		%tmp6 = add <4 x i32> %tmp52, %tmp ; <<4 x i32>> [#uses=1]
%tmp7 = add <4 x i32> %tmp57, %tmp39 ; <<4 x i32>> [#uses=1]		%tmp7 = add <4 x i32> %tmp57, %tmp39 ; <<4 x i32>> [#uses=1]
tail call void @llvm.arm.neon.vst2.v4i32(i8* %tmp8, <4 x i32> %tmp6, <4 x i32> %tmp7, i32 1)		tail call void @llvm.arm.neon.vst2.p0i8.v4i32(i8* %tmp8, <4 x i32> %tmp6, <4 x i32> %tmp7, i32 1)
ret void		ret void

return2:		return2:
; CHECK: %return2		; CHECK: %return2
; CHECK: vadd.i32		; CHECK: vadd.i32
; CHECK-NOT: vmov		; CHECK-NOT: vmov
; CHECK: vst2.32 {d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}}		; CHECK: vst2.32 {d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}}
%tmp100 = extractvalue %struct.__neon_int32x4x2_t %tmp2, 0 ; <<4 x i32>> [#uses=1]		%tmp100 = extractvalue %struct.__neon_int32x4x2_t %tmp2, 0 ; <<4 x i32>> [#uses=1]
%tmp101 = extractvalue %struct.__neon_int32x4x2_t %tmp5, 1 ; <<4 x i32>> [#uses=1]		%tmp101 = extractvalue %struct.__neon_int32x4x2_t %tmp5, 1 ; <<4 x i32>> [#uses=1]
%tmp102 = add <4 x i32> %tmp100, %tmp101 ; <<4 x i32>> [#uses=1]		%tmp102 = add <4 x i32> %tmp100, %tmp101 ; <<4 x i32>> [#uses=1]
tail call void @llvm.arm.neon.vst2.v4i32(i8* %tmp8, <4 x i32> %tmp102, <4 x i32> %tmp101, i32 1)		tail call void @llvm.arm.neon.vst2.p0i8.v4i32(i8* %tmp8, <4 x i32> %tmp102, <4 x i32> %tmp101, i32 1)
call void @llvm.trap()		call void @llvm.trap()
unreachable		unreachable
}		}

define <8 x i16> @t5(i16* %A, <8 x i16>* %B) nounwind {		define <8 x i16> @t5(i16* %A, <8 x i16>* %B) nounwind {
; CHECK-LABEL: t5:		; CHECK-LABEL: t5:
; CHECK: vld1.32		; CHECK: vld1.32
; How can FileCheck match Q and D registers? We need a lisp interpreter.		; How can FileCheck match Q and D registers? We need a lisp interpreter.
; CHECK: vorr {{q[0-9]+}}, {{q[0-9]+}}, {{q[0-9]+}}		; CHECK: vorr {{q[0-9]+}}, {{q[0-9]+}}, {{q[0-9]+}}
; CHECK-NOT: vmov		; CHECK-NOT: vmov
; CHECK: vld2.16 {d{{[0-9]+}}[1], d{{[0-9]+}}[1]}, [r0]		; CHECK: vld2.16 {d{{[0-9]+}}[1], d{{[0-9]+}}[1]}, [r0]
; CHECK-NOT: vmov		; CHECK-NOT: vmov
; CHECK: vadd.i16		; CHECK: vadd.i16
%tmp0 = bitcast i16* %A to i8* ; <i8*> [#uses=1]		%tmp0 = bitcast i16* %A to i8* ; <i8*> [#uses=1]
%tmp1 = load <8 x i16>, <8 x i16>* %B ; <<8 x i16>> [#uses=2]		%tmp1 = load <8 x i16>, <8 x i16>* %B ; <<8 x i16>> [#uses=2]
%tmp2 = call %struct.__neon_int16x8x2_t @llvm.arm.neon.vld2lane.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 1, i32 1) ; <%struct.__neon_int16x8x2_t> [#uses=2]		%tmp2 = call %struct.__neon_int16x8x2_t @llvm.arm.neon.vld2lane.v8i16.p0i8(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 1, i32 1) ; <%struct.__neon_int16x8x2_t> [#uses=2]
%tmp3 = extractvalue %struct.__neon_int16x8x2_t %tmp2, 0 ; <<8 x i16>> [#uses=1]		%tmp3 = extractvalue %struct.__neon_int16x8x2_t %tmp2, 0 ; <<8 x i16>> [#uses=1]
%tmp4 = extractvalue %struct.__neon_int16x8x2_t %tmp2, 1 ; <<8 x i16>> [#uses=1]		%tmp4 = extractvalue %struct.__neon_int16x8x2_t %tmp2, 1 ; <<8 x i16>> [#uses=1]
%tmp5 = add <8 x i16> %tmp3, %tmp4 ; <<8 x i16>> [#uses=1]		%tmp5 = add <8 x i16> %tmp3, %tmp4 ; <<8 x i16>> [#uses=1]
ret <8 x i16> %tmp5		ret <8 x i16> %tmp5
}		}

define <8 x i8> @t6(i8* %A, <8 x i8>* %B) nounwind {		define <8 x i8> @t6(i8* %A, <8 x i8>* %B) nounwind {
; CHECK-LABEL: t6:		; CHECK-LABEL: t6:
; CHECK: vldr		; CHECK: vldr
; CHECK: vorr d[[D0:[0-9]+]], d[[D1:[0-9]+]]		; CHECK: vorr d[[D0:[0-9]+]], d[[D1:[0-9]+]]
; CHECK-NEXT: vld2.8 {d[[D1]][1], d[[D0]][1]}		; CHECK-NEXT: vld2.8 {d[[D1]][1], d[[D0]][1]}
%tmp1 = load <8 x i8>, <8 x i8>* %B ; <<8 x i8>> [#uses=2]		%tmp1 = load <8 x i8>, <8 x i8>* %B ; <<8 x i8>> [#uses=2]
%tmp2 = call %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2lane.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 1) ; <%struct.__neon_int8x8x2_t> [#uses=2]		%tmp2 = call %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2lane.v8i8.p0i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 1) ; <%struct.__neon_int8x8x2_t> [#uses=2]
%tmp3 = extractvalue %struct.__neon_int8x8x2_t %tmp2, 0 ; <<8 x i8>> [#uses=1]		%tmp3 = extractvalue %struct.__neon_int8x8x2_t %tmp2, 0 ; <<8 x i8>> [#uses=1]
%tmp4 = extractvalue %struct.__neon_int8x8x2_t %tmp2, 1 ; <<8 x i8>> [#uses=1]		%tmp4 = extractvalue %struct.__neon_int8x8x2_t %tmp2, 1 ; <<8 x i8>> [#uses=1]
%tmp5 = add <8 x i8> %tmp3, %tmp4 ; <<8 x i8>> [#uses=1]		%tmp5 = add <8 x i8> %tmp3, %tmp4 ; <<8 x i8>> [#uses=1]
ret <8 x i8> %tmp5		ret <8 x i8> %tmp5
}		}

define void @t7(i32* %iptr, i32* %optr) nounwind {		define void @t7(i32* %iptr, i32* %optr) nounwind {
entry:		entry:
; CHECK-LABEL: t7:		; CHECK-LABEL: t7:
; CHECK: vld2.32		; CHECK: vld2.32
; CHECK: vst2.32		; CHECK: vst2.32
; CHECK: vld1.32 {d{{[0-9]+}}, d{{[0-9]+}}},		; CHECK: vld1.32 {d{{[0-9]+}}, d{{[0-9]+}}},
; CHECK: vorr q[[Q0:[0-9]+]], q[[Q1:[0-9]+]], q[[Q1:[0-9]+]]		; CHECK: vorr q[[Q0:[0-9]+]], q[[Q1:[0-9]+]], q[[Q1:[0-9]+]]
; CHECK-NOT: vmov		; CHECK-NOT: vmov
; CHECK: vuzp.32 q[[Q1]], q[[Q0]]		; CHECK: vuzp.32 q[[Q1]], q[[Q0]]
; CHECK: vst1.32		; CHECK: vst1.32
%0 = bitcast i32* %iptr to i8* ; <i8*> [#uses=2]		%0 = bitcast i32* %iptr to i8* ; <i8*> [#uses=2]
%1 = tail call %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2.v4i32(i8* %0, i32 1) ; <%struct.__neon_int32x4x2_t> [#uses=2]		%1 = tail call %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2.v4i32.p0i8(i8* %0, i32 1) ; <%struct.__neon_int32x4x2_t> [#uses=2]
%tmp57 = extractvalue %struct.__neon_int32x4x2_t %1, 0 ; <<4 x i32>> [#uses=1]		%tmp57 = extractvalue %struct.__neon_int32x4x2_t %1, 0 ; <<4 x i32>> [#uses=1]
%tmp60 = extractvalue %struct.__neon_int32x4x2_t %1, 1 ; <<4 x i32>> [#uses=1]		%tmp60 = extractvalue %struct.__neon_int32x4x2_t %1, 1 ; <<4 x i32>> [#uses=1]
%2 = bitcast i32* %optr to i8* ; <i8*> [#uses=2]		%2 = bitcast i32* %optr to i8* ; <i8*> [#uses=2]
tail call void @llvm.arm.neon.vst2.v4i32(i8* %2, <4 x i32> %tmp57, <4 x i32> %tmp60, i32 1)		tail call void @llvm.arm.neon.vst2.p0i8.v4i32(i8* %2, <4 x i32> %tmp57, <4 x i32> %tmp60, i32 1)
%3 = tail call <4 x i32> @llvm.arm.neon.vld1.v4i32(i8* %0, i32 1) ; <<4 x i32>> [#uses=1]		%3 = tail call <4 x i32> @llvm.arm.neon.vld1.v4i32.p0i8(i8* %0, i32 1) ; <<4 x i32>> [#uses=1]
%4 = shufflevector <4 x i32> %3, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 0, i32 2> ; <<4 x i32>> [#uses=1]		%4 = shufflevector <4 x i32> %3, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 0, i32 2> ; <<4 x i32>> [#uses=1]
tail call void @llvm.arm.neon.vst1.v4i32(i8* %2, <4 x i32> %4, i32 1)		tail call void @llvm.arm.neon.vst1.p0i8.v4i32(i8* %2, <4 x i32> %4, i32 1)
ret void		ret void
}		}

; PR7156		; PR7156
define arm_aapcs_vfpcc i32 @t8() nounwind {		define arm_aapcs_vfpcc i32 @t8() nounwind {
; CHECK-LABEL: t8:		; CHECK-LABEL: t8:
; CHECK: vrsqrte.f32 q8, q8		; CHECK: vrsqrte.f32 q8, q8
bb.nph55.bb.nph55.split_crit_edge:		bb.nph55.bb.nph55.split_crit_edge:
; ... 105 lines not shown ...
exit: ; preds = %bb.i19		exit: ; preds = %bb.i19
unreachable		unreachable

bb14: ; preds = %bb6		bb14: ; preds = %bb6
ret i32 0		ret i32 0
}		}

; This test crashes the coalescer because live variables were not updated properly.		; This test crashes the coalescer because live variables were not updated properly.
define <8 x i8> @t11(i8* %A1, i8* %A2, i8* %A3, i8* %A4, i8* %A5, i8* %A6, i8* %A7, i8* %A8, i8* %B) nounwind {		define <8 x i8> @t11(i8* %A1, i8* %A2, i8* %A3, i8* %A4, i8* %A5, i8* %A6, i8* %A7, i8* %A8, i8* %B) nounwind {
%tmp1d = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8(i8* %A4, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=1]		%tmp1d = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8.p0i8(i8* %A4, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=1]
%tmp2d = extractvalue %struct.__neon_int8x8x3_t %tmp1d, 0 ; <<8 x i8>> [#uses=1]		%tmp2d = extractvalue %struct.__neon_int8x8x3_t %tmp1d, 0 ; <<8 x i8>> [#uses=1]
%tmp1f = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8(i8* %A6, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=1]		%tmp1f = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8.p0i8(i8* %A6, i32 1) ; <%struct.__neon_int8x8x3_t> [#uses=1]
%tmp2f = extractvalue %struct.__neon_int8x8x3_t %tmp1f, 0 ; <<8 x i8>> [#uses=1]		%tmp2f = extractvalue %struct.__neon_int8x8x3_t %tmp1f, 0 ; <<8 x i8>> [#uses=1]
%tmp2bd = add <8 x i8> zeroinitializer, %tmp2d ; <<8 x i8>> [#uses=1]		%tmp2bd = add <8 x i8> zeroinitializer, %tmp2d ; <<8 x i8>> [#uses=1]
%tmp2abcd = mul <8 x i8> zeroinitializer, %tmp2bd ; <<8 x i8>> [#uses=1]		%tmp2abcd = mul <8 x i8> zeroinitializer, %tmp2bd ; <<8 x i8>> [#uses=1]
%tmp2ef = sub <8 x i8> zeroinitializer, %tmp2f ; <<8 x i8>> [#uses=1]		%tmp2ef = sub <8 x i8> zeroinitializer, %tmp2f ; <<8 x i8>> [#uses=1]
%tmp2efgh = mul <8 x i8> %tmp2ef, undef ; <<8 x i8>> [#uses=2]		%tmp2efgh = mul <8 x i8> %tmp2ef, undef ; <<8 x i8>> [#uses=2]
call void @llvm.arm.neon.vst3.v8i8(i8* %A2, <8 x i8> undef, <8 x i8> undef, <8 x i8> %tmp2efgh, i32 1)		call void @llvm.arm.neon.vst3.p0i8.v8i8(i8* %A2, <8 x i8> undef, <8 x i8> undef, <8 x i8> %tmp2efgh, i32 1)
%tmp2 = sub <8 x i8> %tmp2efgh, %tmp2abcd ; <<8 x i8>> [#uses=1]		%tmp2 = sub <8 x i8> %tmp2efgh, %tmp2abcd ; <<8 x i8>> [#uses=1]
%tmp7 = mul <8 x i8> undef, %tmp2 ; <<8 x i8>> [#uses=1]		%tmp7 = mul <8 x i8> undef, %tmp2 ; <<8 x i8>> [#uses=1]
tail call void @llvm.arm.neon.vst3.v8i8(i8* %B, <8 x i8> undef, <8 x i8> undef, <8 x i8> %tmp7, i32 1)		tail call void @llvm.arm.neon.vst3.p0i8.v8i8(i8* %B, <8 x i8> undef, <8 x i8> undef, <8 x i8> %tmp7, i32 1)
ret <8 x i8> undef		ret <8 x i8> undef
}		}

declare <4 x i32> @llvm.arm.neon.vld1.v4i32(i8*, i32) nounwind readonly		declare <4 x i32> @llvm.arm.neon.vld1.v4i32.p0i8(i8*, i32) nounwind readonly

declare <8 x i16> @llvm.arm.neon.vld1.v8i16(i8*, i32) nounwind readonly		declare <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8*, i32) nounwind readonly

declare <4 x i16> @llvm.arm.neon.vshiftn.v4i16(<4 x i32>, <4 x i32>) nounwind readnone		declare <4 x i16> @llvm.arm.neon.vshiftn.v4i16(<4 x i32>, <4 x i32>) nounwind readnone

declare void @llvm.arm.neon.vst1.v4i32(i8*, <4 x i32>, i32) nounwind		declare void @llvm.arm.neon.vst1.p0i8.v4i32(i8*, <4 x i32>, i32) nounwind

declare void @llvm.arm.neon.vst1.v8i16(i8*, <8 x i16>, i32) nounwind		declare void @llvm.arm.neon.vst1.p0i8.v8i16(i8*, <8 x i16>, i32) nounwind

declare void @llvm.arm.neon.vst3.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, i32)		declare void @llvm.arm.neon.vst3.p0i8.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, i32)
nounwind		nounwind

declare %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8(i8*, i32) nounwind readonly		declare %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8.p0i8(i8*, i32) nounwind readonly

declare %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2.v4i32(i8*, i32) nounwind readonly		declare %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2.v4i32.p0i8(i8*, i32) nounwind readonly

declare %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2lane.v8i8(i8*, <8 x i8>, <8 x i8>, i32, i32) nounwind readonly		declare %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2lane.v8i8.p0i8(i8*, <8 x i8>, <8 x i8>, i32, i32) nounwind readonly

declare %struct.__neon_int16x8x2_t @llvm.arm.neon.vld2lane.v8i16(i8*, <8 x i16>, <8 x i16>, i32, i32) nounwind readonly		declare %struct.__neon_int16x8x2_t @llvm.arm.neon.vld2lane.v8i16.p0i8(i8*, <8 x i16>, <8 x i16>, i32, i32) nounwind readonly

declare void @llvm.arm.neon.vst2.v4i32(i8*, <4 x i32>, <4 x i32>, i32) nounwind		declare void @llvm.arm.neon.vst2.p0i8.v4i32(i8*, <4 x i32>, <4 x i32>, i32) nounwind

declare <4 x float> @llvm.arm.neon.vrsqrte.v4f32(<4 x float>) nounwind readnone		declare <4 x float> @llvm.arm.neon.vrsqrte.v4f32(<4 x float>) nounwind readnone

declare void @llvm.trap() nounwind		declare void @llvm.trap() nounwind

llvm/trunk/test/CodeGen/ARM/spill-q.ll

	; RUN: llc < %s -mtriple=armv7-elf -mattr=+neon -arm-atomic-cfg-tidy=0 | FileCheck %s			; RUN: llc < %s -mtriple=armv7-elf -mattr=+neon -arm-atomic-cfg-tidy=0 | FileCheck %s
	; PR4789			; PR4789

	%bar = type { float, float, float }			%bar = type { float, float, float }
	%baz = type { i32, [16 x %bar], [16 x float], [16 x i32], i8 }			%baz = type { i32, [16 x %bar], [16 x float], [16 x i32], i8 }
	%foo = type { <4 x float> }			%foo = type { <4 x float> }
	%quux = type { i32 (...)*, %baz, i32 }			%quux = type { i32 (...)*, %baz, i32 }
	%quuz = type { %quux, i32, %bar, [128 x i8], [16 x %foo], %foo, %foo, %foo }			%quuz = type { %quux, i32, %bar, [128 x i8], [16 x %foo], %foo, %foo, %foo }

	declare <4 x float> @llvm.arm.neon.vld1.v4f32(i8*, i32) nounwind readonly			declare <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8*, i32) nounwind readonly

	define void @aaa(%quuz* %this, i8* %block) {			define void @aaa(%quuz* %this, i8* %block) {
	; CHECK-LABEL: aaa:			; CHECK-LABEL: aaa:
	; CHECK: bfc {{.*}}, #0, #4			; CHECK: bfc {{.*}}, #0, #4
	; CHECK: vst1.64 {{.*}}sp:128			; CHECK: vst1.64 {{.*}}sp:128
	; CHECK: vld1.64 {{.*}}sp:128			; CHECK: vld1.64 {{.*}}sp:128
	entry:			entry:
	%aligned_vec = alloca <4 x float>, align 16			%aligned_vec = alloca <4 x float>, align 16
	%"alloca point" = bitcast i32 0 to i32			%"alloca point" = bitcast i32 0 to i32
	%vecptr = bitcast <4 x float>* %aligned_vec to i8*			%vecptr = bitcast <4 x float>* %aligned_vec to i8*
	%0 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* %vecptr, i32 1) nounwind ; <<4 x float>> [#uses=1]			%0 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* %vecptr, i32 1) nounwind ; <<4 x float>> [#uses=1]
	store float 6.300000e+01, float* undef, align 4			store float 6.300000e+01, float* undef, align 4
	%1 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind ; <<4 x float>> [#uses=1]			%1 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind ; <<4 x float>> [#uses=1]
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%2 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind ; <<4 x float>> [#uses=1]			%2 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind ; <<4 x float>> [#uses=1]
	%ld3 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld3 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld4 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld4 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld5 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld5 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld6 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld6 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld7 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld7 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld8 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld8 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld9 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld9 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld10 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld10 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld11 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld11 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld12 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld12 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%val173 = load <4 x float>, <4 x float>* undef ; <<4 x float>> [#uses=1]			%val173 = load <4 x float>, <4 x float>* undef ; <<4 x float>> [#uses=1]
	br label %bb4			br label %bb4

	bb4: ; preds = %bb193, %entry			bb4: ; preds = %bb193, %entry
	%besterror.0.2264 = phi <4 x float> [ undef, %entry ], [ %besterror.0.0, %bb193 ] ; <<4 x float>> [#uses=2]			%besterror.0.2264 = phi <4 x float> [ undef, %entry ], [ %besterror.0.0, %bb193 ] ; <<4 x float>> [#uses=2]
	%part0.0.0261 = phi <4 x float> [ zeroinitializer, %entry ], [ %23, %bb193 ] ; <<4 x float>> [#uses=2]			%part0.0.0261 = phi <4 x float> [ zeroinitializer, %entry ], [ %23, %bb193 ] ; <<4 x float>> [#uses=2]
	%3 = fmul <4 x float> zeroinitializer, %0 ; <<4 x float>> [#uses=2]			%3 = fmul <4 x float> zeroinitializer, %0 ; <<4 x float>> [#uses=2]
	; ... 39 lines not shown ...

llvm/trunk/test/CodeGen/ARM/vcge.ll

	; ... 190 lines not shown ...
	;CHECK: vcle.f32			;CHECK: vcle.f32
	entry:			entry:
	%0 = fcmp ole <4 x float> undef, zeroinitializer			%0 = fcmp ole <4 x float> undef, zeroinitializer
	%1 = sext <4 x i1> %0 to <4 x i16>			%1 = sext <4 x i1> %0 to <4 x i16>
	%2 = add <4 x i16> %1, zeroinitializer			%2 = add <4 x i16> %1, zeroinitializer
	%3 = shufflevector <4 x i16> %2, <4 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%3 = shufflevector <4 x i16> %2, <4 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	%4 = add <8 x i16> %3, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>			%4 = add <8 x i16> %3, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
	%5 = trunc <8 x i16> %4 to <8 x i8>			%5 = trunc <8 x i16> %4 to <8 x i8>
	tail call void @llvm.arm.neon.vst1.v8i8(i8* undef, <8 x i8> %5, i32 1)			tail call void @llvm.arm.neon.vst1.p0i8.v8i8(i8* undef, <8 x i8> %5, i32 1)
	unreachable			unreachable
	}			}

	declare void @llvm.arm.neon.vst1.v8i8(i8*, <8 x i8>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v8i8(i8*, <8 x i8>, i32) nounwind

llvm/trunk/test/CodeGen/ARM/vector-DAGCombine.ll

	; ... 72 lines not shown ...
	; operands with i16 elements.			; operands with i16 elements.
	define void @test_i16_constant_fold() nounwind optsize {			define void @test_i16_constant_fold() nounwind optsize {
	entry:			entry:
	%0 = sext <4 x i1> zeroinitializer to <4 x i16>			%0 = sext <4 x i1> zeroinitializer to <4 x i16>
	%1 = add <4 x i16> %0, zeroinitializer			%1 = add <4 x i16> %0, zeroinitializer
	%2 = shufflevector <4 x i16> %1, <4 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%2 = shufflevector <4 x i16> %1, <4 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	%3 = add <8 x i16> %2, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>			%3 = add <8 x i16> %2, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
	%4 = trunc <8 x i16> %3 to <8 x i8>			%4 = trunc <8 x i16> %3 to <8 x i8>
	tail call void @llvm.arm.neon.vst1.v8i8(i8* undef, <8 x i8> %4, i32 1)			tail call void @llvm.arm.neon.vst1.p0i8.v8i8(i8* undef, <8 x i8> %4, i32 1)
	unreachable			unreachable
	}			}

	declare void @llvm.arm.neon.vst1.v8i8(i8*, <8 x i8>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v8i8(i8*, <8 x i8>, i32) nounwind

	; Test that loads and stores of i64 vector elements are handled as f64 values			; Test that loads and stores of i64 vector elements are handled as f64 values
	; so they are not split up into i32 values. Radar 8755338.			; so they are not split up into i32 values. Radar 8755338.
	define void @i64_buildvector(i64* %ptr, <2 x i64>* %vp) nounwind {			define void @i64_buildvector(i64* %ptr, <2 x i64>* %vp) nounwind {
	; CHECK: i64_buildvector			; CHECK: i64_buildvector
	; CHECK: vldr			; CHECK: vldr
	%t0 = load i64, i64* %ptr, align 4			%t0 = load i64, i64* %ptr, align 4
	%t1 = insertelement <2 x i64> undef, i64 %t0, i32 0			%t1 = insertelement <2 x i64> undef, i64 %t0, i32 0
	; ... 161 lines not shown ...

llvm/trunk/test/CodeGen/ARM/vld-vst-upgrade.ll

				; RUN: llc -mtriple=arm-eabi -mattr=+neon < %s | FileCheck %s

				%struct.__neon_int32x2x2_t = type { <2 x i32>, <2 x i32> }
				%struct.__neon_int32x2x3_t = type { <2 x i32>, <2 x i32>, <2 x i32> }
				%struct.__neon_int32x2x4_t = type { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> }

				; vld[1234] auto-upgrade tests

				; CHECK-LABEL: test_vld1_upgrade:
				; CHECK: vld1.32 {d16}, [r0]
				define <2 x i32> @test_vld1_upgrade(i8* %ptr) {
				%tmp1 = call <2 x i32> @llvm.arm.neon.vld1.v2i32(i8* %ptr, i32 1)
				ret <2 x i32> %tmp1
				}

				declare <2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32) nounwind readonly

				; CHECK-LABEL: test_vld2_upgrade:
				; CHECK: vld2.32 {d16, d17}, [r0]
				define %struct.__neon_int32x2x2_t @test_vld2_upgrade(i8* %ptr) {
				%tmp1 = call %struct.__neon_int32x2x2_t @llvm.arm.neon.vld2.v2i32(i8* %ptr, i32 1)
				ret %struct.__neon_int32x2x2_t %tmp1
				}

				declare %struct.__neon_int32x2x2_t @llvm.arm.neon.vld2.v2i32(i8*, i32) nounwind readonly

				; CHECK-LABEL: test_vld3_upgrade:
				; CHECK: vld3.32 {d16, d17, d18}, [r1]
				define %struct.__neon_int32x2x3_t @test_vld3_upgrade(i8* %ptr) {
				%tmp1 = call %struct.__neon_int32x2x3_t @llvm.arm.neon.vld3.v2i32(i8* %ptr, i32 1)
				ret %struct.__neon_int32x2x3_t %tmp1
				}

				declare %struct.__neon_int32x2x3_t @llvm.arm.neon.vld3.v2i32(i8*, i32) nounwind readonly

				; CHECK-LABEL: test_vld4_upgrade:
				; CHECK: vld4.32 {d16, d17, d18, d19}, [r1]
				define %struct.__neon_int32x2x4_t @test_vld4_upgrade(i8* %ptr) {
				%tmp1 = call %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4.v2i32(i8* %ptr, i32 1)
				ret %struct.__neon_int32x2x4_t %tmp1
				}

				declare %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4.v2i32(i8*, i32) nounwind readonly

				; vld[234]lane auto-upgrade tests

				; CHECK-LABEL: test_vld2lane_upgrade:
				; CHECK: vld2.32 {d16[1], d17[1]}, [r0]
				define %struct.__neon_int32x2x2_t @test_vld2lane_upgrade(i8* %ptr, <2 x i32> %A, <2 x i32> %B) {
				%tmp1 = call %struct.__neon_int32x2x2_t @llvm.arm.neon.vld2lane.v2i32(i8* %ptr, <2 x i32> %A, <2 x i32> %B, i32 1, i32 1)
				ret %struct.__neon_int32x2x2_t %tmp1
				}

				declare %struct.__neon_int32x2x2_t @llvm.arm.neon.vld2lane.v2i32(i8*, <2 x i32>, <2 x i32>, i32, i32) nounwind readonly

				; CHECK-LABEL: test_vld3lane_upgrade:
				; CHECK: vld3.32 {d16[1], d17[1], d18[1]}, [r1]
				define %struct.__neon_int32x2x3_t @test_vld3lane_upgrade(i8* %ptr, <2 x i32> %A, <2 x i32> %B, <2 x i32> %C) {
				%tmp1 = call %struct.__neon_int32x2x3_t @llvm.arm.neon.vld3lane.v2i32(i8* %ptr, <2 x i32> %A, <2 x i32> %B, <2 x i32> %C, i32 1, i32 1)
				ret %struct.__neon_int32x2x3_t %tmp1
				}

				declare %struct.__neon_int32x2x3_t @llvm.arm.neon.vld3lane.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, i32, i32) nounwind readonly

				; CHECK-LABEL: test_vld4lane_upgrade:
				; CHECK: vld4.32 {d16[1], d17[1], d18[1], d19[1]}, [r1]
				define %struct.__neon_int32x2x4_t @test_vld4lane_upgrade(i8* %ptr, <2 x i32> %A, <2 x i32> %B, <2 x i32> %C, <2 x i32> %D) {
				%tmp1 = call %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4lane.v2i32(i8* %ptr, <2 x i32> %A, <2 x i32> %B, <2 x i32> %C, <2 x i32> %D, i32 1, i32 1)
				ret %struct.__neon_int32x2x4_t %tmp1
				}

				declare %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4lane.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, i32, i32) nounwind readonly

				; vst[1234] auto-upgrade tests

				; CHECK-LABEL: test_vst1_upgrade:
				; CHECK: vst1.32 {d16}, [r0]
				define void @test_vst1_upgrade(i8* %ptr, <2 x i32> %A) {
				call void @llvm.arm.neon.vst1.v2i32(i8* %ptr, <2 x i32> %A, i32 1)
				ret void
				}

				declare void @llvm.arm.neon.vst1.v2i32(i8*, <2 x i32>, i32) nounwind

				; CHECK-LABEL: test_vst2_upgrade:
				; CHECK: vst2.32 {d16, d17}, [r0]
				define void @test_vst2_upgrade(i8* %ptr, <2 x i32> %A, <2 x i32> %B) {
				call void @llvm.arm.neon.vst2.v2i32(i8* %ptr, <2 x i32> %A, <2 x i32> %B, i32 1)
				ret void
				}

				declare void @llvm.arm.neon.vst2.v2i32(i8*, <2 x i32>, <2 x i32>, i32) nounwind

				; CHECK-LABEL: test_vst3_upgrade:
				; CHECK: vst3.32 {d16, d17, d18}, [r0]
				define void @test_vst3_upgrade(i8* %ptr, <2 x i32> %A, <2 x i32> %B, <2 x i32> %C) {
				call void @llvm.arm.neon.vst3.v2i32(i8* %ptr, <2 x i32> %A, <2 x i32> %B, <2 x i32> %C, i32 1)
				ret void
				}

				declare void @llvm.arm.neon.vst3.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, i32) nounwind

				; CHECK-LABEL: test_vst4_upgrade:
				; CHECK: vst4.32 {d16, d17, d18, d19}, [r0]
				define void @test_vst4_upgrade(i8* %ptr, <2 x i32> %A, <2 x i32> %B, <2 x i32> %C, <2 x i32> %D) {
				call void @llvm.arm.neon.vst4.v2i32(i8* %ptr, <2 x i32> %A, <2 x i32> %B, <2 x i32> %C, <2 x i32> %D, i32 1)
				ret void
				}

				declare void @llvm.arm.neon.vst4.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, i32) nounwind

				; vst[234]lane auto-upgrade tests

				; CHECK-LABEL: test_vst2lane_upgrade:
				; CHECK: vst2.32 {d16[1], d17[1]}, [r0]
				define void @test_vst2lane_upgrade(i8* %ptr, <2 x i32> %A, <2 x i32> %B) {
				call void @llvm.arm.neon.vst2lane.v2i32(i8* %ptr, <2 x i32> %A, <2 x i32> %B, i32 1, i32 1)
				ret void
				}

				declare void @llvm.arm.neon.vst2lane.v2i32(i8*, <2 x i32>, <2 x i32>, i32, i32) nounwind

				; CHECK-LABEL: test_vst3lane_upgrade:
				; CHECK: vst3.32 {d16[1], d17[1], d18[1]}, [r0]
				define void @test_vst3lane_upgrade(i8* %ptr, <2 x i32> %A, <2 x i32> %B, <2 x i32> %C) {
				call void @llvm.arm.neon.vst3lane.v2i32(i8* %ptr, <2 x i32> %A, <2 x i32> %B, <2 x i32> %C, i32 1, i32 1)
				ret void
				}

				declare void @llvm.arm.neon.vst3lane.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, i32, i32) nounwind

				; CHECK-LABEL: test_vst4lane_upgrade:
				; CHECK: vst4.32 {d16[1], d17[1], d18[1], d19[1]}, [r0]
				define void @test_vst4lane_upgrade(i8* %ptr, <2 x i32> %A, <2 x i32> %B, <2 x i32> %C, <2 x i32> %D) {
				call void @llvm.arm.neon.vst4lane.v2i32(i8* %ptr, <2 x i32> %A, <2 x i32> %B, <2 x i32> %C, <2 x i32> %D, i32 1, i32 1)
				ret void
				}

				declare void @llvm.arm.neon.vst4lane.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, i32, i32) nounwind

llvm/trunk/test/CodeGen/ARM/vld1.ll

	; RUN: llc -mtriple=arm-eabi -float-abi=soft -mattr=+neon %s -o - | FileCheck %s			; RUN: llc -mtriple=arm-eabi -float-abi=soft -mattr=+neon %s -o - | FileCheck %s

	; RUN: llc -mtriple=arm-eabi -float-abi=soft -mattr=+neon -regalloc=basic %s -o - \			; RUN: llc -mtriple=arm-eabi -float-abi=soft -mattr=+neon -regalloc=basic %s -o - \
	; RUN: | FileCheck %s			; RUN: | FileCheck %s

	define <8 x i8> @vld1i8(i8* %A) nounwind {			define <8 x i8> @vld1i8(i8* %A) nounwind {
	;CHECK-LABEL: vld1i8:			;CHECK-LABEL: vld1i8:
	;Check the alignment value. Max for this instruction is 64 bits:			;Check the alignment value. Max for this instruction is 64 bits:
	;CHECK: vld1.8 {d16}, [r0:64]			;CHECK: vld1.8 {d16}, [r0:64]
	%tmp1 = call <8 x i8> @llvm.arm.neon.vld1.v8i8(i8* %A, i32 16)			%tmp1 = call <8 x i8> @llvm.arm.neon.vld1.v8i8.p0i8(i8* %A, i32 16)
	ret <8 x i8> %tmp1			ret <8 x i8> %tmp1
	}			}

	define <4 x i16> @vld1i16(i16* %A) nounwind {			define <4 x i16> @vld1i16(i16* %A) nounwind {
	;CHECK-LABEL: vld1i16:			;CHECK-LABEL: vld1i16:
	;CHECK: vld1.16			;CHECK: vld1.16
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = call <4 x i16> @llvm.arm.neon.vld1.v4i16(i8* %tmp0, i32 1)			%tmp1 = call <4 x i16> @llvm.arm.neon.vld1.v4i16.p0i8(i8* %tmp0, i32 1)
	ret <4 x i16> %tmp1			ret <4 x i16> %tmp1
	}			}

	;Check for a post-increment updating load.			;Check for a post-increment updating load.
	define <4 x i16> @vld1i16_update(i16** %ptr) nounwind {			define <4 x i16> @vld1i16_update(i16** %ptr) nounwind {
	;CHECK-LABEL: vld1i16_update:			;CHECK-LABEL: vld1i16_update:
	;CHECK: vld1.16 {d16}, [{{r[0-9]+}}]!			;CHECK: vld1.16 {d16}, [{{r[0-9]+}}]!
	%A = load i16*, i16** %ptr			%A = load i16*, i16** %ptr
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = call <4 x i16> @llvm.arm.neon.vld1.v4i16(i8* %tmp0, i32 1)			%tmp1 = call <4 x i16> @llvm.arm.neon.vld1.v4i16.p0i8(i8* %tmp0, i32 1)
	%tmp2 = getelementptr i16, i16* %A, i32 4			%tmp2 = getelementptr i16, i16* %A, i32 4
	store i16* %tmp2, i16** %ptr			store i16* %tmp2, i16** %ptr
	ret <4 x i16> %tmp1			ret <4 x i16> %tmp1
	}			}

	define <2 x i32> @vld1i32(i32* %A) nounwind {			define <2 x i32> @vld1i32(i32* %A) nounwind {
	;CHECK-LABEL: vld1i32:			;CHECK-LABEL: vld1i32:
	;CHECK: vld1.32			;CHECK: vld1.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = call <2 x i32> @llvm.arm.neon.vld1.v2i32(i8* %tmp0, i32 1)			%tmp1 = call <2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8* %tmp0, i32 1)
	ret <2 x i32> %tmp1			ret <2 x i32> %tmp1
	}			}

	;Check for a post-increment updating load with register increment.			;Check for a post-increment updating load with register increment.
	define <2 x i32> @vld1i32_update(i32** %ptr, i32 %inc) nounwind {			define <2 x i32> @vld1i32_update(i32** %ptr, i32 %inc) nounwind {
	;CHECK-LABEL: vld1i32_update:			;CHECK-LABEL: vld1i32_update:
	;CHECK: vld1.32 {d16}, [{{r[0-9]+}}], {{r[0-9]+}}			;CHECK: vld1.32 {d16}, [{{r[0-9]+}}], {{r[0-9]+}}
	%A = load i32*, i32** %ptr			%A = load i32*, i32** %ptr
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = call <2 x i32> @llvm.arm.neon.vld1.v2i32(i8* %tmp0, i32 1)			%tmp1 = call <2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8* %tmp0, i32 1)
	%tmp2 = getelementptr i32, i32* %A, i32 %inc			%tmp2 = getelementptr i32, i32* %A, i32 %inc
	store i32* %tmp2, i32** %ptr			store i32* %tmp2, i32** %ptr
	ret <2 x i32> %tmp1			ret <2 x i32> %tmp1
	}			}

	define <2 x float> @vld1f(float* %A) nounwind {			define <2 x float> @vld1f(float* %A) nounwind {
	;CHECK-LABEL: vld1f:			;CHECK-LABEL: vld1f:
	;CHECK: vld1.32			;CHECK: vld1.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = call <2 x float> @llvm.arm.neon.vld1.v2f32(i8* %tmp0, i32 1)			%tmp1 = call <2 x float> @llvm.arm.neon.vld1.v2f32.p0i8(i8* %tmp0, i32 1)
	ret <2 x float> %tmp1			ret <2 x float> %tmp1
	}			}

	define <1 x i64> @vld1i64(i64* %A) nounwind {			define <1 x i64> @vld1i64(i64* %A) nounwind {
	;CHECK-LABEL: vld1i64:			;CHECK-LABEL: vld1i64:
	;CHECK: vld1.64			;CHECK: vld1.64
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = call <1 x i64> @llvm.arm.neon.vld1.v1i64(i8* %tmp0, i32 1)			%tmp1 = call <1 x i64> @llvm.arm.neon.vld1.v1i64.p0i8(i8* %tmp0, i32 1)
	ret <1 x i64> %tmp1			ret <1 x i64> %tmp1
	}			}

	define <16 x i8> @vld1Qi8(i8* %A) nounwind {			define <16 x i8> @vld1Qi8(i8* %A) nounwind {
	;CHECK-LABEL: vld1Qi8:			;CHECK-LABEL: vld1Qi8:
	;Check the alignment value. Max for this instruction is 128 bits:			;Check the alignment value. Max for this instruction is 128 bits:
	;CHECK: vld1.8 {d16, d17}, [r0:64]			;CHECK: vld1.8 {d16, d17}, [r0:64]
	%tmp1 = call <16 x i8> @llvm.arm.neon.vld1.v16i8(i8* %A, i32 8)			%tmp1 = call <16 x i8> @llvm.arm.neon.vld1.v16i8.p0i8(i8* %A, i32 8)
	ret <16 x i8> %tmp1			ret <16 x i8> %tmp1
	}			}

	;Check for a post-increment updating load.			;Check for a post-increment updating load.
	define <16 x i8> @vld1Qi8_update(i8** %ptr) nounwind {			define <16 x i8> @vld1Qi8_update(i8** %ptr) nounwind {
	;CHECK-LABEL: vld1Qi8_update:			;CHECK-LABEL: vld1Qi8_update:
	;CHECK: vld1.8 {d16, d17}, [{{r[0-9]+}}:64]!			;CHECK: vld1.8 {d16, d17}, [{{r[0-9]+}}:64]!
	%A = load i8*, i8** %ptr			%A = load i8*, i8** %ptr
	%tmp1 = call <16 x i8> @llvm.arm.neon.vld1.v16i8(i8* %A, i32 8)			%tmp1 = call <16 x i8> @llvm.arm.neon.vld1.v16i8.p0i8(i8* %A, i32 8)
	%tmp2 = getelementptr i8, i8* %A, i32 16			%tmp2 = getelementptr i8, i8* %A, i32 16
	store i8* %tmp2, i8** %ptr			store i8* %tmp2, i8** %ptr
	ret <16 x i8> %tmp1			ret <16 x i8> %tmp1
	}			}

	define <8 x i16> @vld1Qi16(i16* %A) nounwind {			define <8 x i16> @vld1Qi16(i16* %A) nounwind {
	;CHECK-LABEL: vld1Qi16:			;CHECK-LABEL: vld1Qi16:
	;Check the alignment value. Max for this instruction is 128 bits:			;Check the alignment value. Max for this instruction is 128 bits:
	;CHECK: vld1.16 {d16, d17}, [r0:128]			;CHECK: vld1.16 {d16, d17}, [r0:128]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = call <8 x i16> @llvm.arm.neon.vld1.v8i16(i8* %tmp0, i32 32)			%tmp1 = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* %tmp0, i32 32)
	ret <8 x i16> %tmp1			ret <8 x i16> %tmp1
	}			}

	define <4 x i32> @vld1Qi32(i32* %A) nounwind {			define <4 x i32> @vld1Qi32(i32* %A) nounwind {
	;CHECK-LABEL: vld1Qi32:			;CHECK-LABEL: vld1Qi32:
	;CHECK: vld1.32			;CHECK: vld1.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = call <4 x i32> @llvm.arm.neon.vld1.v4i32(i8* %tmp0, i32 1)			%tmp1 = call <4 x i32> @llvm.arm.neon.vld1.v4i32.p0i8(i8* %tmp0, i32 1)
	ret <4 x i32> %tmp1			ret <4 x i32> %tmp1
	}			}

	define <4 x float> @vld1Qf(float* %A) nounwind {			define <4 x float> @vld1Qf(float* %A) nounwind {
	;CHECK-LABEL: vld1Qf:			;CHECK-LABEL: vld1Qf:
	;CHECK: vld1.32			;CHECK: vld1.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* %tmp0, i32 1)			%tmp1 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* %tmp0, i32 1)
	ret <4 x float> %tmp1			ret <4 x float> %tmp1
	}			}

	define <2 x i64> @vld1Qi64(i64* %A) nounwind {			define <2 x i64> @vld1Qi64(i64* %A) nounwind {
	;CHECK-LABEL: vld1Qi64:			;CHECK-LABEL: vld1Qi64:
	;CHECK: vld1.64			;CHECK: vld1.64
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = call <2 x i64> @llvm.arm.neon.vld1.v2i64(i8* %tmp0, i32 1)			%tmp1 = call <2 x i64> @llvm.arm.neon.vld1.v2i64.p0i8(i8* %tmp0, i32 1)
	ret <2 x i64> %tmp1			ret <2 x i64> %tmp1
	}			}

	define <2 x double> @vld1Qf64(double* %A) nounwind {			define <2 x double> @vld1Qf64(double* %A) nounwind {
	;CHECK-LABEL: vld1Qf64:			;CHECK-LABEL: vld1Qf64:
	;CHECK: vld1.64			;CHECK: vld1.64
	%tmp0 = bitcast double* %A to i8*			%tmp0 = bitcast double* %A to i8*
	%tmp1 = call <2 x double> @llvm.arm.neon.vld1.v2f64(i8* %tmp0, i32 1)			%tmp1 = call <2 x double> @llvm.arm.neon.vld1.v2f64.p0i8(i8* %tmp0, i32 1)
	ret <2 x double> %tmp1			ret <2 x double> %tmp1
	}			}

	declare <8 x i8> @llvm.arm.neon.vld1.v8i8(i8*, i32) nounwind readonly			declare <8 x i8> @llvm.arm.neon.vld1.v8i8.p0i8(i8*, i32) nounwind readonly
	declare <4 x i16> @llvm.arm.neon.vld1.v4i16(i8*, i32) nounwind readonly			declare <4 x i16> @llvm.arm.neon.vld1.v4i16.p0i8(i8*, i32) nounwind readonly
	declare <2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32) nounwind readonly			declare <2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32) nounwind readonly
	declare <2 x float> @llvm.arm.neon.vld1.v2f32(i8*, i32) nounwind readonly			declare <2 x float> @llvm.arm.neon.vld1.v2f32.p0i8(i8*, i32) nounwind readonly
	declare <1 x i64> @llvm.arm.neon.vld1.v1i64(i8*, i32) nounwind readonly			declare <1 x i64> @llvm.arm.neon.vld1.v1i64.p0i8(i8*, i32) nounwind readonly

	declare <16 x i8> @llvm.arm.neon.vld1.v16i8(i8*, i32) nounwind readonly			declare <16 x i8> @llvm.arm.neon.vld1.v16i8.p0i8(i8*, i32) nounwind readonly
	declare <8 x i16> @llvm.arm.neon.vld1.v8i16(i8*, i32) nounwind readonly			declare <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8*, i32) nounwind readonly
	declare <4 x i32> @llvm.arm.neon.vld1.v4i32(i8*, i32) nounwind readonly			declare <4 x i32> @llvm.arm.neon.vld1.v4i32.p0i8(i8*, i32) nounwind readonly
	declare <4 x float> @llvm.arm.neon.vld1.v4f32(i8*, i32) nounwind readonly			declare <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8*, i32) nounwind readonly
	declare <2 x i64> @llvm.arm.neon.vld1.v2i64(i8*, i32) nounwind readonly			declare <2 x i64> @llvm.arm.neon.vld1.v2i64.p0i8(i8*, i32) nounwind readonly
	declare <2 x double> @llvm.arm.neon.vld1.v2f64(i8*, i32) nounwind readonly			declare <2 x double> @llvm.arm.neon.vld1.v2f64.p0i8(i8*, i32) nounwind readonly

	; Radar 8355607			; Radar 8355607
	; Do not crash if the vld1 result is not used.			; Do not crash if the vld1 result is not used.
	define void @unused_vld1_result() {			define void @unused_vld1_result() {
	entry:			entry:
	%0 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1)			%0 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1)
	call void @llvm.trap()			call void @llvm.trap()
	unreachable			unreachable
	}			}

	declare void @llvm.trap() nounwind			declare void @llvm.trap() nounwind
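
The renames in this file all follow one pattern: the new intrinsic name is the old name plus a suffix for the pointer operand type, which here is always `.p0i8` (an `i8*` in address space 0). A minimal Python sketch of that naming pattern follows. It assumes the suffix has the form `p<address space><pointee type>`, as the names in this diff suggest; the helper names `mangle_pointer` and `upgraded_name` are hypothetical and are not part of LLVM.

```python
def mangle_pointer(pointee: str = "i8", addrspace: int = 0) -> str:
    """Build the pointer suffix, e.g. 'p0i8' for i8* in address space 0."""
    return f"p{addrspace}{pointee}"


def upgraded_name(old_name: str, pointee: str = "i8", addrspace: int = 0) -> str:
    """Append the pointer suffix to an old-style intrinsic name (hypothetical helper)."""
    return f"{old_name}.{mangle_pointer(pointee, addrspace)}"


# Old name from the left-hand column, new name as in the right-hand column.
print(upgraded_name("llvm.arm.neon.vld1.v8i8"))
```

Under the same assumption, a pointer in address space 1 would yield a `.p1i8` suffix, which is what lets one intrinsic be declared per address space.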

llvm/trunk/test/CodeGen/ARM/vld2.ll

	%struct.__neon_int16x8x2_t = type { <8 x i16>, <8 x i16> }			%struct.__neon_int16x8x2_t = type { <8 x i16>, <8 x i16> }
	%struct.__neon_int32x4x2_t = type { <4 x i32>, <4 x i32> }			%struct.__neon_int32x4x2_t = type { <4 x i32>, <4 x i32> }
	%struct.__neon_float32x4x2_t = type { <4 x float>, <4 x float> }			%struct.__neon_float32x4x2_t = type { <4 x float>, <4 x float> }

	define <8 x i8> @vld2i8(i8* %A) nounwind {			define <8 x i8> @vld2i8(i8* %A) nounwind {
	;CHECK-LABEL: vld2i8:			;CHECK-LABEL: vld2i8:
	;Check the alignment value. Max for this instruction is 128 bits:			;Check the alignment value. Max for this instruction is 128 bits:
	;CHECK: vld2.8 {d16, d17}, [r0:64]			;CHECK: vld2.8 {d16, d17}, [r0:64]
	%tmp1 = call %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2.v8i8(i8* %A, i32 8)			%tmp1 = call %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2.v8i8.p0i8(i8* %A, i32 8)
	%tmp2 = extractvalue %struct.__neon_int8x8x2_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int8x8x2_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int8x8x2_t %tmp1, 1			%tmp3 = extractvalue %struct.__neon_int8x8x2_t %tmp1, 1
	%tmp4 = add <8 x i8> %tmp2, %tmp3			%tmp4 = add <8 x i8> %tmp2, %tmp3
	ret <8 x i8> %tmp4			ret <8 x i8> %tmp4
	}			}

	define <4 x i16> @vld2i16(i16* %A) nounwind {			define <4 x i16> @vld2i16(i16* %A) nounwind {
	;CHECK-LABEL: vld2i16:			;CHECK-LABEL: vld2i16:
	;Check the alignment value. Max for this instruction is 128 bits:			;Check the alignment value. Max for this instruction is 128 bits:
	;CHECK: vld2.16 {d16, d17}, [r0:128]			;CHECK: vld2.16 {d16, d17}, [r0:128]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = call %struct.__neon_int16x4x2_t @llvm.arm.neon.vld2.v4i16(i8* %tmp0, i32 32)			%tmp1 = call %struct.__neon_int16x4x2_t @llvm.arm.neon.vld2.v4i16.p0i8(i8* %tmp0, i32 32)
	%tmp2 = extractvalue %struct.__neon_int16x4x2_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int16x4x2_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int16x4x2_t %tmp1, 1			%tmp3 = extractvalue %struct.__neon_int16x4x2_t %tmp1, 1
	%tmp4 = add <4 x i16> %tmp2, %tmp3			%tmp4 = add <4 x i16> %tmp2, %tmp3
	ret <4 x i16> %tmp4			ret <4 x i16> %tmp4
	}			}

	define <2 x i32> @vld2i32(i32* %A) nounwind {			define <2 x i32> @vld2i32(i32* %A) nounwind {
	;CHECK-LABEL: vld2i32:			;CHECK-LABEL: vld2i32:
	;CHECK: vld2.32			;CHECK: vld2.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = call %struct.__neon_int32x2x2_t @llvm.arm.neon.vld2.v2i32(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_int32x2x2_t @llvm.arm.neon.vld2.v2i32.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_int32x2x2_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int32x2x2_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int32x2x2_t %tmp1, 1			%tmp3 = extractvalue %struct.__neon_int32x2x2_t %tmp1, 1
	%tmp4 = add <2 x i32> %tmp2, %tmp3			%tmp4 = add <2 x i32> %tmp2, %tmp3
	ret <2 x i32> %tmp4			ret <2 x i32> %tmp4
	}			}

	define <2 x float> @vld2f(float* %A) nounwind {			define <2 x float> @vld2f(float* %A) nounwind {
	;CHECK-LABEL: vld2f:			;CHECK-LABEL: vld2f:
	;CHECK: vld2.32			;CHECK: vld2.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = call %struct.__neon_float32x2x2_t @llvm.arm.neon.vld2.v2f32(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_float32x2x2_t @llvm.arm.neon.vld2.v2f32.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_float32x2x2_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_float32x2x2_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_float32x2x2_t %tmp1, 1			%tmp3 = extractvalue %struct.__neon_float32x2x2_t %tmp1, 1
	%tmp4 = fadd <2 x float> %tmp2, %tmp3			%tmp4 = fadd <2 x float> %tmp2, %tmp3
	ret <2 x float> %tmp4			ret <2 x float> %tmp4
	}			}

	;Check for a post-increment updating load.			;Check for a post-increment updating load.
	define <2 x float> @vld2f_update(float** %ptr) nounwind {			define <2 x float> @vld2f_update(float** %ptr) nounwind {
	;CHECK-LABEL: vld2f_update:			;CHECK-LABEL: vld2f_update:
	;CHECK: vld2.32 {d16, d17}, [r1]!			;CHECK: vld2.32 {d16, d17}, [r1]!
	%A = load float*, float** %ptr			%A = load float*, float** %ptr
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = call %struct.__neon_float32x2x2_t @llvm.arm.neon.vld2.v2f32(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_float32x2x2_t @llvm.arm.neon.vld2.v2f32.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_float32x2x2_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_float32x2x2_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_float32x2x2_t %tmp1, 1			%tmp3 = extractvalue %struct.__neon_float32x2x2_t %tmp1, 1
	%tmp4 = fadd <2 x float> %tmp2, %tmp3			%tmp4 = fadd <2 x float> %tmp2, %tmp3
	%tmp5 = getelementptr float, float* %A, i32 4			%tmp5 = getelementptr float, float* %A, i32 4
	store float* %tmp5, float** %ptr			store float* %tmp5, float** %ptr
	ret <2 x float> %tmp4			ret <2 x float> %tmp4
	}			}

	define <1 x i64> @vld2i64(i64* %A) nounwind {			define <1 x i64> @vld2i64(i64* %A) nounwind {
	;CHECK-LABEL: vld2i64:			;CHECK-LABEL: vld2i64:
	;Check the alignment value. Max for this instruction is 128 bits:			;Check the alignment value. Max for this instruction is 128 bits:
	;CHECK: vld1.64 {d16, d17}, [r0:128]			;CHECK: vld1.64 {d16, d17}, [r0:128]
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = call %struct.__neon_int64x1x2_t @llvm.arm.neon.vld2.v1i64(i8* %tmp0, i32 32)			%tmp1 = call %struct.__neon_int64x1x2_t @llvm.arm.neon.vld2.v1i64.p0i8(i8* %tmp0, i32 32)
	%tmp2 = extractvalue %struct.__neon_int64x1x2_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int64x1x2_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int64x1x2_t %tmp1, 1			%tmp3 = extractvalue %struct.__neon_int64x1x2_t %tmp1, 1
	%tmp4 = add <1 x i64> %tmp2, %tmp3			%tmp4 = add <1 x i64> %tmp2, %tmp3
	ret <1 x i64> %tmp4			ret <1 x i64> %tmp4
	}			}

	define <16 x i8> @vld2Qi8(i8* %A) nounwind {			define <16 x i8> @vld2Qi8(i8* %A) nounwind {
	;CHECK-LABEL: vld2Qi8:			;CHECK-LABEL: vld2Qi8:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vld2.8 {d16, d17, d18, d19}, [r0:64]			;CHECK: vld2.8 {d16, d17, d18, d19}, [r0:64]
	%tmp1 = call %struct.__neon_int8x16x2_t @llvm.arm.neon.vld2.v16i8(i8* %A, i32 8)			%tmp1 = call %struct.__neon_int8x16x2_t @llvm.arm.neon.vld2.v16i8.p0i8(i8* %A, i32 8)
	%tmp2 = extractvalue %struct.__neon_int8x16x2_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int8x16x2_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int8x16x2_t %tmp1, 1			%tmp3 = extractvalue %struct.__neon_int8x16x2_t %tmp1, 1
	%tmp4 = add <16 x i8> %tmp2, %tmp3			%tmp4 = add <16 x i8> %tmp2, %tmp3
	ret <16 x i8> %tmp4			ret <16 x i8> %tmp4
	}			}

	;Check for a post-increment updating load with register increment.			;Check for a post-increment updating load with register increment.
	define <16 x i8> @vld2Qi8_update(i8** %ptr, i32 %inc) nounwind {			define <16 x i8> @vld2Qi8_update(i8** %ptr, i32 %inc) nounwind {
	;CHECK-LABEL: vld2Qi8_update:			;CHECK-LABEL: vld2Qi8_update:
	;CHECK: vld2.8 {d16, d17, d18, d19}, [r2:128], r1			;CHECK: vld2.8 {d16, d17, d18, d19}, [r2:128], r1
	%A = load i8*, i8** %ptr			%A = load i8*, i8** %ptr
	%tmp1 = call %struct.__neon_int8x16x2_t @llvm.arm.neon.vld2.v16i8(i8* %A, i32 16)			%tmp1 = call %struct.__neon_int8x16x2_t @llvm.arm.neon.vld2.v16i8.p0i8(i8* %A, i32 16)
	%tmp2 = extractvalue %struct.__neon_int8x16x2_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int8x16x2_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int8x16x2_t %tmp1, 1			%tmp3 = extractvalue %struct.__neon_int8x16x2_t %tmp1, 1
	%tmp4 = add <16 x i8> %tmp2, %tmp3			%tmp4 = add <16 x i8> %tmp2, %tmp3
	%tmp5 = getelementptr i8, i8* %A, i32 %inc			%tmp5 = getelementptr i8, i8* %A, i32 %inc
	store i8* %tmp5, i8** %ptr			store i8* %tmp5, i8** %ptr
	ret <16 x i8> %tmp4			ret <16 x i8> %tmp4
	}			}

	define <8 x i16> @vld2Qi16(i16* %A) nounwind {			define <8 x i16> @vld2Qi16(i16* %A) nounwind {
	;CHECK-LABEL: vld2Qi16:			;CHECK-LABEL: vld2Qi16:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vld2.16 {d16, d17, d18, d19}, [r0:128]			;CHECK: vld2.16 {d16, d17, d18, d19}, [r0:128]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = call %struct.__neon_int16x8x2_t @llvm.arm.neon.vld2.v8i16(i8* %tmp0, i32 16)			%tmp1 = call %struct.__neon_int16x8x2_t @llvm.arm.neon.vld2.v8i16.p0i8(i8* %tmp0, i32 16)
	%tmp2 = extractvalue %struct.__neon_int16x8x2_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int16x8x2_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int16x8x2_t %tmp1, 1			%tmp3 = extractvalue %struct.__neon_int16x8x2_t %tmp1, 1
	%tmp4 = add <8 x i16> %tmp2, %tmp3			%tmp4 = add <8 x i16> %tmp2, %tmp3
	ret <8 x i16> %tmp4			ret <8 x i16> %tmp4
	}			}

	define <4 x i32> @vld2Qi32(i32* %A) nounwind {			define <4 x i32> @vld2Qi32(i32* %A) nounwind {
	;CHECK-LABEL: vld2Qi32:			;CHECK-LABEL: vld2Qi32:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vld2.32 {d16, d17, d18, d19}, [r0:256]			;CHECK: vld2.32 {d16, d17, d18, d19}, [r0:256]
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = call %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2.v4i32(i8* %tmp0, i32 64)			%tmp1 = call %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2.v4i32.p0i8(i8* %tmp0, i32 64)
	%tmp2 = extractvalue %struct.__neon_int32x4x2_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int32x4x2_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int32x4x2_t %tmp1, 1			%tmp3 = extractvalue %struct.__neon_int32x4x2_t %tmp1, 1
	%tmp4 = add <4 x i32> %tmp2, %tmp3			%tmp4 = add <4 x i32> %tmp2, %tmp3
	ret <4 x i32> %tmp4			ret <4 x i32> %tmp4
	}			}

	define <4 x float> @vld2Qf(float* %A) nounwind {			define <4 x float> @vld2Qf(float* %A) nounwind {
	;CHECK-LABEL: vld2Qf:			;CHECK-LABEL: vld2Qf:
	;CHECK: vld2.32			;CHECK: vld2.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = call %struct.__neon_float32x4x2_t @llvm.arm.neon.vld2.v4f32(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_float32x4x2_t @llvm.arm.neon.vld2.v4f32.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_float32x4x2_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_float32x4x2_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_float32x4x2_t %tmp1, 1			%tmp3 = extractvalue %struct.__neon_float32x4x2_t %tmp1, 1
	%tmp4 = fadd <4 x float> %tmp2, %tmp3			%tmp4 = fadd <4 x float> %tmp2, %tmp3
	ret <4 x float> %tmp4			ret <4 x float> %tmp4
	}			}

	declare %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2.v8i8(i8*, i32) nounwind readonly			declare %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2.v8i8.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int16x4x2_t @llvm.arm.neon.vld2.v4i16(i8*, i32) nounwind readonly			declare %struct.__neon_int16x4x2_t @llvm.arm.neon.vld2.v4i16.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int32x2x2_t @llvm.arm.neon.vld2.v2i32(i8*, i32) nounwind readonly			declare %struct.__neon_int32x2x2_t @llvm.arm.neon.vld2.v2i32.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_float32x2x2_t @llvm.arm.neon.vld2.v2f32(i8*, i32) nounwind readonly			declare %struct.__neon_float32x2x2_t @llvm.arm.neon.vld2.v2f32.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int64x1x2_t @llvm.arm.neon.vld2.v1i64(i8*, i32) nounwind readonly			declare %struct.__neon_int64x1x2_t @llvm.arm.neon.vld2.v1i64.p0i8(i8*, i32) nounwind readonly

	declare %struct.__neon_int8x16x2_t @llvm.arm.neon.vld2.v16i8(i8*, i32) nounwind readonly			declare %struct.__neon_int8x16x2_t @llvm.arm.neon.vld2.v16i8.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int16x8x2_t @llvm.arm.neon.vld2.v8i16(i8*, i32) nounwind readonly			declare %struct.__neon_int16x8x2_t @llvm.arm.neon.vld2.v8i16.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2.v4i32(i8*, i32) nounwind readonly			declare %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2.v4i32.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_float32x4x2_t @llvm.arm.neon.vld2.v4f32(i8*, i32) nounwind readonly			declare %struct.__neon_float32x4x2_t @llvm.arm.neon.vld2.v4f32.p0i8(i8*, i32) nounwind readonly

llvm/trunk/test/CodeGen/ARM/vld3.ll

	%struct.__neon_int16x8x3_t = type { <8 x i16>, <8 x i16>, <8 x i16> }			%struct.__neon_int16x8x3_t = type { <8 x i16>, <8 x i16>, <8 x i16> }
	%struct.__neon_int32x4x3_t = type { <4 x i32>, <4 x i32>, <4 x i32> }			%struct.__neon_int32x4x3_t = type { <4 x i32>, <4 x i32>, <4 x i32> }
	%struct.__neon_float32x4x3_t = type { <4 x float>, <4 x float>, <4 x float> }			%struct.__neon_float32x4x3_t = type { <4 x float>, <4 x float>, <4 x float> }

	define <8 x i8> @vld3i8(i8* %A) nounwind {			define <8 x i8> @vld3i8(i8* %A) nounwind {
	;CHECK-LABEL: vld3i8:			;CHECK-LABEL: vld3i8:
	;Check the alignment value. Max for this instruction is 64 bits:			;Check the alignment value. Max for this instruction is 64 bits:
	;CHECK: vld3.8 {d16, d17, d18}, [r0:64]			;CHECK: vld3.8 {d16, d17, d18}, [r0:64]
	%tmp1 = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8(i8* %A, i32 32)			%tmp1 = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8.p0i8(i8* %A, i32 32)
	%tmp2 = extractvalue %struct.__neon_int8x8x3_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int8x8x3_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int8x8x3_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int8x8x3_t %tmp1, 2
	%tmp4 = add <8 x i8> %tmp2, %tmp3			%tmp4 = add <8 x i8> %tmp2, %tmp3
	ret <8 x i8> %tmp4			ret <8 x i8> %tmp4
	}			}

	define <4 x i16> @vld3i16(i16* %A) nounwind {			define <4 x i16> @vld3i16(i16* %A) nounwind {
	;CHECK-LABEL: vld3i16:			;CHECK-LABEL: vld3i16:
	;CHECK: vld3.16			;CHECK: vld3.16
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = call %struct.__neon_int16x4x3_t @llvm.arm.neon.vld3.v4i16(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_int16x4x3_t @llvm.arm.neon.vld3.v4i16.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_int16x4x3_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int16x4x3_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int16x4x3_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int16x4x3_t %tmp1, 2
	%tmp4 = add <4 x i16> %tmp2, %tmp3			%tmp4 = add <4 x i16> %tmp2, %tmp3
	ret <4 x i16> %tmp4			ret <4 x i16> %tmp4
	}			}

	;Check for a post-increment updating load with register increment.			;Check for a post-increment updating load with register increment.
	define <4 x i16> @vld3i16_update(i16** %ptr, i32 %inc) nounwind {			define <4 x i16> @vld3i16_update(i16** %ptr, i32 %inc) nounwind {
	;CHECK-LABEL: vld3i16_update:			;CHECK-LABEL: vld3i16_update:
	;CHECK: vld3.16 {d16, d17, d18}, [{{r[0-9]+}}], {{r[0-9]+}}			;CHECK: vld3.16 {d16, d17, d18}, [{{r[0-9]+}}], {{r[0-9]+}}
	%A = load i16*, i16** %ptr			%A = load i16*, i16** %ptr
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = call %struct.__neon_int16x4x3_t @llvm.arm.neon.vld3.v4i16(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_int16x4x3_t @llvm.arm.neon.vld3.v4i16.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_int16x4x3_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int16x4x3_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int16x4x3_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int16x4x3_t %tmp1, 2
	%tmp4 = add <4 x i16> %tmp2, %tmp3			%tmp4 = add <4 x i16> %tmp2, %tmp3
	%tmp5 = getelementptr i16, i16* %A, i32 %inc			%tmp5 = getelementptr i16, i16* %A, i32 %inc
	store i16* %tmp5, i16** %ptr			store i16* %tmp5, i16** %ptr
	ret <4 x i16> %tmp4			ret <4 x i16> %tmp4
	}			}

	define <2 x i32> @vld3i32(i32* %A) nounwind {			define <2 x i32> @vld3i32(i32* %A) nounwind {
	;CHECK-LABEL: vld3i32:			;CHECK-LABEL: vld3i32:
	;CHECK: vld3.32			;CHECK: vld3.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = call %struct.__neon_int32x2x3_t @llvm.arm.neon.vld3.v2i32(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_int32x2x3_t @llvm.arm.neon.vld3.v2i32.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_int32x2x3_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int32x2x3_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int32x2x3_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int32x2x3_t %tmp1, 2
	%tmp4 = add <2 x i32> %tmp2, %tmp3			%tmp4 = add <2 x i32> %tmp2, %tmp3
	ret <2 x i32> %tmp4			ret <2 x i32> %tmp4
	}			}

	define <2 x float> @vld3f(float* %A) nounwind {			define <2 x float> @vld3f(float* %A) nounwind {
	;CHECK-LABEL: vld3f:			;CHECK-LABEL: vld3f:
	;CHECK: vld3.32			;CHECK: vld3.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = call %struct.__neon_float32x2x3_t @llvm.arm.neon.vld3.v2f32(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_float32x2x3_t @llvm.arm.neon.vld3.v2f32.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_float32x2x3_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_float32x2x3_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_float32x2x3_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_float32x2x3_t %tmp1, 2
	%tmp4 = fadd <2 x float> %tmp2, %tmp3			%tmp4 = fadd <2 x float> %tmp2, %tmp3
	ret <2 x float> %tmp4			ret <2 x float> %tmp4
	}			}

	define <1 x i64> @vld3i64(i64* %A) nounwind {			define <1 x i64> @vld3i64(i64* %A) nounwind {
	;CHECK-LABEL: vld3i64:			;CHECK-LABEL: vld3i64:
	;Check the alignment value. Max for this instruction is 64 bits:			;Check the alignment value. Max for this instruction is 64 bits:
	;CHECK: vld1.64 {d16, d17, d18}, [r0:64]			;CHECK: vld1.64 {d16, d17, d18}, [r0:64]
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = call %struct.__neon_int64x1x3_t @llvm.arm.neon.vld3.v1i64(i8* %tmp0, i32 16)			%tmp1 = call %struct.__neon_int64x1x3_t @llvm.arm.neon.vld3.v1i64.p0i8(i8* %tmp0, i32 16)
	%tmp2 = extractvalue %struct.__neon_int64x1x3_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int64x1x3_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int64x1x3_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int64x1x3_t %tmp1, 2
	%tmp4 = add <1 x i64> %tmp2, %tmp3			%tmp4 = add <1 x i64> %tmp2, %tmp3
	ret <1 x i64> %tmp4			ret <1 x i64> %tmp4
	}			}

	define <1 x i64> @vld3i64_update(i64** %ptr, i64* %A) nounwind {			define <1 x i64> @vld3i64_update(i64** %ptr, i64* %A) nounwind {
	;CHECK-LABEL: vld3i64_update:			;CHECK-LABEL: vld3i64_update:
	;CHECK: vld1.64 {d16, d17, d18}, [r1:64]!			;CHECK: vld1.64 {d16, d17, d18}, [r1:64]!
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = call %struct.__neon_int64x1x3_t @llvm.arm.neon.vld3.v1i64(i8* %tmp0, i32 16)			%tmp1 = call %struct.__neon_int64x1x3_t @llvm.arm.neon.vld3.v1i64.p0i8(i8* %tmp0, i32 16)
	%tmp5 = getelementptr i64, i64* %A, i32 3			%tmp5 = getelementptr i64, i64* %A, i32 3
	store i64* %tmp5, i64** %ptr			store i64* %tmp5, i64** %ptr
	%tmp2 = extractvalue %struct.__neon_int64x1x3_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int64x1x3_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int64x1x3_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int64x1x3_t %tmp1, 2
	%tmp4 = add <1 x i64> %tmp2, %tmp3			%tmp4 = add <1 x i64> %tmp2, %tmp3
	ret <1 x i64> %tmp4			ret <1 x i64> %tmp4
	}			}

	define <16 x i8> @vld3Qi8(i8* %A) nounwind {			define <16 x i8> @vld3Qi8(i8* %A) nounwind {
	;CHECK-LABEL: vld3Qi8:			;CHECK-LABEL: vld3Qi8:
	;Check the alignment value. Max for this instruction is 64 bits:			;Check the alignment value. Max for this instruction is 64 bits:
	;CHECK: vld3.8 {d16, d18, d20}, [r0:64]!			;CHECK: vld3.8 {d16, d18, d20}, [r0:64]!
	;CHECK: vld3.8 {d17, d19, d21}, [r0:64]			;CHECK: vld3.8 {d17, d19, d21}, [r0:64]
	%tmp1 = call %struct.__neon_int8x16x3_t @llvm.arm.neon.vld3.v16i8(i8* %A, i32 32)			%tmp1 = call %struct.__neon_int8x16x3_t @llvm.arm.neon.vld3.v16i8.p0i8(i8* %A, i32 32)
	%tmp2 = extractvalue %struct.__neon_int8x16x3_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int8x16x3_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int8x16x3_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int8x16x3_t %tmp1, 2
	%tmp4 = add <16 x i8> %tmp2, %tmp3			%tmp4 = add <16 x i8> %tmp2, %tmp3
	ret <16 x i8> %tmp4			ret <16 x i8> %tmp4
	}			}

	define <8 x i16> @vld3Qi16(i16* %A) nounwind {			define <8 x i16> @vld3Qi16(i16* %A) nounwind {
	;CHECK-LABEL: vld3Qi16:			;CHECK-LABEL: vld3Qi16:
	;CHECK: vld3.16			;CHECK: vld3.16
	;CHECK: vld3.16			;CHECK: vld3.16
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = call %struct.__neon_int16x8x3_t @llvm.arm.neon.vld3.v8i16(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_int16x8x3_t @llvm.arm.neon.vld3.v8i16.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_int16x8x3_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int16x8x3_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int16x8x3_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int16x8x3_t %tmp1, 2
	%tmp4 = add <8 x i16> %tmp2, %tmp3			%tmp4 = add <8 x i16> %tmp2, %tmp3
	ret <8 x i16> %tmp4			ret <8 x i16> %tmp4
	}			}

	define <4 x i32> @vld3Qi32(i32* %A) nounwind {			define <4 x i32> @vld3Qi32(i32* %A) nounwind {
	;CHECK-LABEL: vld3Qi32:			;CHECK-LABEL: vld3Qi32:
	;CHECK: vld3.32			;CHECK: vld3.32
	;CHECK: vld3.32			;CHECK: vld3.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = call %struct.__neon_int32x4x3_t @llvm.arm.neon.vld3.v4i32(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_int32x4x3_t @llvm.arm.neon.vld3.v4i32.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_int32x4x3_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int32x4x3_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int32x4x3_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int32x4x3_t %tmp1, 2
	%tmp4 = add <4 x i32> %tmp2, %tmp3			%tmp4 = add <4 x i32> %tmp2, %tmp3
	ret <4 x i32> %tmp4			ret <4 x i32> %tmp4
	}			}

	;Check for a post-increment updating load.			;Check for a post-increment updating load.
	define <4 x i32> @vld3Qi32_update(i32** %ptr) nounwind {			define <4 x i32> @vld3Qi32_update(i32** %ptr) nounwind {
	;CHECK-LABEL: vld3Qi32_update:			;CHECK-LABEL: vld3Qi32_update:
	;CHECK: vld3.32 {d16, d18, d20}, [r[[R:[0-9]+]]]!			;CHECK: vld3.32 {d16, d18, d20}, [r[[R:[0-9]+]]]!
	;CHECK: vld3.32 {d17, d19, d21}, [r[[R]]]!			;CHECK: vld3.32 {d17, d19, d21}, [r[[R]]]!
	%A = load i32*, i32** %ptr			%A = load i32*, i32** %ptr
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = call %struct.__neon_int32x4x3_t @llvm.arm.neon.vld3.v4i32(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_int32x4x3_t @llvm.arm.neon.vld3.v4i32.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_int32x4x3_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int32x4x3_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int32x4x3_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int32x4x3_t %tmp1, 2
	%tmp4 = add <4 x i32> %tmp2, %tmp3			%tmp4 = add <4 x i32> %tmp2, %tmp3
	%tmp5 = getelementptr i32, i32* %A, i32 12			%tmp5 = getelementptr i32, i32* %A, i32 12
	store i32* %tmp5, i32** %ptr			store i32* %tmp5, i32** %ptr
	ret <4 x i32> %tmp4			ret <4 x i32> %tmp4
	}			}

	define <4 x float> @vld3Qf(float* %A) nounwind {			define <4 x float> @vld3Qf(float* %A) nounwind {
	;CHECK-LABEL: vld3Qf:			;CHECK-LABEL: vld3Qf:
	;CHECK: vld3.32			;CHECK: vld3.32
	;CHECK: vld3.32			;CHECK: vld3.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = call %struct.__neon_float32x4x3_t @llvm.arm.neon.vld3.v4f32(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_float32x4x3_t @llvm.arm.neon.vld3.v4f32.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_float32x4x3_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_float32x4x3_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_float32x4x3_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_float32x4x3_t %tmp1, 2
	%tmp4 = fadd <4 x float> %tmp2, %tmp3			%tmp4 = fadd <4 x float> %tmp2, %tmp3
	ret <4 x float> %tmp4			ret <4 x float> %tmp4
	}			}

	declare %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8(i8*, i32) nounwind readonly			declare %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3.v8i8.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int16x4x3_t @llvm.arm.neon.vld3.v4i16(i8*, i32) nounwind readonly			declare %struct.__neon_int16x4x3_t @llvm.arm.neon.vld3.v4i16.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int32x2x3_t @llvm.arm.neon.vld3.v2i32(i8*, i32) nounwind readonly			declare %struct.__neon_int32x2x3_t @llvm.arm.neon.vld3.v2i32.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_float32x2x3_t @llvm.arm.neon.vld3.v2f32(i8*, i32) nounwind readonly			declare %struct.__neon_float32x2x3_t @llvm.arm.neon.vld3.v2f32.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int64x1x3_t @llvm.arm.neon.vld3.v1i64(i8*, i32) nounwind readonly			declare %struct.__neon_int64x1x3_t @llvm.arm.neon.vld3.v1i64.p0i8(i8*, i32) nounwind readonly

	declare %struct.__neon_int8x16x3_t @llvm.arm.neon.vld3.v16i8(i8*, i32) nounwind readonly			declare %struct.__neon_int8x16x3_t @llvm.arm.neon.vld3.v16i8.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int16x8x3_t @llvm.arm.neon.vld3.v8i16(i8*, i32) nounwind readonly			declare %struct.__neon_int16x8x3_t @llvm.arm.neon.vld3.v8i16.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int32x4x3_t @llvm.arm.neon.vld3.v4i32(i8*, i32) nounwind readonly			declare %struct.__neon_int32x4x3_t @llvm.arm.neon.vld3.v4i32.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_float32x4x3_t @llvm.arm.neon.vld3.v4f32(i8*, i32) nounwind readonly			declare %struct.__neon_float32x4x3_t @llvm.arm.neon.vld3.v4f32.p0i8(i8*, i32) nounwind readonly

llvm/trunk/test/CodeGen/ARM/vld4.ll

	%struct.__neon_int16x8x4_t = type { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> }			%struct.__neon_int16x8x4_t = type { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> }
	%struct.__neon_int32x4x4_t = type { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> }			%struct.__neon_int32x4x4_t = type { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> }
	%struct.__neon_float32x4x4_t = type { <4 x float>, <4 x float>, <4 x float>, <4 x float> }			%struct.__neon_float32x4x4_t = type { <4 x float>, <4 x float>, <4 x float>, <4 x float> }

	define <8 x i8> @vld4i8(i8* %A) nounwind {			define <8 x i8> @vld4i8(i8* %A) nounwind {
	;CHECK-LABEL: vld4i8:			;CHECK-LABEL: vld4i8:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vld4.8 {d16, d17, d18, d19}, [r0:64]			;CHECK: vld4.8 {d16, d17, d18, d19}, [r0:64]
	%tmp1 = call %struct.__neon_int8x8x4_t @llvm.arm.neon.vld4.v8i8(i8* %A, i32 8)			%tmp1 = call %struct.__neon_int8x8x4_t @llvm.arm.neon.vld4.v8i8.p0i8(i8* %A, i32 8)
	%tmp2 = extractvalue %struct.__neon_int8x8x4_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int8x8x4_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int8x8x4_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int8x8x4_t %tmp1, 2
	%tmp4 = add <8 x i8> %tmp2, %tmp3			%tmp4 = add <8 x i8> %tmp2, %tmp3
	ret <8 x i8> %tmp4			ret <8 x i8> %tmp4
	}			}

	;Check for a post-increment updating load with register increment.			;Check for a post-increment updating load with register increment.
	define <8 x i8> @vld4i8_update(i8** %ptr, i32 %inc) nounwind {			define <8 x i8> @vld4i8_update(i8** %ptr, i32 %inc) nounwind {
	;CHECK-LABEL: vld4i8_update:			;CHECK-LABEL: vld4i8_update:
	;CHECK: vld4.8 {d16, d17, d18, d19}, [r2:128], r1			;CHECK: vld4.8 {d16, d17, d18, d19}, [r2:128], r1
	%A = load i8*, i8** %ptr			%A = load i8*, i8** %ptr
	%tmp1 = call %struct.__neon_int8x8x4_t @llvm.arm.neon.vld4.v8i8(i8* %A, i32 16)			%tmp1 = call %struct.__neon_int8x8x4_t @llvm.arm.neon.vld4.v8i8.p0i8(i8* %A, i32 16)
	%tmp2 = extractvalue %struct.__neon_int8x8x4_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int8x8x4_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int8x8x4_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int8x8x4_t %tmp1, 2
	%tmp4 = add <8 x i8> %tmp2, %tmp3			%tmp4 = add <8 x i8> %tmp2, %tmp3
	%tmp5 = getelementptr i8, i8* %A, i32 %inc			%tmp5 = getelementptr i8, i8* %A, i32 %inc
	store i8* %tmp5, i8** %ptr			store i8* %tmp5, i8** %ptr
	ret <8 x i8> %tmp4			ret <8 x i8> %tmp4
	}			}

	define <4 x i16> @vld4i16(i16* %A) nounwind {			define <4 x i16> @vld4i16(i16* %A) nounwind {
	;CHECK-LABEL: vld4i16:			;CHECK-LABEL: vld4i16:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vld4.16 {d16, d17, d18, d19}, [r0:128]			;CHECK: vld4.16 {d16, d17, d18, d19}, [r0:128]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = call %struct.__neon_int16x4x4_t @llvm.arm.neon.vld4.v4i16(i8* %tmp0, i32 16)			%tmp1 = call %struct.__neon_int16x4x4_t @llvm.arm.neon.vld4.v4i16.p0i8(i8* %tmp0, i32 16)
	%tmp2 = extractvalue %struct.__neon_int16x4x4_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int16x4x4_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int16x4x4_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int16x4x4_t %tmp1, 2
	%tmp4 = add <4 x i16> %tmp2, %tmp3			%tmp4 = add <4 x i16> %tmp2, %tmp3
	ret <4 x i16> %tmp4			ret <4 x i16> %tmp4
	}			}

	define <2 x i32> @vld4i32(i32* %A) nounwind {			define <2 x i32> @vld4i32(i32* %A) nounwind {
	;CHECK-LABEL: vld4i32:			;CHECK-LABEL: vld4i32:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vld4.32 {d16, d17, d18, d19}, [r0:256]			;CHECK: vld4.32 {d16, d17, d18, d19}, [r0:256]
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = call %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4.v2i32(i8* %tmp0, i32 32)			%tmp1 = call %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4.v2i32.p0i8(i8* %tmp0, i32 32)
	%tmp2 = extractvalue %struct.__neon_int32x2x4_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int32x2x4_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int32x2x4_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int32x2x4_t %tmp1, 2
	%tmp4 = add <2 x i32> %tmp2, %tmp3			%tmp4 = add <2 x i32> %tmp2, %tmp3
	ret <2 x i32> %tmp4			ret <2 x i32> %tmp4
	}			}

	define <2 x float> @vld4f(float* %A) nounwind {			define <2 x float> @vld4f(float* %A) nounwind {
	;CHECK-LABEL: vld4f:			;CHECK-LABEL: vld4f:
	;CHECK: vld4.32			;CHECK: vld4.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = call %struct.__neon_float32x2x4_t @llvm.arm.neon.vld4.v2f32(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_float32x2x4_t @llvm.arm.neon.vld4.v2f32.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_float32x2x4_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_float32x2x4_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_float32x2x4_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_float32x2x4_t %tmp1, 2
	%tmp4 = fadd <2 x float> %tmp2, %tmp3			%tmp4 = fadd <2 x float> %tmp2, %tmp3
	ret <2 x float> %tmp4			ret <2 x float> %tmp4
	}			}

	define <1 x i64> @vld4i64(i64* %A) nounwind {			define <1 x i64> @vld4i64(i64* %A) nounwind {
	;CHECK-LABEL: vld4i64:			;CHECK-LABEL: vld4i64:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vld1.64 {d16, d17, d18, d19}, [r0:256]			;CHECK: vld1.64 {d16, d17, d18, d19}, [r0:256]
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = call %struct.__neon_int64x1x4_t @llvm.arm.neon.vld4.v1i64(i8* %tmp0, i32 64)			%tmp1 = call %struct.__neon_int64x1x4_t @llvm.arm.neon.vld4.v1i64.p0i8(i8* %tmp0, i32 64)
	%tmp2 = extractvalue %struct.__neon_int64x1x4_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int64x1x4_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int64x1x4_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int64x1x4_t %tmp1, 2
	%tmp4 = add <1 x i64> %tmp2, %tmp3			%tmp4 = add <1 x i64> %tmp2, %tmp3
	ret <1 x i64> %tmp4			ret <1 x i64> %tmp4
	}			}

	define <1 x i64> @vld4i64_update(i64** %ptr, i64* %A) nounwind {			define <1 x i64> @vld4i64_update(i64** %ptr, i64* %A) nounwind {
	;CHECK-LABEL: vld4i64_update:			;CHECK-LABEL: vld4i64_update:
	;CHECK: vld1.64 {d16, d17, d18, d19}, [r1:256]!			;CHECK: vld1.64 {d16, d17, d18, d19}, [r1:256]!
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = call %struct.__neon_int64x1x4_t @llvm.arm.neon.vld4.v1i64(i8* %tmp0, i32 64)			%tmp1 = call %struct.__neon_int64x1x4_t @llvm.arm.neon.vld4.v1i64.p0i8(i8* %tmp0, i32 64)
	%tmp5 = getelementptr i64, i64* %A, i32 4			%tmp5 = getelementptr i64, i64* %A, i32 4
	store i64* %tmp5, i64** %ptr			store i64* %tmp5, i64** %ptr
	%tmp2 = extractvalue %struct.__neon_int64x1x4_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int64x1x4_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int64x1x4_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int64x1x4_t %tmp1, 2
	%tmp4 = add <1 x i64> %tmp2, %tmp3			%tmp4 = add <1 x i64> %tmp2, %tmp3
	ret <1 x i64> %tmp4			ret <1 x i64> %tmp4
	}			}

	define <16 x i8> @vld4Qi8(i8* %A) nounwind {			define <16 x i8> @vld4Qi8(i8* %A) nounwind {
	;CHECK-LABEL: vld4Qi8:			;CHECK-LABEL: vld4Qi8:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vld4.8 {d16, d18, d20, d22}, [r0:256]!			;CHECK: vld4.8 {d16, d18, d20, d22}, [r0:256]!
	;CHECK: vld4.8 {d17, d19, d21, d23}, [r0:256]			;CHECK: vld4.8 {d17, d19, d21, d23}, [r0:256]
	%tmp1 = call %struct.__neon_int8x16x4_t @llvm.arm.neon.vld4.v16i8(i8* %A, i32 64)			%tmp1 = call %struct.__neon_int8x16x4_t @llvm.arm.neon.vld4.v16i8.p0i8(i8* %A, i32 64)
	%tmp2 = extractvalue %struct.__neon_int8x16x4_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int8x16x4_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int8x16x4_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int8x16x4_t %tmp1, 2
	%tmp4 = add <16 x i8> %tmp2, %tmp3			%tmp4 = add <16 x i8> %tmp2, %tmp3
	ret <16 x i8> %tmp4			ret <16 x i8> %tmp4
	}			}

	define <8 x i16> @vld4Qi16(i16* %A) nounwind {			define <8 x i16> @vld4Qi16(i16* %A) nounwind {
	;CHECK-LABEL: vld4Qi16:			;CHECK-LABEL: vld4Qi16:
	;Check for no alignment specifier.			;Check for no alignment specifier.
	;CHECK: vld4.16 {d16, d18, d20, d22}, [r0]!			;CHECK: vld4.16 {d16, d18, d20, d22}, [r0]!
	;CHECK: vld4.16 {d17, d19, d21, d23}, [r0]			;CHECK: vld4.16 {d17, d19, d21, d23}, [r0]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = call %struct.__neon_int16x8x4_t @llvm.arm.neon.vld4.v8i16(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_int16x8x4_t @llvm.arm.neon.vld4.v8i16.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_int16x8x4_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int16x8x4_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int16x8x4_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int16x8x4_t %tmp1, 2
	%tmp4 = add <8 x i16> %tmp2, %tmp3			%tmp4 = add <8 x i16> %tmp2, %tmp3
	ret <8 x i16> %tmp4			ret <8 x i16> %tmp4
	}			}

	;Check for a post-increment updating load.			;Check for a post-increment updating load.
	define <8 x i16> @vld4Qi16_update(i16** %ptr) nounwind {			define <8 x i16> @vld4Qi16_update(i16** %ptr) nounwind {
	;CHECK-LABEL: vld4Qi16_update:			;CHECK-LABEL: vld4Qi16_update:
	;CHECK: vld4.16 {d16, d18, d20, d22}, [r1:64]!			;CHECK: vld4.16 {d16, d18, d20, d22}, [r1:64]!
	;CHECK: vld4.16 {d17, d19, d21, d23}, [r1:64]!			;CHECK: vld4.16 {d17, d19, d21, d23}, [r1:64]!
	%A = load i16*, i16** %ptr			%A = load i16*, i16** %ptr
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = call %struct.__neon_int16x8x4_t @llvm.arm.neon.vld4.v8i16(i8* %tmp0, i32 8)			%tmp1 = call %struct.__neon_int16x8x4_t @llvm.arm.neon.vld4.v8i16.p0i8(i8* %tmp0, i32 8)
	%tmp2 = extractvalue %struct.__neon_int16x8x4_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int16x8x4_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int16x8x4_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int16x8x4_t %tmp1, 2
	%tmp4 = add <8 x i16> %tmp2, %tmp3			%tmp4 = add <8 x i16> %tmp2, %tmp3
	%tmp5 = getelementptr i16, i16* %A, i32 32			%tmp5 = getelementptr i16, i16* %A, i32 32
	store i16* %tmp5, i16** %ptr			store i16* %tmp5, i16** %ptr
	ret <8 x i16> %tmp4			ret <8 x i16> %tmp4
	}			}

	define <4 x i32> @vld4Qi32(i32* %A) nounwind {			define <4 x i32> @vld4Qi32(i32* %A) nounwind {
	;CHECK-LABEL: vld4Qi32:			;CHECK-LABEL: vld4Qi32:
	;CHECK: vld4.32			;CHECK: vld4.32
	;CHECK: vld4.32			;CHECK: vld4.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = call %struct.__neon_int32x4x4_t @llvm.arm.neon.vld4.v4i32(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_int32x4x4_t @llvm.arm.neon.vld4.v4i32.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_int32x4x4_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int32x4x4_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int32x4x4_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_int32x4x4_t %tmp1, 2
	%tmp4 = add <4 x i32> %tmp2, %tmp3			%tmp4 = add <4 x i32> %tmp2, %tmp3
	ret <4 x i32> %tmp4			ret <4 x i32> %tmp4
	}			}

	define <4 x float> @vld4Qf(float* %A) nounwind {			define <4 x float> @vld4Qf(float* %A) nounwind {
	;CHECK-LABEL: vld4Qf:			;CHECK-LABEL: vld4Qf:
	;CHECK: vld4.32			;CHECK: vld4.32
	;CHECK: vld4.32			;CHECK: vld4.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = call %struct.__neon_float32x4x4_t @llvm.arm.neon.vld4.v4f32(i8* %tmp0, i32 1)			%tmp1 = call %struct.__neon_float32x4x4_t @llvm.arm.neon.vld4.v4f32.p0i8(i8* %tmp0, i32 1)
	%tmp2 = extractvalue %struct.__neon_float32x4x4_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_float32x4x4_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_float32x4x4_t %tmp1, 2			%tmp3 = extractvalue %struct.__neon_float32x4x4_t %tmp1, 2
	%tmp4 = fadd <4 x float> %tmp2, %tmp3			%tmp4 = fadd <4 x float> %tmp2, %tmp3
	ret <4 x float> %tmp4			ret <4 x float> %tmp4
	}			}

	declare %struct.__neon_int8x8x4_t @llvm.arm.neon.vld4.v8i8(i8*, i32) nounwind readonly			declare %struct.__neon_int8x8x4_t @llvm.arm.neon.vld4.v8i8.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int16x4x4_t @llvm.arm.neon.vld4.v4i16(i8*, i32) nounwind readonly			declare %struct.__neon_int16x4x4_t @llvm.arm.neon.vld4.v4i16.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4.v2i32(i8*, i32) nounwind readonly			declare %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4.v2i32.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_float32x2x4_t @llvm.arm.neon.vld4.v2f32(i8*, i32) nounwind readonly			declare %struct.__neon_float32x2x4_t @llvm.arm.neon.vld4.v2f32.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int64x1x4_t @llvm.arm.neon.vld4.v1i64(i8*, i32) nounwind readonly			declare %struct.__neon_int64x1x4_t @llvm.arm.neon.vld4.v1i64.p0i8(i8*, i32) nounwind readonly

	declare %struct.__neon_int8x16x4_t @llvm.arm.neon.vld4.v16i8(i8*, i32) nounwind readonly			declare %struct.__neon_int8x16x4_t @llvm.arm.neon.vld4.v16i8.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int16x8x4_t @llvm.arm.neon.vld4.v8i16(i8*, i32) nounwind readonly			declare %struct.__neon_int16x8x4_t @llvm.arm.neon.vld4.v8i16.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_int32x4x4_t @llvm.arm.neon.vld4.v4i32(i8*, i32) nounwind readonly			declare %struct.__neon_int32x4x4_t @llvm.arm.neon.vld4.v4i32.p0i8(i8*, i32) nounwind readonly
	declare %struct.__neon_float32x4x4_t @llvm.arm.neon.vld4.v4f32(i8*, i32) nounwind readonly			declare %struct.__neon_float32x4x4_t @llvm.arm.neon.vld4.v4f32.p0i8(i8*, i32) nounwind readonly

llvm/trunk/test/CodeGen/ARM/vlddup.ll

	%struct.__neon_int8x8x2_t = type { <8 x i8>, <8 x i8> }			%struct.__neon_int8x8x2_t = type { <8 x i8>, <8 x i8> }
	%struct.__neon_int4x16x2_t = type { <4 x i16>, <4 x i16> }			%struct.__neon_int4x16x2_t = type { <4 x i16>, <4 x i16> }
	%struct.__neon_int2x32x2_t = type { <2 x i32>, <2 x i32> }			%struct.__neon_int2x32x2_t = type { <2 x i32>, <2 x i32> }

	define <8 x i8> @vld2dupi8(i8* %A) nounwind {			define <8 x i8> @vld2dupi8(i8* %A) nounwind {
	;CHECK-LABEL: vld2dupi8:			;CHECK-LABEL: vld2dupi8:
	;Check the (default) alignment value.			;Check the (default) alignment value.
	;CHECK: vld2.8 {d16[], d17[]}, [r0]			;CHECK: vld2.8 {d16[], d17[]}, [r0]
	%tmp0 = tail call %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2lane.v8i8(i8* %A, <8 x i8> undef, <8 x i8> undef, i32 0, i32 1)			%tmp0 = tail call %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2lane.v8i8.p0i8(i8* %A, <8 x i8> undef, <8 x i8> undef, i32 0, i32 1)
	%tmp1 = extractvalue %struct.__neon_int8x8x2_t %tmp0, 0			%tmp1 = extractvalue %struct.__neon_int8x8x2_t %tmp0, 0
	%tmp2 = shufflevector <8 x i8> %tmp1, <8 x i8> undef, <8 x i32> zeroinitializer			%tmp2 = shufflevector <8 x i8> %tmp1, <8 x i8> undef, <8 x i32> zeroinitializer
	%tmp3 = extractvalue %struct.__neon_int8x8x2_t %tmp0, 1			%tmp3 = extractvalue %struct.__neon_int8x8x2_t %tmp0, 1
	%tmp4 = shufflevector <8 x i8> %tmp3, <8 x i8> undef, <8 x i32> zeroinitializer			%tmp4 = shufflevector <8 x i8> %tmp3, <8 x i8> undef, <8 x i32> zeroinitializer
	%tmp5 = add <8 x i8> %tmp2, %tmp4			%tmp5 = add <8 x i8> %tmp2, %tmp4
	ret <8 x i8> %tmp5			ret <8 x i8> %tmp5
	}			}

	define <4 x i16> @vld2dupi16(i8* %A) nounwind {			define <4 x i16> @vld2dupi16(i8* %A) nounwind {
	;CHECK-LABEL: vld2dupi16:			;CHECK-LABEL: vld2dupi16:
	;Check that a power-of-two alignment smaller than the total size of the memory			;Check that a power-of-two alignment smaller than the total size of the memory
	;being loaded is ignored.			;being loaded is ignored.
	;CHECK: vld2.16 {d16[], d17[]}, [r0]			;CHECK: vld2.16 {d16[], d17[]}, [r0]
	%tmp0 = tail call %struct.__neon_int4x16x2_t @llvm.arm.neon.vld2lane.v4i16(i8* %A, <4 x i16> undef, <4 x i16> undef, i32 0, i32 2)			%tmp0 = tail call %struct.__neon_int4x16x2_t @llvm.arm.neon.vld2lane.v4i16.p0i8(i8* %A, <4 x i16> undef, <4 x i16> undef, i32 0, i32 2)
	%tmp1 = extractvalue %struct.__neon_int4x16x2_t %tmp0, 0			%tmp1 = extractvalue %struct.__neon_int4x16x2_t %tmp0, 0
	%tmp2 = shufflevector <4 x i16> %tmp1, <4 x i16> undef, <4 x i32> zeroinitializer			%tmp2 = shufflevector <4 x i16> %tmp1, <4 x i16> undef, <4 x i32> zeroinitializer
	%tmp3 = extractvalue %struct.__neon_int4x16x2_t %tmp0, 1			%tmp3 = extractvalue %struct.__neon_int4x16x2_t %tmp0, 1
	%tmp4 = shufflevector <4 x i16> %tmp3, <4 x i16> undef, <4 x i32> zeroinitializer			%tmp4 = shufflevector <4 x i16> %tmp3, <4 x i16> undef, <4 x i32> zeroinitializer
	%tmp5 = add <4 x i16> %tmp2, %tmp4			%tmp5 = add <4 x i16> %tmp2, %tmp4
	ret <4 x i16> %tmp5			ret <4 x i16> %tmp5
	}			}

	;Check for a post-increment updating load.			;Check for a post-increment updating load.
	define <4 x i16> @vld2dupi16_update(i16** %ptr) nounwind {			define <4 x i16> @vld2dupi16_update(i16** %ptr) nounwind {
	;CHECK-LABEL: vld2dupi16_update:			;CHECK-LABEL: vld2dupi16_update:
	;CHECK: vld2.16 {d16[], d17[]}, [r1]!			;CHECK: vld2.16 {d16[], d17[]}, [r1]!
	%A = load i16*, i16** %ptr			%A = load i16*, i16** %ptr
	%A2 = bitcast i16* %A to i8*			%A2 = bitcast i16* %A to i8*
	%tmp0 = tail call %struct.__neon_int4x16x2_t @llvm.arm.neon.vld2lane.v4i16(i8* %A2, <4 x i16> undef, <4 x i16> undef, i32 0, i32 2)			%tmp0 = tail call %struct.__neon_int4x16x2_t @llvm.arm.neon.vld2lane.v4i16.p0i8(i8* %A2, <4 x i16> undef, <4 x i16> undef, i32 0, i32 2)
	%tmp1 = extractvalue %struct.__neon_int4x16x2_t %tmp0, 0			%tmp1 = extractvalue %struct.__neon_int4x16x2_t %tmp0, 0
	%tmp2 = shufflevector <4 x i16> %tmp1, <4 x i16> undef, <4 x i32> zeroinitializer			%tmp2 = shufflevector <4 x i16> %tmp1, <4 x i16> undef, <4 x i32> zeroinitializer
	%tmp3 = extractvalue %struct.__neon_int4x16x2_t %tmp0, 1			%tmp3 = extractvalue %struct.__neon_int4x16x2_t %tmp0, 1
	%tmp4 = shufflevector <4 x i16> %tmp3, <4 x i16> undef, <4 x i32> zeroinitializer			%tmp4 = shufflevector <4 x i16> %tmp3, <4 x i16> undef, <4 x i32> zeroinitializer
	%tmp5 = add <4 x i16> %tmp2, %tmp4			%tmp5 = add <4 x i16> %tmp2, %tmp4
	%tmp6 = getelementptr i16, i16* %A, i32 2			%tmp6 = getelementptr i16, i16* %A, i32 2
	store i16* %tmp6, i16** %ptr			store i16* %tmp6, i16** %ptr
	ret <4 x i16> %tmp5			ret <4 x i16> %tmp5
	}			}

	define <2 x i32> @vld2dupi32(i8* %A) nounwind {			define <2 x i32> @vld2dupi32(i8* %A) nounwind {
	;CHECK-LABEL: vld2dupi32:			;CHECK-LABEL: vld2dupi32:
	;Check the alignment value. Max for this instruction is 64 bits:			;Check the alignment value. Max for this instruction is 64 bits:
	;CHECK: vld2.32 {d16[], d17[]}, [r0:64]			;CHECK: vld2.32 {d16[], d17[]}, [r0:64]
	%tmp0 = tail call %struct.__neon_int2x32x2_t @llvm.arm.neon.vld2lane.v2i32(i8* %A, <2 x i32> undef, <2 x i32> undef, i32 0, i32 16)			%tmp0 = tail call %struct.__neon_int2x32x2_t @llvm.arm.neon.vld2lane.v2i32.p0i8(i8* %A, <2 x i32> undef, <2 x i32> undef, i32 0, i32 16)
	%tmp1 = extractvalue %struct.__neon_int2x32x2_t %tmp0, 0			%tmp1 = extractvalue %struct.__neon_int2x32x2_t %tmp0, 0
	%tmp2 = shufflevector <2 x i32> %tmp1, <2 x i32> undef, <2 x i32> zeroinitializer			%tmp2 = shufflevector <2 x i32> %tmp1, <2 x i32> undef, <2 x i32> zeroinitializer
	%tmp3 = extractvalue %struct.__neon_int2x32x2_t %tmp0, 1			%tmp3 = extractvalue %struct.__neon_int2x32x2_t %tmp0, 1
	%tmp4 = shufflevector <2 x i32> %tmp3, <2 x i32> undef, <2 x i32> zeroinitializer			%tmp4 = shufflevector <2 x i32> %tmp3, <2 x i32> undef, <2 x i32> zeroinitializer
	%tmp5 = add <2 x i32> %tmp2, %tmp4			%tmp5 = add <2 x i32> %tmp2, %tmp4
	ret <2 x i32> %tmp5			ret <2 x i32> %tmp5
	}			}

	declare %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2lane.v8i8(i8*, <8 x i8>, <8 x i8>, i32, i32) nounwind readonly			declare %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2lane.v8i8.p0i8(i8*, <8 x i8>, <8 x i8>, i32, i32) nounwind readonly
	declare %struct.__neon_int4x16x2_t @llvm.arm.neon.vld2lane.v4i16(i8*, <4 x i16>, <4 x i16>, i32, i32) nounwind readonly			declare %struct.__neon_int4x16x2_t @llvm.arm.neon.vld2lane.v4i16.p0i8(i8*, <4 x i16>, <4 x i16>, i32, i32) nounwind readonly
	declare %struct.__neon_int2x32x2_t @llvm.arm.neon.vld2lane.v2i32(i8*, <2 x i32>, <2 x i32>, i32, i32) nounwind readonly			declare %struct.__neon_int2x32x2_t @llvm.arm.neon.vld2lane.v2i32.p0i8(i8*, <2 x i32>, <2 x i32>, i32, i32) nounwind readonly

	%struct.__neon_int8x8x3_t = type { <8 x i8>, <8 x i8>, <8 x i8> }			%struct.__neon_int8x8x3_t = type { <8 x i8>, <8 x i8>, <8 x i8> }
	%struct.__neon_int16x4x3_t = type { <4 x i16>, <4 x i16>, <4 x i16> }			%struct.__neon_int16x4x3_t = type { <4 x i16>, <4 x i16>, <4 x i16> }

	;Check for a post-increment updating load with register increment.			;Check for a post-increment updating load with register increment.
	define <8 x i8> @vld3dupi8_update(i8** %ptr, i32 %inc) nounwind {			define <8 x i8> @vld3dupi8_update(i8** %ptr, i32 %inc) nounwind {
	;CHECK-LABEL: vld3dupi8_update:			;CHECK-LABEL: vld3dupi8_update:
	;CHECK: vld3.8 {d16[], d17[], d18[]}, [r2], r1			;CHECK: vld3.8 {d16[], d17[], d18[]}, [r2], r1
	%A = load i8*, i8** %ptr			%A = load i8*, i8** %ptr
	%tmp0 = tail call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3lane.v8i8(i8* %A, <8 x i8> undef, <8 x i8> undef, <8 x i8> undef, i32 0, i32 8)			%tmp0 = tail call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3lane.v8i8.p0i8(i8* %A, <8 x i8> undef, <8 x i8> undef, <8 x i8> undef, i32 0, i32 8)
	%tmp1 = extractvalue %struct.__neon_int8x8x3_t %tmp0, 0			%tmp1 = extractvalue %struct.__neon_int8x8x3_t %tmp0, 0
	%tmp2 = shufflevector <8 x i8> %tmp1, <8 x i8> undef, <8 x i32> zeroinitializer			%tmp2 = shufflevector <8 x i8> %tmp1, <8 x i8> undef, <8 x i32> zeroinitializer
	%tmp3 = extractvalue %struct.__neon_int8x8x3_t %tmp0, 1			%tmp3 = extractvalue %struct.__neon_int8x8x3_t %tmp0, 1
	%tmp4 = shufflevector <8 x i8> %tmp3, <8 x i8> undef, <8 x i32> zeroinitializer			%tmp4 = shufflevector <8 x i8> %tmp3, <8 x i8> undef, <8 x i32> zeroinitializer
	%tmp5 = extractvalue %struct.__neon_int8x8x3_t %tmp0, 2			%tmp5 = extractvalue %struct.__neon_int8x8x3_t %tmp0, 2
	%tmp6 = shufflevector <8 x i8> %tmp5, <8 x i8> undef, <8 x i32> zeroinitializer			%tmp6 = shufflevector <8 x i8> %tmp5, <8 x i8> undef, <8 x i32> zeroinitializer
	%tmp7 = add <8 x i8> %tmp2, %tmp4			%tmp7 = add <8 x i8> %tmp2, %tmp4
	%tmp8 = add <8 x i8> %tmp7, %tmp6			%tmp8 = add <8 x i8> %tmp7, %tmp6
	%tmp9 = getelementptr i8, i8* %A, i32 %inc			%tmp9 = getelementptr i8, i8* %A, i32 %inc
	store i8* %tmp9, i8** %ptr			store i8* %tmp9, i8** %ptr
	ret <8 x i8> %tmp8			ret <8 x i8> %tmp8
	}			}

	define <4 x i16> @vld3dupi16(i8* %A) nounwind {			define <4 x i16> @vld3dupi16(i8* %A) nounwind {
	;CHECK-LABEL: vld3dupi16:			;CHECK-LABEL: vld3dupi16:
	;Check the (default) alignment value. VLD3 does not support alignment.			;Check the (default) alignment value. VLD3 does not support alignment.
	;CHECK: vld3.16 {d16[], d17[], d18[]}, [r0]			;CHECK: vld3.16 {d16[], d17[], d18[]}, [r0]
	%tmp0 = tail call %struct.__neon_int16x4x3_t @llvm.arm.neon.vld3lane.v4i16(i8* %A, <4 x i16> undef, <4 x i16> undef, <4 x i16> undef, i32 0, i32 8)			%tmp0 = tail call %struct.__neon_int16x4x3_t @llvm.arm.neon.vld3lane.v4i16.p0i8(i8* %A, <4 x i16> undef, <4 x i16> undef, <4 x i16> undef, i32 0, i32 8)
	%tmp1 = extractvalue %struct.__neon_int16x4x3_t %tmp0, 0			%tmp1 = extractvalue %struct.__neon_int16x4x3_t %tmp0, 0
	%tmp2 = shufflevector <4 x i16> %tmp1, <4 x i16> undef, <4 x i32> zeroinitializer			%tmp2 = shufflevector <4 x i16> %tmp1, <4 x i16> undef, <4 x i32> zeroinitializer
	%tmp3 = extractvalue %struct.__neon_int16x4x3_t %tmp0, 1			%tmp3 = extractvalue %struct.__neon_int16x4x3_t %tmp0, 1
	%tmp4 = shufflevector <4 x i16> %tmp3, <4 x i16> undef, <4 x i32> zeroinitializer			%tmp4 = shufflevector <4 x i16> %tmp3, <4 x i16> undef, <4 x i32> zeroinitializer
	%tmp5 = extractvalue %struct.__neon_int16x4x3_t %tmp0, 2			%tmp5 = extractvalue %struct.__neon_int16x4x3_t %tmp0, 2
	%tmp6 = shufflevector <4 x i16> %tmp5, <4 x i16> undef, <4 x i32> zeroinitializer			%tmp6 = shufflevector <4 x i16> %tmp5, <4 x i16> undef, <4 x i32> zeroinitializer
	%tmp7 = add <4 x i16> %tmp2, %tmp4			%tmp7 = add <4 x i16> %tmp2, %tmp4
	%tmp8 = add <4 x i16> %tmp7, %tmp6			%tmp8 = add <4 x i16> %tmp7, %tmp6
	ret <4 x i16> %tmp8			ret <4 x i16> %tmp8
	}			}

	declare %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3lane.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, i32, i32) nounwind readonly			declare %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3lane.v8i8.p0i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, i32, i32) nounwind readonly
	declare %struct.__neon_int16x4x3_t @llvm.arm.neon.vld3lane.v4i16(i8*, <4 x i16>, <4 x i16>, <4 x i16>, i32, i32) nounwind readonly			declare %struct.__neon_int16x4x3_t @llvm.arm.neon.vld3lane.v4i16.p0i8(i8*, <4 x i16>, <4 x i16>, <4 x i16>, i32, i32) nounwind readonly

	%struct.__neon_int16x4x4_t = type { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> }			%struct.__neon_int16x4x4_t = type { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> }
	%struct.__neon_int32x2x4_t = type { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> }			%struct.__neon_int32x2x4_t = type { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> }

	;Check for a post-increment updating load.			;Check for a post-increment updating load.
	define <4 x i16> @vld4dupi16_update(i16** %ptr) nounwind {			define <4 x i16> @vld4dupi16_update(i16** %ptr) nounwind {
	;CHECK-LABEL: vld4dupi16_update:			;CHECK-LABEL: vld4dupi16_update:
	;CHECK: vld4.16 {d16[], d17[], d18[], d19[]}, [r1]!			;CHECK: vld4.16 {d16[], d17[], d18[], d19[]}, [r1]!
	%A = load i16*, i16** %ptr			%A = load i16*, i16** %ptr
	%A2 = bitcast i16* %A to i8*			%A2 = bitcast i16* %A to i8*
	%tmp0 = tail call %struct.__neon_int16x4x4_t @llvm.arm.neon.vld4lane.v4i16(i8* %A2, <4 x i16> undef, <4 x i16> undef, <4 x i16> undef, <4 x i16> undef, i32 0, i32 1)			%tmp0 = tail call %struct.__neon_int16x4x4_t @llvm.arm.neon.vld4lane.v4i16.p0i8(i8* %A2, <4 x i16> undef, <4 x i16> undef, <4 x i16> undef, <4 x i16> undef, i32 0, i32 1)
	%tmp1 = extractvalue %struct.__neon_int16x4x4_t %tmp0, 0			%tmp1 = extractvalue %struct.__neon_int16x4x4_t %tmp0, 0
	%tmp2 = shufflevector <4 x i16> %tmp1, <4 x i16> undef, <4 x i32> zeroinitializer			%tmp2 = shufflevector <4 x i16> %tmp1, <4 x i16> undef, <4 x i32> zeroinitializer
	%tmp3 = extractvalue %struct.__neon_int16x4x4_t %tmp0, 1			%tmp3 = extractvalue %struct.__neon_int16x4x4_t %tmp0, 1
	%tmp4 = shufflevector <4 x i16> %tmp3, <4 x i16> undef, <4 x i32> zeroinitializer			%tmp4 = shufflevector <4 x i16> %tmp3, <4 x i16> undef, <4 x i32> zeroinitializer
	%tmp5 = extractvalue %struct.__neon_int16x4x4_t %tmp0, 2			%tmp5 = extractvalue %struct.__neon_int16x4x4_t %tmp0, 2
	%tmp6 = shufflevector <4 x i16> %tmp5, <4 x i16> undef, <4 x i32> zeroinitializer			%tmp6 = shufflevector <4 x i16> %tmp5, <4 x i16> undef, <4 x i32> zeroinitializer
	%tmp7 = extractvalue %struct.__neon_int16x4x4_t %tmp0, 3			%tmp7 = extractvalue %struct.__neon_int16x4x4_t %tmp0, 3
	%tmp8 = shufflevector <4 x i16> %tmp7, <4 x i16> undef, <4 x i32> zeroinitializer			%tmp8 = shufflevector <4 x i16> %tmp7, <4 x i16> undef, <4 x i32> zeroinitializer
	%tmp9 = add <4 x i16> %tmp2, %tmp4			%tmp9 = add <4 x i16> %tmp2, %tmp4
	%tmp10 = add <4 x i16> %tmp6, %tmp8			%tmp10 = add <4 x i16> %tmp6, %tmp8
	%tmp11 = add <4 x i16> %tmp9, %tmp10			%tmp11 = add <4 x i16> %tmp9, %tmp10
	%tmp12 = getelementptr i16, i16* %A, i32 4			%tmp12 = getelementptr i16, i16* %A, i32 4
	store i16* %tmp12, i16** %ptr			store i16* %tmp12, i16** %ptr
	ret <4 x i16> %tmp11			ret <4 x i16> %tmp11
	}			}

	define <2 x i32> @vld4dupi32(i8* %A) nounwind {			define <2 x i32> @vld4dupi32(i8* %A) nounwind {
	;CHECK-LABEL: vld4dupi32:			;CHECK-LABEL: vld4dupi32:
	;Check the alignment value. An 8-byte alignment is allowed here even though			;Check the alignment value. An 8-byte alignment is allowed here even though
	;it is smaller than the total size of the memory being loaded.			;it is smaller than the total size of the memory being loaded.
	;CHECK: vld4.32 {d16[], d17[], d18[], d19[]}, [r0:64]			;CHECK: vld4.32 {d16[], d17[], d18[], d19[]}, [r0:64]
	%tmp0 = tail call %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4lane.v2i32(i8* %A, <2 x i32> undef, <2 x i32> undef, <2 x i32> undef, <2 x i32> undef, i32 0, i32 8)			%tmp0 = tail call %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4lane.v2i32.p0i8(i8* %A, <2 x i32> undef, <2 x i32> undef, <2 x i32> undef, <2 x i32> undef, i32 0, i32 8)
	%tmp1 = extractvalue %struct.__neon_int32x2x4_t %tmp0, 0			%tmp1 = extractvalue %struct.__neon_int32x2x4_t %tmp0, 0
	%tmp2 = shufflevector <2 x i32> %tmp1, <2 x i32> undef, <2 x i32> zeroinitializer			%tmp2 = shufflevector <2 x i32> %tmp1, <2 x i32> undef, <2 x i32> zeroinitializer
	%tmp3 = extractvalue %struct.__neon_int32x2x4_t %tmp0, 1			%tmp3 = extractvalue %struct.__neon_int32x2x4_t %tmp0, 1
	%tmp4 = shufflevector <2 x i32> %tmp3, <2 x i32> undef, <2 x i32> zeroinitializer			%tmp4 = shufflevector <2 x i32> %tmp3, <2 x i32> undef, <2 x i32> zeroinitializer
	%tmp5 = extractvalue %struct.__neon_int32x2x4_t %tmp0, 2			%tmp5 = extractvalue %struct.__neon_int32x2x4_t %tmp0, 2
	%tmp6 = shufflevector <2 x i32> %tmp5, <2 x i32> undef, <2 x i32> zeroinitializer			%tmp6 = shufflevector <2 x i32> %tmp5, <2 x i32> undef, <2 x i32> zeroinitializer
	%tmp7 = extractvalue %struct.__neon_int32x2x4_t %tmp0, 3			%tmp7 = extractvalue %struct.__neon_int32x2x4_t %tmp0, 3
	%tmp8 = shufflevector <2 x i32> %tmp7, <2 x i32> undef, <2 x i32> zeroinitializer			%tmp8 = shufflevector <2 x i32> %tmp7, <2 x i32> undef, <2 x i32> zeroinitializer
	%tmp9 = add <2 x i32> %tmp2, %tmp4			%tmp9 = add <2 x i32> %tmp2, %tmp4
	%tmp10 = add <2 x i32> %tmp6, %tmp8			%tmp10 = add <2 x i32> %tmp6, %tmp8
	%tmp11 = add <2 x i32> %tmp9, %tmp10			%tmp11 = add <2 x i32> %tmp9, %tmp10
	ret <2 x i32> %tmp11			ret <2 x i32> %tmp11
	}			}

	declare %struct.__neon_int16x4x4_t @llvm.arm.neon.vld4lane.v4i16(i8*, <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16>, i32, i32) nounwind readonly			declare %struct.__neon_int16x4x4_t @llvm.arm.neon.vld4lane.v4i16.p0i8(i8*, <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16>, i32, i32) nounwind readonly
	declare %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4lane.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, i32, i32) nounwind readonly			declare %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4lane.v2i32.p0i8(i8*, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, i32, i32) nounwind readonly

llvm/trunk/test/CodeGen/ARM/vldlane.ll

	%struct.__neon_int32x4x2_t = type { <4 x i32>, <4 x i32> }			%struct.__neon_int32x4x2_t = type { <4 x i32>, <4 x i32> }
	%struct.__neon_float32x4x2_t = type { <4 x float>, <4 x float> }			%struct.__neon_float32x4x2_t = type { <4 x float>, <4 x float> }

	define <8 x i8> @vld2lanei8(i8* %A, <8 x i8>* %B) nounwind {			define <8 x i8> @vld2lanei8(i8* %A, <8 x i8>* %B) nounwind {
	;CHECK-LABEL: vld2lanei8:			;CHECK-LABEL: vld2lanei8:
	;Check the alignment value. Max for this instruction is 16 bits:			;Check the alignment value. Max for this instruction is 16 bits:
	;CHECK: vld2.8 {d16[1], d17[1]}, [r0:16]			;CHECK: vld2.8 {d16[1], d17[1]}, [r0:16]
	%tmp1 = load <8 x i8>, <8 x i8>* %B			%tmp1 = load <8 x i8>, <8 x i8>* %B
	%tmp2 = call %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2lane.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 4)			%tmp2 = call %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2lane.v8i8.p0i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 4)
	%tmp3 = extractvalue %struct.__neon_int8x8x2_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int8x8x2_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int8x8x2_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int8x8x2_t %tmp2, 1
	%tmp5 = add <8 x i8> %tmp3, %tmp4			%tmp5 = add <8 x i8> %tmp3, %tmp4
	ret <8 x i8> %tmp5			ret <8 x i8> %tmp5
	}			}

	define <4 x i16> @vld2lanei16(i16* %A, <4 x i16>* %B) nounwind {			define <4 x i16> @vld2lanei16(i16* %A, <4 x i16>* %B) nounwind {
	;CHECK-LABEL: vld2lanei16:			;CHECK-LABEL: vld2lanei16:
	;Check the alignment value. Max for this instruction is 32 bits:			;Check the alignment value. Max for this instruction is 32 bits:
	;CHECK: vld2.16 {d16[1], d17[1]}, [r0:32]			;CHECK: vld2.16 {d16[1], d17[1]}, [r0:32]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <4 x i16>, <4 x i16>* %B			%tmp1 = load <4 x i16>, <4 x i16>* %B
	%tmp2 = call %struct.__neon_int16x4x2_t @llvm.arm.neon.vld2lane.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1, i32 8)			%tmp2 = call %struct.__neon_int16x4x2_t @llvm.arm.neon.vld2lane.v4i16.p0i8(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1, i32 8)
	%tmp3 = extractvalue %struct.__neon_int16x4x2_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int16x4x2_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int16x4x2_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int16x4x2_t %tmp2, 1
	%tmp5 = add <4 x i16> %tmp3, %tmp4			%tmp5 = add <4 x i16> %tmp3, %tmp4
	ret <4 x i16> %tmp5			ret <4 x i16> %tmp5
	}			}

	define <2 x i32> @vld2lanei32(i32* %A, <2 x i32>* %B) nounwind {			define <2 x i32> @vld2lanei32(i32* %A, <2 x i32>* %B) nounwind {
	;CHECK-LABEL: vld2lanei32:			;CHECK-LABEL: vld2lanei32:
	;CHECK: vld2.32			;CHECK: vld2.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <2 x i32>, <2 x i32>* %B			%tmp1 = load <2 x i32>, <2 x i32>* %B
	%tmp2 = call %struct.__neon_int32x2x2_t @llvm.arm.neon.vld2lane.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1, i32 1)			%tmp2 = call %struct.__neon_int32x2x2_t @llvm.arm.neon.vld2lane.v2i32.p0i8(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1, i32 1)
	%tmp3 = extractvalue %struct.__neon_int32x2x2_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int32x2x2_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int32x2x2_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int32x2x2_t %tmp2, 1
	%tmp5 = add <2 x i32> %tmp3, %tmp4			%tmp5 = add <2 x i32> %tmp3, %tmp4
	ret <2 x i32> %tmp5			ret <2 x i32> %tmp5
	}			}

	;Check for a post-increment updating load.			;Check for a post-increment updating load.
	define <2 x i32> @vld2lanei32_update(i32** %ptr, <2 x i32>* %B) nounwind {			define <2 x i32> @vld2lanei32_update(i32** %ptr, <2 x i32>* %B) nounwind {
	;CHECK-LABEL: vld2lanei32_update:			;CHECK-LABEL: vld2lanei32_update:
	;CHECK: vld2.32 {d16[1], d17[1]}, [{{r[0-9]+}}]!			;CHECK: vld2.32 {d16[1], d17[1]}, [{{r[0-9]+}}]!
	%A = load i32*, i32** %ptr			%A = load i32*, i32** %ptr
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <2 x i32>, <2 x i32>* %B			%tmp1 = load <2 x i32>, <2 x i32>* %B
	%tmp2 = call %struct.__neon_int32x2x2_t @llvm.arm.neon.vld2lane.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1, i32 1)			%tmp2 = call %struct.__neon_int32x2x2_t @llvm.arm.neon.vld2lane.v2i32.p0i8(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1, i32 1)
	%tmp3 = extractvalue %struct.__neon_int32x2x2_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int32x2x2_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int32x2x2_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int32x2x2_t %tmp2, 1
	%tmp5 = add <2 x i32> %tmp3, %tmp4			%tmp5 = add <2 x i32> %tmp3, %tmp4
	%tmp6 = getelementptr i32, i32* %A, i32 2			%tmp6 = getelementptr i32, i32* %A, i32 2
	store i32* %tmp6, i32** %ptr			store i32* %tmp6, i32** %ptr
	ret <2 x i32> %tmp5			ret <2 x i32> %tmp5
	}			}

	define <2 x float> @vld2lanef(float* %A, <2 x float>* %B) nounwind {			define <2 x float> @vld2lanef(float* %A, <2 x float>* %B) nounwind {
	;CHECK-LABEL: vld2lanef:			;CHECK-LABEL: vld2lanef:
	;CHECK: vld2.32			;CHECK: vld2.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <2 x float>, <2 x float>* %B			%tmp1 = load <2 x float>, <2 x float>* %B
	%tmp2 = call %struct.__neon_float32x2x2_t @llvm.arm.neon.vld2lane.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, i32 1, i32 1)			%tmp2 = call %struct.__neon_float32x2x2_t @llvm.arm.neon.vld2lane.v2f32.p0i8(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, i32 1, i32 1)
	%tmp3 = extractvalue %struct.__neon_float32x2x2_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_float32x2x2_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_float32x2x2_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_float32x2x2_t %tmp2, 1
	%tmp5 = fadd <2 x float> %tmp3, %tmp4			%tmp5 = fadd <2 x float> %tmp3, %tmp4
	ret <2 x float> %tmp5			ret <2 x float> %tmp5
	}			}

	define <8 x i16> @vld2laneQi16(i16* %A, <8 x i16>* %B) nounwind {			define <8 x i16> @vld2laneQi16(i16* %A, <8 x i16>* %B) nounwind {
	;CHECK-LABEL: vld2laneQi16:			;CHECK-LABEL: vld2laneQi16:
	;Check the (default) alignment.			;Check the (default) alignment.
	;CHECK: vld2.16 {d17[1], d19[1]}, [{{r[0-9]+}}]			;CHECK: vld2.16 {d17[1], d19[1]}, [{{r[0-9]+}}]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <8 x i16>, <8 x i16>* %B			%tmp1 = load <8 x i16>, <8 x i16>* %B
	%tmp2 = call %struct.__neon_int16x8x2_t @llvm.arm.neon.vld2lane.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 5, i32 1)			%tmp2 = call %struct.__neon_int16x8x2_t @llvm.arm.neon.vld2lane.v8i16.p0i8(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 5, i32 1)
	%tmp3 = extractvalue %struct.__neon_int16x8x2_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int16x8x2_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int16x8x2_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int16x8x2_t %tmp2, 1
	%tmp5 = add <8 x i16> %tmp3, %tmp4			%tmp5 = add <8 x i16> %tmp3, %tmp4
	ret <8 x i16> %tmp5			ret <8 x i16> %tmp5
	}			}

	define <4 x i32> @vld2laneQi32(i32* %A, <4 x i32>* %B) nounwind {			define <4 x i32> @vld2laneQi32(i32* %A, <4 x i32>* %B) nounwind {
	;CHECK-LABEL: vld2laneQi32:			;CHECK-LABEL: vld2laneQi32:
	;Check the alignment value. Max for this instruction is 64 bits:			;Check the alignment value. Max for this instruction is 64 bits:
	;CHECK: vld2.32 {d17[0], d19[0]}, [{{r[0-9]+}}:64]			;CHECK: vld2.32 {d17[0], d19[0]}, [{{r[0-9]+}}:64]
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <4 x i32>, <4 x i32>* %B			%tmp1 = load <4 x i32>, <4 x i32>* %B
	%tmp2 = call %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2lane.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 2, i32 16)			%tmp2 = call %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2lane.v4i32.p0i8(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 2, i32 16)
	%tmp3 = extractvalue %struct.__neon_int32x4x2_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int32x4x2_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int32x4x2_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int32x4x2_t %tmp2, 1
	%tmp5 = add <4 x i32> %tmp3, %tmp4			%tmp5 = add <4 x i32> %tmp3, %tmp4
	ret <4 x i32> %tmp5			ret <4 x i32> %tmp5
	}			}

	define <4 x float> @vld2laneQf(float* %A, <4 x float>* %B) nounwind {			define <4 x float> @vld2laneQf(float* %A, <4 x float>* %B) nounwind {
	;CHECK-LABEL: vld2laneQf:			;CHECK-LABEL: vld2laneQf:
	;CHECK: vld2.32			;CHECK: vld2.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <4 x float>, <4 x float>* %B			%tmp1 = load <4 x float>, <4 x float>* %B
	%tmp2 = call %struct.__neon_float32x4x2_t @llvm.arm.neon.vld2lane.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, i32 1, i32 1)			%tmp2 = call %struct.__neon_float32x4x2_t @llvm.arm.neon.vld2lane.v4f32.p0i8(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, i32 1, i32 1)
	%tmp3 = extractvalue %struct.__neon_float32x4x2_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_float32x4x2_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_float32x4x2_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_float32x4x2_t %tmp2, 1
	%tmp5 = fadd <4 x float> %tmp3, %tmp4			%tmp5 = fadd <4 x float> %tmp3, %tmp4
	ret <4 x float> %tmp5			ret <4 x float> %tmp5
	}			}

	declare %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2lane.v8i8(i8*, <8 x i8>, <8 x i8>, i32, i32) nounwind readonly			declare %struct.__neon_int8x8x2_t @llvm.arm.neon.vld2lane.v8i8.p0i8(i8*, <8 x i8>, <8 x i8>, i32, i32) nounwind readonly
	declare %struct.__neon_int16x4x2_t @llvm.arm.neon.vld2lane.v4i16(i8*, <4 x i16>, <4 x i16>, i32, i32) nounwind readonly			declare %struct.__neon_int16x4x2_t @llvm.arm.neon.vld2lane.v4i16.p0i8(i8*, <4 x i16>, <4 x i16>, i32, i32) nounwind readonly
	declare %struct.__neon_int32x2x2_t @llvm.arm.neon.vld2lane.v2i32(i8*, <2 x i32>, <2 x i32>, i32, i32) nounwind readonly			declare %struct.__neon_int32x2x2_t @llvm.arm.neon.vld2lane.v2i32.p0i8(i8*, <2 x i32>, <2 x i32>, i32, i32) nounwind readonly
	declare %struct.__neon_float32x2x2_t @llvm.arm.neon.vld2lane.v2f32(i8*, <2 x float>, <2 x float>, i32, i32) nounwind readonly			declare %struct.__neon_float32x2x2_t @llvm.arm.neon.vld2lane.v2f32.p0i8(i8*, <2 x float>, <2 x float>, i32, i32) nounwind readonly

	declare %struct.__neon_int16x8x2_t @llvm.arm.neon.vld2lane.v8i16(i8*, <8 x i16>, <8 x i16>, i32, i32) nounwind readonly			declare %struct.__neon_int16x8x2_t @llvm.arm.neon.vld2lane.v8i16.p0i8(i8*, <8 x i16>, <8 x i16>, i32, i32) nounwind readonly
	declare %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2lane.v4i32(i8*, <4 x i32>, <4 x i32>, i32, i32) nounwind readonly			declare %struct.__neon_int32x4x2_t @llvm.arm.neon.vld2lane.v4i32.p0i8(i8*, <4 x i32>, <4 x i32>, i32, i32) nounwind readonly
	declare %struct.__neon_float32x4x2_t @llvm.arm.neon.vld2lane.v4f32(i8*, <4 x float>, <4 x float>, i32, i32) nounwind readonly			declare %struct.__neon_float32x4x2_t @llvm.arm.neon.vld2lane.v4f32.p0i8(i8*, <4 x float>, <4 x float>, i32, i32) nounwind readonly

	%struct.__neon_int8x8x3_t = type { <8 x i8>, <8 x i8>, <8 x i8> }			%struct.__neon_int8x8x3_t = type { <8 x i8>, <8 x i8>, <8 x i8> }
	%struct.__neon_int16x4x3_t = type { <4 x i16>, <4 x i16>, <4 x i16> }			%struct.__neon_int16x4x3_t = type { <4 x i16>, <4 x i16>, <4 x i16> }
	%struct.__neon_int32x2x3_t = type { <2 x i32>, <2 x i32>, <2 x i32> }			%struct.__neon_int32x2x3_t = type { <2 x i32>, <2 x i32>, <2 x i32> }
	%struct.__neon_float32x2x3_t = type { <2 x float>, <2 x float>, <2 x float> }			%struct.__neon_float32x2x3_t = type { <2 x float>, <2 x float>, <2 x float> }

	%struct.__neon_int16x8x3_t = type { <8 x i16>, <8 x i16>, <8 x i16> }			%struct.__neon_int16x8x3_t = type { <8 x i16>, <8 x i16>, <8 x i16> }
	%struct.__neon_int32x4x3_t = type { <4 x i32>, <4 x i32>, <4 x i32> }			%struct.__neon_int32x4x3_t = type { <4 x i32>, <4 x i32>, <4 x i32> }
	%struct.__neon_float32x4x3_t = type { <4 x float>, <4 x float>, <4 x float> }			%struct.__neon_float32x4x3_t = type { <4 x float>, <4 x float>, <4 x float> }

	define <8 x i8> @vld3lanei8(i8* %A, <8 x i8>* %B) nounwind {			define <8 x i8> @vld3lanei8(i8* %A, <8 x i8>* %B) nounwind {
	;CHECK-LABEL: vld3lanei8:			;CHECK-LABEL: vld3lanei8:
	;CHECK: vld3.8			;CHECK: vld3.8
	%tmp1 = load <8 x i8>, <8 x i8>* %B			%tmp1 = load <8 x i8>, <8 x i8>* %B
	%tmp2 = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3lane.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 1)			%tmp2 = call %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3lane.v8i8.p0i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 1)
	%tmp3 = extractvalue %struct.__neon_int8x8x3_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int8x8x3_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int8x8x3_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int8x8x3_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_int8x8x3_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_int8x8x3_t %tmp2, 2
	%tmp6 = add <8 x i8> %tmp3, %tmp4			%tmp6 = add <8 x i8> %tmp3, %tmp4
	%tmp7 = add <8 x i8> %tmp5, %tmp6			%tmp7 = add <8 x i8> %tmp5, %tmp6
	ret <8 x i8> %tmp7			ret <8 x i8> %tmp7
	}			}

	define <4 x i16> @vld3lanei16(i16* %A, <4 x i16>* %B) nounwind {			define <4 x i16> @vld3lanei16(i16* %A, <4 x i16>* %B) nounwind {
	;CHECK-LABEL: vld3lanei16:			;CHECK-LABEL: vld3lanei16:
	;Check the (default) alignment value. VLD3 does not support alignment.			;Check the (default) alignment value. VLD3 does not support alignment.
	;CHECK: vld3.16 {d{{.}}[1], d{{.}}[1], d{{.*}}[1]}, [{{r[0-9]+}}]			;CHECK: vld3.16 {d{{.}}[1], d{{.}}[1], d{{.*}}[1]}, [{{r[0-9]+}}]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <4 x i16>, <4 x i16>* %B			%tmp1 = load <4 x i16>, <4 x i16>* %B
	%tmp2 = call %struct.__neon_int16x4x3_t @llvm.arm.neon.vld3lane.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1, i32 8)			%tmp2 = call %struct.__neon_int16x4x3_t @llvm.arm.neon.vld3lane.v4i16.p0i8(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1, i32 8)
	%tmp3 = extractvalue %struct.__neon_int16x4x3_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int16x4x3_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int16x4x3_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int16x4x3_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_int16x4x3_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_int16x4x3_t %tmp2, 2
	%tmp6 = add <4 x i16> %tmp3, %tmp4			%tmp6 = add <4 x i16> %tmp3, %tmp4
	%tmp7 = add <4 x i16> %tmp5, %tmp6			%tmp7 = add <4 x i16> %tmp5, %tmp6
	ret <4 x i16> %tmp7			ret <4 x i16> %tmp7
	}			}

	define <2 x i32> @vld3lanei32(i32* %A, <2 x i32>* %B) nounwind {			define <2 x i32> @vld3lanei32(i32* %A, <2 x i32>* %B) nounwind {
	;CHECK-LABEL: vld3lanei32:			;CHECK-LABEL: vld3lanei32:
	;CHECK: vld3.32			;CHECK: vld3.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <2 x i32>, <2 x i32>* %B			%tmp1 = load <2 x i32>, <2 x i32>* %B
	%tmp2 = call %struct.__neon_int32x2x3_t @llvm.arm.neon.vld3lane.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1, i32 1)			%tmp2 = call %struct.__neon_int32x2x3_t @llvm.arm.neon.vld3lane.v2i32.p0i8(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1, i32 1)
	%tmp3 = extractvalue %struct.__neon_int32x2x3_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int32x2x3_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int32x2x3_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int32x2x3_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_int32x2x3_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_int32x2x3_t %tmp2, 2
	%tmp6 = add <2 x i32> %tmp3, %tmp4			%tmp6 = add <2 x i32> %tmp3, %tmp4
	%tmp7 = add <2 x i32> %tmp5, %tmp6			%tmp7 = add <2 x i32> %tmp5, %tmp6
	ret <2 x i32> %tmp7			ret <2 x i32> %tmp7
	}			}

	define <2 x float> @vld3lanef(float* %A, <2 x float>* %B) nounwind {			define <2 x float> @vld3lanef(float* %A, <2 x float>* %B) nounwind {
	;CHECK-LABEL: vld3lanef:			;CHECK-LABEL: vld3lanef:
	;CHECK: vld3.32			;CHECK: vld3.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <2 x float>, <2 x float>* %B			%tmp1 = load <2 x float>, <2 x float>* %B
	%tmp2 = call %struct.__neon_float32x2x3_t @llvm.arm.neon.vld3lane.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, i32 1, i32 1)			%tmp2 = call %struct.__neon_float32x2x3_t @llvm.arm.neon.vld3lane.v2f32.p0i8(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, i32 1, i32 1)
	%tmp3 = extractvalue %struct.__neon_float32x2x3_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_float32x2x3_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_float32x2x3_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_float32x2x3_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_float32x2x3_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_float32x2x3_t %tmp2, 2
	%tmp6 = fadd <2 x float> %tmp3, %tmp4			%tmp6 = fadd <2 x float> %tmp3, %tmp4
	%tmp7 = fadd <2 x float> %tmp5, %tmp6			%tmp7 = fadd <2 x float> %tmp5, %tmp6
	ret <2 x float> %tmp7			ret <2 x float> %tmp7
	}			}

	define <8 x i16> @vld3laneQi16(i16* %A, <8 x i16>* %B) nounwind {			define <8 x i16> @vld3laneQi16(i16* %A, <8 x i16>* %B) nounwind {
	;CHECK-LABEL: vld3laneQi16:			;CHECK-LABEL: vld3laneQi16:
	;Check the (default) alignment value. VLD3 does not support alignment.			;Check the (default) alignment value. VLD3 does not support alignment.
	;CHECK: vld3.16 {d{{.}}[1], d{{.}}[1], d{{.*}}[1]}, [{{r[0-9]+}}]			;CHECK: vld3.16 {d{{.}}[1], d{{.}}[1], d{{.*}}[1]}, [{{r[0-9]+}}]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <8 x i16>, <8 x i16>* %B			%tmp1 = load <8 x i16>, <8 x i16>* %B
	%tmp2 = call %struct.__neon_int16x8x3_t @llvm.arm.neon.vld3lane.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 1, i32 8)			%tmp2 = call %struct.__neon_int16x8x3_t @llvm.arm.neon.vld3lane.v8i16.p0i8(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 1, i32 8)
	%tmp3 = extractvalue %struct.__neon_int16x8x3_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int16x8x3_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int16x8x3_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int16x8x3_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_int16x8x3_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_int16x8x3_t %tmp2, 2
	%tmp6 = add <8 x i16> %tmp3, %tmp4			%tmp6 = add <8 x i16> %tmp3, %tmp4
	%tmp7 = add <8 x i16> %tmp5, %tmp6			%tmp7 = add <8 x i16> %tmp5, %tmp6
	ret <8 x i16> %tmp7			ret <8 x i16> %tmp7
	}			}

	;Check for a post-increment updating load with register increment.			;Check for a post-increment updating load with register increment.
	define <8 x i16> @vld3laneQi16_update(i16** %ptr, <8 x i16>* %B, i32 %inc) nounwind {			define <8 x i16> @vld3laneQi16_update(i16** %ptr, <8 x i16>* %B, i32 %inc) nounwind {
	;CHECK-LABEL: vld3laneQi16_update:			;CHECK-LABEL: vld3laneQi16_update:
	;CHECK: vld3.16 {d{{.}}[1], d{{.}}[1], d{{.*}}[1]}, [{{r[0-9]+}}], {{r[0-9]+}}			;CHECK: vld3.16 {d{{.}}[1], d{{.}}[1], d{{.*}}[1]}, [{{r[0-9]+}}], {{r[0-9]+}}
	%A = load i16*, i16** %ptr			%A = load i16*, i16** %ptr
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <8 x i16>, <8 x i16>* %B			%tmp1 = load <8 x i16>, <8 x i16>* %B
	%tmp2 = call %struct.__neon_int16x8x3_t @llvm.arm.neon.vld3lane.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 1, i32 8)			%tmp2 = call %struct.__neon_int16x8x3_t @llvm.arm.neon.vld3lane.v8i16.p0i8(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 1, i32 8)
	%tmp3 = extractvalue %struct.__neon_int16x8x3_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int16x8x3_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int16x8x3_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int16x8x3_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_int16x8x3_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_int16x8x3_t %tmp2, 2
	%tmp6 = add <8 x i16> %tmp3, %tmp4			%tmp6 = add <8 x i16> %tmp3, %tmp4
	%tmp7 = add <8 x i16> %tmp5, %tmp6			%tmp7 = add <8 x i16> %tmp5, %tmp6
	%tmp8 = getelementptr i16, i16* %A, i32 %inc			%tmp8 = getelementptr i16, i16* %A, i32 %inc
	store i16* %tmp8, i16** %ptr			store i16* %tmp8, i16** %ptr
	ret <8 x i16> %tmp7			ret <8 x i16> %tmp7
	}			}

	define <4 x i32> @vld3laneQi32(i32* %A, <4 x i32>* %B) nounwind {			define <4 x i32> @vld3laneQi32(i32* %A, <4 x i32>* %B) nounwind {
	;CHECK-LABEL: vld3laneQi32:			;CHECK-LABEL: vld3laneQi32:
	;CHECK: vld3.32			;CHECK: vld3.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <4 x i32>, <4 x i32>* %B			%tmp1 = load <4 x i32>, <4 x i32>* %B
	%tmp2 = call %struct.__neon_int32x4x3_t @llvm.arm.neon.vld3lane.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 3, i32 1)			%tmp2 = call %struct.__neon_int32x4x3_t @llvm.arm.neon.vld3lane.v4i32.p0i8(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 3, i32 1)
	%tmp3 = extractvalue %struct.__neon_int32x4x3_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int32x4x3_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int32x4x3_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int32x4x3_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_int32x4x3_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_int32x4x3_t %tmp2, 2
	%tmp6 = add <4 x i32> %tmp3, %tmp4			%tmp6 = add <4 x i32> %tmp3, %tmp4
	%tmp7 = add <4 x i32> %tmp5, %tmp6			%tmp7 = add <4 x i32> %tmp5, %tmp6
	ret <4 x i32> %tmp7			ret <4 x i32> %tmp7
	}			}

	define <4 x float> @vld3laneQf(float* %A, <4 x float>* %B) nounwind {			define <4 x float> @vld3laneQf(float* %A, <4 x float>* %B) nounwind {
	;CHECK-LABEL: vld3laneQf:			;CHECK-LABEL: vld3laneQf:
	;CHECK: vld3.32			;CHECK: vld3.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <4 x float>, <4 x float>* %B			%tmp1 = load <4 x float>, <4 x float>* %B
	%tmp2 = call %struct.__neon_float32x4x3_t @llvm.arm.neon.vld3lane.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, i32 1, i32 1)			%tmp2 = call %struct.__neon_float32x4x3_t @llvm.arm.neon.vld3lane.v4f32.p0i8(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, i32 1, i32 1)
	%tmp3 = extractvalue %struct.__neon_float32x4x3_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_float32x4x3_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_float32x4x3_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_float32x4x3_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_float32x4x3_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_float32x4x3_t %tmp2, 2
	%tmp6 = fadd <4 x float> %tmp3, %tmp4			%tmp6 = fadd <4 x float> %tmp3, %tmp4
	%tmp7 = fadd <4 x float> %tmp5, %tmp6			%tmp7 = fadd <4 x float> %tmp5, %tmp6
	ret <4 x float> %tmp7			ret <4 x float> %tmp7
	}			}

	declare %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3lane.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, i32, i32) nounwind readonly			declare %struct.__neon_int8x8x3_t @llvm.arm.neon.vld3lane.v8i8.p0i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, i32, i32) nounwind readonly
	declare %struct.__neon_int16x4x3_t @llvm.arm.neon.vld3lane.v4i16(i8*, <4 x i16>, <4 x i16>, <4 x i16>, i32, i32) nounwind readonly			declare %struct.__neon_int16x4x3_t @llvm.arm.neon.vld3lane.v4i16.p0i8(i8*, <4 x i16>, <4 x i16>, <4 x i16>, i32, i32) nounwind readonly
	declare %struct.__neon_int32x2x3_t @llvm.arm.neon.vld3lane.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, i32, i32) nounwind readonly			declare %struct.__neon_int32x2x3_t @llvm.arm.neon.vld3lane.v2i32.p0i8(i8*, <2 x i32>, <2 x i32>, <2 x i32>, i32, i32) nounwind readonly
	declare %struct.__neon_float32x2x3_t @llvm.arm.neon.vld3lane.v2f32(i8*, <2 x float>, <2 x float>, <2 x float>, i32, i32) nounwind readonly			declare %struct.__neon_float32x2x3_t @llvm.arm.neon.vld3lane.v2f32.p0i8(i8*, <2 x float>, <2 x float>, <2 x float>, i32, i32) nounwind readonly

	declare %struct.__neon_int16x8x3_t @llvm.arm.neon.vld3lane.v8i16(i8*, <8 x i16>, <8 x i16>, <8 x i16>, i32, i32) nounwind readonly			declare %struct.__neon_int16x8x3_t @llvm.arm.neon.vld3lane.v8i16.p0i8(i8*, <8 x i16>, <8 x i16>, <8 x i16>, i32, i32) nounwind readonly
	declare %struct.__neon_int32x4x3_t @llvm.arm.neon.vld3lane.v4i32(i8*, <4 x i32>, <4 x i32>, <4 x i32>, i32, i32) nounwind readonly			declare %struct.__neon_int32x4x3_t @llvm.arm.neon.vld3lane.v4i32.p0i8(i8*, <4 x i32>, <4 x i32>, <4 x i32>, i32, i32) nounwind readonly
	declare %struct.__neon_float32x4x3_t @llvm.arm.neon.vld3lane.v4f32(i8*, <4 x float>, <4 x float>, <4 x float>, i32, i32) nounwind readonly			declare %struct.__neon_float32x4x3_t @llvm.arm.neon.vld3lane.v4f32.p0i8(i8*, <4 x float>, <4 x float>, <4 x float>, i32, i32) nounwind readonly

	%struct.__neon_int8x8x4_t = type { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> }			%struct.__neon_int8x8x4_t = type { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> }
	%struct.__neon_int16x4x4_t = type { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> }			%struct.__neon_int16x4x4_t = type { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> }
	%struct.__neon_int32x2x4_t = type { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> }			%struct.__neon_int32x2x4_t = type { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> }
	%struct.__neon_float32x2x4_t = type { <2 x float>, <2 x float>, <2 x float>, <2 x float> }			%struct.__neon_float32x2x4_t = type { <2 x float>, <2 x float>, <2 x float>, <2 x float> }

	%struct.__neon_int16x8x4_t = type { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> }			%struct.__neon_int16x8x4_t = type { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> }
	%struct.__neon_int32x4x4_t = type { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> }			%struct.__neon_int32x4x4_t = type { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> }
	%struct.__neon_float32x4x4_t = type { <4 x float>, <4 x float>, <4 x float>, <4 x float> }			%struct.__neon_float32x4x4_t = type { <4 x float>, <4 x float>, <4 x float>, <4 x float> }

	define <8 x i8> @vld4lanei8(i8* %A, <8 x i8>* %B) nounwind {			define <8 x i8> @vld4lanei8(i8* %A, <8 x i8>* %B) nounwind {
	;CHECK-LABEL: vld4lanei8:			;CHECK-LABEL: vld4lanei8:
	;Check the alignment value. Max for this instruction is 32 bits:			;Check the alignment value. Max for this instruction is 32 bits:
	;CHECK: vld4.8 {d{{.}}[1], d{{.}}[1], d{{.}}[1], d{{.}}[1]}, [{{r[0-9]+}}:32]			;CHECK: vld4.8 {d{{.}}[1], d{{.}}[1], d{{.}}[1], d{{.}}[1]}, [{{r[0-9]+}}:32]
	%tmp1 = load <8 x i8>, <8 x i8>* %B			%tmp1 = load <8 x i8>, <8 x i8>* %B
	%tmp2 = call %struct.__neon_int8x8x4_t @llvm.arm.neon.vld4lane.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 8)			%tmp2 = call %struct.__neon_int8x8x4_t @llvm.arm.neon.vld4lane.v8i8.p0i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 8)
	%tmp3 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 2
	%tmp6 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 3			%tmp6 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 3
	%tmp7 = add <8 x i8> %tmp3, %tmp4			%tmp7 = add <8 x i8> %tmp3, %tmp4
	%tmp8 = add <8 x i8> %tmp5, %tmp6			%tmp8 = add <8 x i8> %tmp5, %tmp6
	%tmp9 = add <8 x i8> %tmp7, %tmp8			%tmp9 = add <8 x i8> %tmp7, %tmp8
	ret <8 x i8> %tmp9			ret <8 x i8> %tmp9
	}			}

	;Check for a post-increment updating load.			;Check for a post-increment updating load.
	define <8 x i8> @vld4lanei8_update(i8** %ptr, <8 x i8>* %B) nounwind {			define <8 x i8> @vld4lanei8_update(i8** %ptr, <8 x i8>* %B) nounwind {
	;CHECK-LABEL: vld4lanei8_update:			;CHECK-LABEL: vld4lanei8_update:
	;CHECK: vld4.8 {d16[1], d17[1], d18[1], d19[1]}, [{{r[0-9]+}}:32]!			;CHECK: vld4.8 {d16[1], d17[1], d18[1], d19[1]}, [{{r[0-9]+}}:32]!
	%A = load i8*, i8** %ptr			%A = load i8*, i8** %ptr
	%tmp1 = load <8 x i8>, <8 x i8>* %B			%tmp1 = load <8 x i8>, <8 x i8>* %B
	%tmp2 = call %struct.__neon_int8x8x4_t @llvm.arm.neon.vld4lane.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 8)			%tmp2 = call %struct.__neon_int8x8x4_t @llvm.arm.neon.vld4lane.v8i8.p0i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 8)
	%tmp3 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 2
	%tmp6 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 3			%tmp6 = extractvalue %struct.__neon_int8x8x4_t %tmp2, 3
	%tmp7 = add <8 x i8> %tmp3, %tmp4			%tmp7 = add <8 x i8> %tmp3, %tmp4
	%tmp8 = add <8 x i8> %tmp5, %tmp6			%tmp8 = add <8 x i8> %tmp5, %tmp6
	%tmp9 = add <8 x i8> %tmp7, %tmp8			%tmp9 = add <8 x i8> %tmp7, %tmp8
	%tmp10 = getelementptr i8, i8* %A, i32 4			%tmp10 = getelementptr i8, i8* %A, i32 4
	store i8* %tmp10, i8** %ptr			store i8* %tmp10, i8** %ptr
	ret <8 x i8> %tmp9			ret <8 x i8> %tmp9
	}			}

	define <4 x i16> @vld4lanei16(i16* %A, <4 x i16>* %B) nounwind {			define <4 x i16> @vld4lanei16(i16* %A, <4 x i16>* %B) nounwind {
	;CHECK-LABEL: vld4lanei16:			;CHECK-LABEL: vld4lanei16:
	;Check that a power-of-two alignment smaller than the total size of the memory			;Check that a power-of-two alignment smaller than the total size of the memory
	;being loaded is ignored.			;being loaded is ignored.
	;CHECK: vld4.16 {d16[1], d17[1], d18[1], d19[1]}, [{{r[0-9]+}}]			;CHECK: vld4.16 {d16[1], d17[1], d18[1], d19[1]}, [{{r[0-9]+}}]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <4 x i16>, <4 x i16>* %B			%tmp1 = load <4 x i16>, <4 x i16>* %B
	%tmp2 = call %struct.__neon_int16x4x4_t @llvm.arm.neon.vld4lane.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1, i32 4)			%tmp2 = call %struct.__neon_int16x4x4_t @llvm.arm.neon.vld4lane.v4i16.p0i8(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1, i32 4)
	%tmp3 = extractvalue %struct.__neon_int16x4x4_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int16x4x4_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int16x4x4_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int16x4x4_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_int16x4x4_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_int16x4x4_t %tmp2, 2
	%tmp6 = extractvalue %struct.__neon_int16x4x4_t %tmp2, 3			%tmp6 = extractvalue %struct.__neon_int16x4x4_t %tmp2, 3
	%tmp7 = add <4 x i16> %tmp3, %tmp4			%tmp7 = add <4 x i16> %tmp3, %tmp4
	%tmp8 = add <4 x i16> %tmp5, %tmp6			%tmp8 = add <4 x i16> %tmp5, %tmp6
	%tmp9 = add <4 x i16> %tmp7, %tmp8			%tmp9 = add <4 x i16> %tmp7, %tmp8
	ret <4 x i16> %tmp9			ret <4 x i16> %tmp9
	}			}

	define <2 x i32> @vld4lanei32(i32* %A, <2 x i32>* %B) nounwind {			define <2 x i32> @vld4lanei32(i32* %A, <2 x i32>* %B) nounwind {
	;CHECK-LABEL: vld4lanei32:			;CHECK-LABEL: vld4lanei32:
	;Check the alignment value. An 8-byte alignment is allowed here even though			;Check the alignment value. An 8-byte alignment is allowed here even though
	;it is smaller than the total size of the memory being loaded.			;it is smaller than the total size of the memory being loaded.
	;CHECK: vld4.32 {d16[1], d17[1], d18[1], d19[1]}, [{{r[0-9]+}}:64]			;CHECK: vld4.32 {d16[1], d17[1], d18[1], d19[1]}, [{{r[0-9]+}}:64]
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <2 x i32>, <2 x i32>* %B			%tmp1 = load <2 x i32>, <2 x i32>* %B
	%tmp2 = call %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4lane.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1, i32 8)			%tmp2 = call %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4lane.v2i32.p0i8(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1, i32 8)
	%tmp3 = extractvalue %struct.__neon_int32x2x4_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int32x2x4_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int32x2x4_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int32x2x4_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_int32x2x4_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_int32x2x4_t %tmp2, 2
	%tmp6 = extractvalue %struct.__neon_int32x2x4_t %tmp2, 3			%tmp6 = extractvalue %struct.__neon_int32x2x4_t %tmp2, 3
	%tmp7 = add <2 x i32> %tmp3, %tmp4			%tmp7 = add <2 x i32> %tmp3, %tmp4
	%tmp8 = add <2 x i32> %tmp5, %tmp6			%tmp8 = add <2 x i32> %tmp5, %tmp6
	%tmp9 = add <2 x i32> %tmp7, %tmp8			%tmp9 = add <2 x i32> %tmp7, %tmp8
	ret <2 x i32> %tmp9			ret <2 x i32> %tmp9
	}			}

	define <2 x float> @vld4lanef(float* %A, <2 x float>* %B) nounwind {			define <2 x float> @vld4lanef(float* %A, <2 x float>* %B) nounwind {
	;CHECK-LABEL: vld4lanef:			;CHECK-LABEL: vld4lanef:
	;CHECK: vld4.32			;CHECK: vld4.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <2 x float>, <2 x float>* %B			%tmp1 = load <2 x float>, <2 x float>* %B
	%tmp2 = call %struct.__neon_float32x2x4_t @llvm.arm.neon.vld4lane.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, i32 1, i32 1)			%tmp2 = call %struct.__neon_float32x2x4_t @llvm.arm.neon.vld4lane.v2f32.p0i8(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, i32 1, i32 1)
	%tmp3 = extractvalue %struct.__neon_float32x2x4_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_float32x2x4_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_float32x2x4_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_float32x2x4_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_float32x2x4_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_float32x2x4_t %tmp2, 2
	%tmp6 = extractvalue %struct.__neon_float32x2x4_t %tmp2, 3			%tmp6 = extractvalue %struct.__neon_float32x2x4_t %tmp2, 3
	%tmp7 = fadd <2 x float> %tmp3, %tmp4			%tmp7 = fadd <2 x float> %tmp3, %tmp4
	%tmp8 = fadd <2 x float> %tmp5, %tmp6			%tmp8 = fadd <2 x float> %tmp5, %tmp6
	%tmp9 = fadd <2 x float> %tmp7, %tmp8			%tmp9 = fadd <2 x float> %tmp7, %tmp8
	ret <2 x float> %tmp9			ret <2 x float> %tmp9
	}			}

	define <8 x i16> @vld4laneQi16(i16* %A, <8 x i16>* %B) nounwind {			define <8 x i16> @vld4laneQi16(i16* %A, <8 x i16>* %B) nounwind {
	;CHECK-LABEL: vld4laneQi16:			;CHECK-LABEL: vld4laneQi16:
	;Check the alignment value. Max for this instruction is 64 bits:			;Check the alignment value. Max for this instruction is 64 bits:
	;CHECK: vld4.16 {d16[1], d18[1], d20[1], d22[1]}, [{{r[0-9]+}}:64]			;CHECK: vld4.16 {d16[1], d18[1], d20[1], d22[1]}, [{{r[0-9]+}}:64]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <8 x i16>, <8 x i16>* %B			%tmp1 = load <8 x i16>, <8 x i16>* %B
	%tmp2 = call %struct.__neon_int16x8x4_t @llvm.arm.neon.vld4lane.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 1, i32 16)			%tmp2 = call %struct.__neon_int16x8x4_t @llvm.arm.neon.vld4lane.v8i16.p0i8(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 1, i32 16)
	%tmp3 = extractvalue %struct.__neon_int16x8x4_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int16x8x4_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int16x8x4_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int16x8x4_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_int16x8x4_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_int16x8x4_t %tmp2, 2
	%tmp6 = extractvalue %struct.__neon_int16x8x4_t %tmp2, 3			%tmp6 = extractvalue %struct.__neon_int16x8x4_t %tmp2, 3
	%tmp7 = add <8 x i16> %tmp3, %tmp4			%tmp7 = add <8 x i16> %tmp3, %tmp4
	%tmp8 = add <8 x i16> %tmp5, %tmp6			%tmp8 = add <8 x i16> %tmp5, %tmp6
	%tmp9 = add <8 x i16> %tmp7, %tmp8			%tmp9 = add <8 x i16> %tmp7, %tmp8
	ret <8 x i16> %tmp9			ret <8 x i16> %tmp9
	}			}

	define <4 x i32> @vld4laneQi32(i32* %A, <4 x i32>* %B) nounwind {			define <4 x i32> @vld4laneQi32(i32* %A, <4 x i32>* %B) nounwind {
	;CHECK-LABEL: vld4laneQi32:			;CHECK-LABEL: vld4laneQi32:
	;Check the (default) alignment.			;Check the (default) alignment.
	;CHECK: vld4.32 {d17[0], d19[0], d21[0], d23[0]}, [{{r[0-9]+}}]			;CHECK: vld4.32 {d17[0], d19[0], d21[0], d23[0]}, [{{r[0-9]+}}]
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <4 x i32>, <4 x i32>* %B			%tmp1 = load <4 x i32>, <4 x i32>* %B
	%tmp2 = call %struct.__neon_int32x4x4_t @llvm.arm.neon.vld4lane.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 2, i32 1)			%tmp2 = call %struct.__neon_int32x4x4_t @llvm.arm.neon.vld4lane.v4i32.p0i8(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 2, i32 1)
	%tmp3 = extractvalue %struct.__neon_int32x4x4_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_int32x4x4_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_int32x4x4_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_int32x4x4_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_int32x4x4_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_int32x4x4_t %tmp2, 2
	%tmp6 = extractvalue %struct.__neon_int32x4x4_t %tmp2, 3			%tmp6 = extractvalue %struct.__neon_int32x4x4_t %tmp2, 3
	%tmp7 = add <4 x i32> %tmp3, %tmp4			%tmp7 = add <4 x i32> %tmp3, %tmp4
	%tmp8 = add <4 x i32> %tmp5, %tmp6			%tmp8 = add <4 x i32> %tmp5, %tmp6
	%tmp9 = add <4 x i32> %tmp7, %tmp8			%tmp9 = add <4 x i32> %tmp7, %tmp8
	ret <4 x i32> %tmp9			ret <4 x i32> %tmp9
	}			}

	define <4 x float> @vld4laneQf(float* %A, <4 x float>* %B) nounwind {			define <4 x float> @vld4laneQf(float* %A, <4 x float>* %B) nounwind {
	;CHECK-LABEL: vld4laneQf:			;CHECK-LABEL: vld4laneQf:
	;CHECK: vld4.32			;CHECK: vld4.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <4 x float>, <4 x float>* %B			%tmp1 = load <4 x float>, <4 x float>* %B
	%tmp2 = call %struct.__neon_float32x4x4_t @llvm.arm.neon.vld4lane.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, i32 1, i32 1)			%tmp2 = call %struct.__neon_float32x4x4_t @llvm.arm.neon.vld4lane.v4f32.p0i8(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, i32 1, i32 1)
	%tmp3 = extractvalue %struct.__neon_float32x4x4_t %tmp2, 0			%tmp3 = extractvalue %struct.__neon_float32x4x4_t %tmp2, 0
	%tmp4 = extractvalue %struct.__neon_float32x4x4_t %tmp2, 1			%tmp4 = extractvalue %struct.__neon_float32x4x4_t %tmp2, 1
	%tmp5 = extractvalue %struct.__neon_float32x4x4_t %tmp2, 2			%tmp5 = extractvalue %struct.__neon_float32x4x4_t %tmp2, 2
	%tmp6 = extractvalue %struct.__neon_float32x4x4_t %tmp2, 3			%tmp6 = extractvalue %struct.__neon_float32x4x4_t %tmp2, 3
	%tmp7 = fadd <4 x float> %tmp3, %tmp4			%tmp7 = fadd <4 x float> %tmp3, %tmp4
	%tmp8 = fadd <4 x float> %tmp5, %tmp6			%tmp8 = fadd <4 x float> %tmp5, %tmp6
	%tmp9 = fadd <4 x float> %tmp7, %tmp8			%tmp9 = fadd <4 x float> %tmp7, %tmp8
	ret <4 x float> %tmp9			ret <4 x float> %tmp9
	}			}

	declare %struct.__neon_int8x8x4_t @llvm.arm.neon.vld4lane.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, i32, i32) nounwind readonly			declare %struct.__neon_int8x8x4_t @llvm.arm.neon.vld4lane.v8i8.p0i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, i32, i32) nounwind readonly
	declare %struct.__neon_int16x4x4_t @llvm.arm.neon.vld4lane.v4i16(i8*, <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16>, i32, i32) nounwind readonly			declare %struct.__neon_int16x4x4_t @llvm.arm.neon.vld4lane.v4i16.p0i8(i8*, <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16>, i32, i32) nounwind readonly
	declare %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4lane.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, i32, i32) nounwind readonly			declare %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4lane.v2i32.p0i8(i8*, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, i32, i32) nounwind readonly
	declare %struct.__neon_float32x2x4_t @llvm.arm.neon.vld4lane.v2f32(i8*, <2 x float>, <2 x float>, <2 x float>, <2 x float>, i32, i32) nounwind readonly			declare %struct.__neon_float32x2x4_t @llvm.arm.neon.vld4lane.v2f32.p0i8(i8*, <2 x float>, <2 x float>, <2 x float>, <2 x float>, i32, i32) nounwind readonly

	declare %struct.__neon_int16x8x4_t @llvm.arm.neon.vld4lane.v8i16(i8*, <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16>, i32, i32) nounwind readonly			declare %struct.__neon_int16x8x4_t @llvm.arm.neon.vld4lane.v8i16.p0i8(i8*, <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16>, i32, i32) nounwind readonly
	declare %struct.__neon_int32x4x4_t @llvm.arm.neon.vld4lane.v4i32(i8*, <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32>, i32, i32) nounwind readonly			declare %struct.__neon_int32x4x4_t @llvm.arm.neon.vld4lane.v4i32.p0i8(i8*, <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32>, i32, i32) nounwind readonly
	declare %struct.__neon_float32x4x4_t @llvm.arm.neon.vld4lane.v4f32(i8*, <4 x float>, <4 x float>, <4 x float>, <4 x float>, i32, i32) nounwind readonly			declare %struct.__neon_float32x4x4_t @llvm.arm.neon.vld4lane.v4f32.p0i8(i8*, <4 x float>, <4 x float>, <4 x float>, <4 x float>, i32, i32) nounwind readonly

	; Radar 8776599: If one of the operands to a QQQQ REG_SEQUENCE is a register			; Radar 8776599: If one of the operands to a QQQQ REG_SEQUENCE is a register
	; in the QPR_VFP2 regclass, it needs to be copied to a QPR regclass because			; in the QPR_VFP2 regclass, it needs to be copied to a QPR regclass because
	; we don't currently have a QQQQ_VFP2 super-regclass. (The "0" for the low			; we don't currently have a QQQQ_VFP2 super-regclass. (The "0" for the low
	; part of %ins67 is supposed to be loaded by a VLDRS instruction in this test.)			; part of %ins67 is supposed to be loaded by a VLDRS instruction in this test.)
	define <8 x i16> @test_qqqq_regsequence_subreg([6 x i64] %b) nounwind {			define <8 x i16> @test_qqqq_regsequence_subreg([6 x i64] %b) nounwind {
	;CHECK-LABEL: test_qqqq_regsequence_subreg:			;CHECK-LABEL: test_qqqq_regsequence_subreg:
	;CHECK: vld3.16			;CHECK: vld3.16
	%tmp63 = extractvalue [6 x i64] %b, 5			%tmp63 = extractvalue [6 x i64] %b, 5
	%tmp64 = zext i64 %tmp63 to i128			%tmp64 = zext i64 %tmp63 to i128
	%tmp65 = shl i128 %tmp64, 64			%tmp65 = shl i128 %tmp64, 64
	%ins67 = or i128 %tmp65, 0			%ins67 = or i128 %tmp65, 0
	%tmp78 = bitcast i128 %ins67 to <8 x i16>			%tmp78 = bitcast i128 %ins67 to <8 x i16>
	%vld3_lane = tail call %struct.__neon_int16x8x3_t @llvm.arm.neon.vld3lane.v8i16(i8* undef, <8 x i16> undef, <8 x i16> undef, <8 x i16> %tmp78, i32 1, i32 2)			%vld3_lane = tail call %struct.__neon_int16x8x3_t @llvm.arm.neon.vld3lane.v8i16.p0i8(i8* undef, <8 x i16> undef, <8 x i16> undef, <8 x i16> %tmp78, i32 1, i32 2)
	%tmp3 = extractvalue %struct.__neon_int16x8x3_t %vld3_lane, 0			%tmp3 = extractvalue %struct.__neon_int16x8x3_t %vld3_lane, 0
	%tmp4 = extractvalue %struct.__neon_int16x8x3_t %vld3_lane, 1			%tmp4 = extractvalue %struct.__neon_int16x8x3_t %vld3_lane, 1
	%tmp5 = extractvalue %struct.__neon_int16x8x3_t %vld3_lane, 2			%tmp5 = extractvalue %struct.__neon_int16x8x3_t %vld3_lane, 2
	%tmp6 = add <8 x i16> %tmp3, %tmp4			%tmp6 = add <8 x i16> %tmp3, %tmp4
	%tmp7 = add <8 x i16> %tmp5, %tmp6			%tmp7 = add <8 x i16> %tmp5, %tmp6
	ret <8 x i16> %tmp7			ret <8 x i16> %tmp7
	}			}

	declare void @llvm.trap() nounwind			declare void @llvm.trap() nounwind
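
The renaming pattern visible in the test updates above can be sketched as a small helper. This is a minimal illustration only, not the logic in AutoUpgrade.cpp: the function name `upgrade_name` and its `addrspace` parameter are hypothetical. It assumes, as the diff shows, that load intrinsics gain the pointer type as a trailing suffix (`.v8i8` becomes `.v8i8.p0i8`) while store intrinsics gain it before the vector type (`.v4i16` becomes `.p0i8.v4i16`). The `p<N>i8` spelling for a non-zero address space is an assumption based on LLVM's usual overloaded-type mangling.

```python
import re

# Hypothetical helper (not part of LLVM): maps a pre-patch NEON load/store
# intrinsic name to the post-patch spelling seen in the updated tests.
_LOAD = re.compile(r"^llvm\.arm\.neon\.(vld(?:[1-4]|[2-4]lane))\.(v\d+[if]\d+)$")
_STORE = re.compile(r"^llvm\.arm\.neon\.(vst(?:[1-4]|[2-4]lane))\.(v\d+[if]\d+)$")


def upgrade_name(name: str, addrspace: int = 0) -> str:
    """Return the new-style name, or the input unchanged if it does not match."""
    ptr = f"p{addrspace}i8"  # assumed mangling of an i8 pointer in `addrspace`
    m = _LOAD.match(name)
    if m:
        # Loads: the pointer type is appended after the vector type.
        return f"llvm.arm.neon.{m.group(1)}.{m.group(2)}.{ptr}"
    m = _STORE.match(name)
    if m:
        # Stores: the pointer type is inserted before the vector type.
        return f"llvm.arm.neon.{m.group(1)}.{ptr}.{m.group(2)}"
    return name


if __name__ == "__main__":
    print(upgrade_name("llvm.arm.neon.vld4lane.v8i8"))
    print(upgrade_name("llvm.arm.neon.vst1.v4i16"))
```

Names that already carry the pointer suffix do not match either pattern and are returned unchanged, so applying the helper twice has no further effect.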

llvm/trunk/test/CodeGen/ARM/vmov.ll

	entry:			entry:
	;CHECK-LABEL: any_extend:			;CHECK-LABEL: any_extend:
	;CHECK: vmovl			;CHECK: vmovl
	%and.i186 = zext <4 x i1> %x to <4 x i32>			%and.i186 = zext <4 x i1> %x to <4 x i32>
	%add.i185 = sub <4 x i32> %and.i186, %y			%add.i185 = sub <4 x i32> %and.i186, %y
	%sub.i = sub <4 x i32> %add.i185, zeroinitializer			%sub.i = sub <4 x i32> %add.i185, zeroinitializer
	%add.i = add <4 x i32> %sub.i, zeroinitializer			%add.i = add <4 x i32> %sub.i, zeroinitializer
	%vmovn.i = trunc <4 x i32> %add.i to <4 x i16>			%vmovn.i = trunc <4 x i32> %add.i to <4 x i16>
	tail call void @llvm.arm.neon.vst1.v4i16(i8* undef, <4 x i16> %vmovn.i, i32 2)			tail call void @llvm.arm.neon.vst1.p0i8.v4i16(i8* undef, <4 x i16> %vmovn.i, i32 2)
	unreachable			unreachable
	}			}

	declare void @llvm.arm.neon.vst1.v4i16(i8*, <4 x i16>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v4i16(i8*, <4 x i16>, i32) nounwind

llvm/trunk/test/CodeGen/ARM/vmul.ll

define void @distribute(i16* %dst, i8* %src, i32 %mul) nounwind {		define void @distribute(i16* %dst, i8* %src, i32 %mul) nounwind {
entry:		entry:
; CHECK-LABEL: distribute:		; CHECK-LABEL: distribute:
; CHECK: vmull.u8 [[REG1:(q[0-9]+)]], d{{.*}}, [[REG2:(d[0-9]+)]]		; CHECK: vmull.u8 [[REG1:(q[0-9]+)]], d{{.*}}, [[REG2:(d[0-9]+)]]
; CHECK: vmlal.u8 [[REG1]], d{{.*}}, [[REG2]]		; CHECK: vmlal.u8 [[REG1]], d{{.*}}, [[REG2]]
%0 = trunc i32 %mul to i8		%0 = trunc i32 %mul to i8
%1 = insertelement <8 x i8> undef, i8 %0, i32 0		%1 = insertelement <8 x i8> undef, i8 %0, i32 0
%2 = shufflevector <8 x i8> %1, <8 x i8> undef, <8 x i32> zeroinitializer		%2 = shufflevector <8 x i8> %1, <8 x i8> undef, <8 x i32> zeroinitializer
%3 = tail call <16 x i8> @llvm.arm.neon.vld1.v16i8(i8* %src, i32 1)		%3 = tail call <16 x i8> @llvm.arm.neon.vld1.v16i8.p0i8(i8* %src, i32 1)
%4 = bitcast <16 x i8> %3 to <2 x double>		%4 = bitcast <16 x i8> %3 to <2 x double>
%5 = extractelement <2 x double> %4, i32 1		%5 = extractelement <2 x double> %4, i32 1
%6 = bitcast double %5 to <8 x i8>		%6 = bitcast double %5 to <8 x i8>
%7 = zext <8 x i8> %6 to <8 x i16>		%7 = zext <8 x i8> %6 to <8 x i16>
%8 = zext <8 x i8> %2 to <8 x i16>		%8 = zext <8 x i8> %2 to <8 x i16>
%9 = extractelement <2 x double> %4, i32 0		%9 = extractelement <2 x double> %4, i32 0
%10 = bitcast double %9 to <8 x i8>		%10 = bitcast double %9 to <8 x i8>
%11 = zext <8 x i8> %10 to <8 x i16>		%11 = zext <8 x i8> %10 to <8 x i16>
%12 = add <8 x i16> %7, %11		%12 = add <8 x i16> %7, %11
%13 = mul <8 x i16> %12, %8		%13 = mul <8 x i16> %12, %8
%14 = bitcast i16* %dst to i8*		%14 = bitcast i16* %dst to i8*
tail call void @llvm.arm.neon.vst1.v8i16(i8* %14, <8 x i16> %13, i32 2)		tail call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %14, <8 x i16> %13, i32 2)
ret void		ret void
}		}

declare <16 x i8> @llvm.arm.neon.vld1.v16i8(i8*, i32) nounwind readonly		declare <16 x i8> @llvm.arm.neon.vld1.v16i8.p0i8(i8*, i32) nounwind readonly

declare void @llvm.arm.neon.vst1.v8i16(i8*, <8 x i16>, i32) nounwind		declare void @llvm.arm.neon.vst1.p0i8.v8i16(i8*, <8 x i16>, i32) nounwind

; Take advantage of the Cortex-A8 multiplier accumulator forward.		; Take advantage of the Cortex-A8 multiplier accumulator forward.

%struct.uint8x8_t = type { <8 x i8> }		%struct.uint8x8_t = type { <8 x i8> }

define void @distribute2(%struct.uint8x8_t* nocapture %dst, i8* %src, i32 %mul) nounwind {		define void @distribute2(%struct.uint8x8_t* nocapture %dst, i8* %src, i32 %mul) nounwind {
entry:		entry:
; CHECK: distribute2		; CHECK: distribute2
; CHECK-NOT: vadd.i8		; CHECK-NOT: vadd.i8
; CHECK: vmul.i8		; CHECK: vmul.i8
; CHECK: vmla.i8		; CHECK: vmla.i8
%0 = trunc i32 %mul to i8		%0 = trunc i32 %mul to i8
%1 = insertelement <8 x i8> undef, i8 %0, i32 0		%1 = insertelement <8 x i8> undef, i8 %0, i32 0
%2 = shufflevector <8 x i8> %1, <8 x i8> undef, <8 x i32> zeroinitializer		%2 = shufflevector <8 x i8> %1, <8 x i8> undef, <8 x i32> zeroinitializer
%3 = tail call <16 x i8> @llvm.arm.neon.vld1.v16i8(i8* %src, i32 1)		%3 = tail call <16 x i8> @llvm.arm.neon.vld1.v16i8.p0i8(i8* %src, i32 1)
%4 = bitcast <16 x i8> %3 to <2 x double>		%4 = bitcast <16 x i8> %3 to <2 x double>
%5 = extractelement <2 x double> %4, i32 1		%5 = extractelement <2 x double> %4, i32 1
%6 = bitcast double %5 to <8 x i8>		%6 = bitcast double %5 to <8 x i8>
%7 = extractelement <2 x double> %4, i32 0		%7 = extractelement <2 x double> %4, i32 0
%8 = bitcast double %7 to <8 x i8>		%8 = bitcast double %7 to <8 x i8>
%9 = add <8 x i8> %6, %8		%9 = add <8 x i8> %6, %8
%10 = mul <8 x i8> %9, %2		%10 = mul <8 x i8> %9, %2
%11 = getelementptr inbounds %struct.uint8x8_t, %struct.uint8x8_t* %dst, i32 0, i32 0		%11 = getelementptr inbounds %struct.uint8x8_t, %struct.uint8x8_t* %dst, i32 0, i32 0
store <8 x i8> %10, <8 x i8>* %11, align 8		store <8 x i8> %10, <8 x i8>* %11, align 8
ret void		ret void
}		}

define void @distribute2_commutative(%struct.uint8x8_t* nocapture %dst, i8* %src, i32 %mul) nounwind {		define void @distribute2_commutative(%struct.uint8x8_t* nocapture %dst, i8* %src, i32 %mul) nounwind {
entry:		entry:
; CHECK: distribute2_commutative		; CHECK: distribute2_commutative
; CHECK-NOT: vadd.i8		; CHECK-NOT: vadd.i8
; CHECK: vmul.i8		; CHECK: vmul.i8
; CHECK: vmla.i8		; CHECK: vmla.i8
%0 = trunc i32 %mul to i8		%0 = trunc i32 %mul to i8
%1 = insertelement <8 x i8> undef, i8 %0, i32 0		%1 = insertelement <8 x i8> undef, i8 %0, i32 0
%2 = shufflevector <8 x i8> %1, <8 x i8> undef, <8 x i32> zeroinitializer		%2 = shufflevector <8 x i8> %1, <8 x i8> undef, <8 x i32> zeroinitializer
%3 = tail call <16 x i8> @llvm.arm.neon.vld1.v16i8(i8* %src, i32 1)		%3 = tail call <16 x i8> @llvm.arm.neon.vld1.v16i8.p0i8(i8* %src, i32 1)
%4 = bitcast <16 x i8> %3 to <2 x double>		%4 = bitcast <16 x i8> %3 to <2 x double>
%5 = extractelement <2 x double> %4, i32 1		%5 = extractelement <2 x double> %4, i32 1
%6 = bitcast double %5 to <8 x i8>		%6 = bitcast double %5 to <8 x i8>
%7 = extractelement <2 x double> %4, i32 0		%7 = extractelement <2 x double> %4, i32 0
%8 = bitcast double %7 to <8 x i8>		%8 = bitcast double %7 to <8 x i8>
%9 = add <8 x i8> %6, %8		%9 = add <8 x i8> %6, %8
%10 = mul <8 x i8> %2, %9		%10 = mul <8 x i8> %2, %9
%11 = getelementptr inbounds %struct.uint8x8_t, %struct.uint8x8_t* %dst, i32 0, i32 0		%11 = getelementptr inbounds %struct.uint8x8_t, %struct.uint8x8_t* %dst, i32 0, i32 0
Show All 40 Lines	for.body: ; preds = %for.cond.loopexit, %for.body.lr.ph
br i1 undef, label %for.cond.loopexit, label %for.body33.lr.ph		br i1 undef, label %for.cond.loopexit, label %for.body33.lr.ph

for.body33.lr.ph: ; preds = %for.body		for.body33.lr.ph: ; preds = %for.body
%.sub = select i1 undef, i32 0, i32 undef		%.sub = select i1 undef, i32 0, i32 undef
br label %for.body33		br label %for.body33

for.body33: ; preds = %for.body33, %for.body33.lr.ph		for.body33: ; preds = %for.body33, %for.body33.lr.ph
%add45 = add i32 undef, undef		%add45 = add i32 undef, undef
%vld155 = tail call <16 x i8> @llvm.arm.neon.vld1.v16i8(i8* undef, i32 1)		%vld155 = tail call <16 x i8> @llvm.arm.neon.vld1.v16i8.p0i8(i8* undef, i32 1)
%0 = load i32, i32* undef, align 4		%0 = load i32, i32* undef, align 4
%shuffle.i250 = shufflevector <2 x i64> undef, <2 x i64> undef, <1 x i32> zeroinitializer		%shuffle.i250 = shufflevector <2 x i64> undef, <2 x i64> undef, <1 x i32> zeroinitializer
%1 = bitcast <1 x i64> %shuffle.i250 to <8 x i8>		%1 = bitcast <1 x i64> %shuffle.i250 to <8 x i8>
%vmovl.i249 = zext <8 x i8> %1 to <8 x i16>		%vmovl.i249 = zext <8 x i8> %1 to <8 x i16>
%shuffle.i246 = shufflevector <2 x i64> undef, <2 x i64> undef, <1 x i32> zeroinitializer		%shuffle.i246 = shufflevector <2 x i64> undef, <2 x i64> undef, <1 x i32> zeroinitializer
%shuffle.i240 = shufflevector <2 x i64> undef, <2 x i64> undef, <1 x i32> <i32 1>		%shuffle.i240 = shufflevector <2 x i64> undef, <2 x i64> undef, <1 x i32> <i32 1>
%2 = bitcast <1 x i64> %shuffle.i240 to <8 x i8>		%2 = bitcast <1 x i64> %shuffle.i240 to <8 x i8>
%3 = bitcast <16 x i8> undef to <2 x i64>		%3 = bitcast <16 x i8> undef to <2 x i64>

llvm/trunk/test/CodeGen/ARM/vst1.ll

	; RUN: llc -mtriple=arm-eabi -mattr=+neon %s -o - | FileCheck %s			; RUN: llc -mtriple=arm-eabi -mattr=+neon %s -o - | FileCheck %s

	define void @vst1i8(i8* %A, <8 x i8>* %B) nounwind {			define void @vst1i8(i8* %A, <8 x i8>* %B) nounwind {
	;CHECK-LABEL: vst1i8:			;CHECK-LABEL: vst1i8:
	;Check the alignment value. Max for this instruction is 64 bits:			;Check the alignment value. Max for this instruction is 64 bits:
	;CHECK: vst1.8 {d16}, [r0:64]			;CHECK: vst1.8 {d16}, [r0:64]
	%tmp1 = load <8 x i8>, <8 x i8>* %B			%tmp1 = load <8 x i8>, <8 x i8>* %B
	call void @llvm.arm.neon.vst1.v8i8(i8* %A, <8 x i8> %tmp1, i32 16)			call void @llvm.arm.neon.vst1.p0i8.v8i8(i8* %A, <8 x i8> %tmp1, i32 16)
	ret void			ret void
	}			}

	define void @vst1i16(i16* %A, <4 x i16>* %B) nounwind {			define void @vst1i16(i16* %A, <4 x i16>* %B) nounwind {
	;CHECK-LABEL: vst1i16:			;CHECK-LABEL: vst1i16:
	;CHECK: vst1.16			;CHECK: vst1.16
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <4 x i16>, <4 x i16>* %B			%tmp1 = load <4 x i16>, <4 x i16>* %B
	call void @llvm.arm.neon.vst1.v4i16(i8* %tmp0, <4 x i16> %tmp1, i32 1)			call void @llvm.arm.neon.vst1.p0i8.v4i16(i8* %tmp0, <4 x i16> %tmp1, i32 1)
	ret void			ret void
	}			}

	define void @vst1i32(i32* %A, <2 x i32>* %B) nounwind {			define void @vst1i32(i32* %A, <2 x i32>* %B) nounwind {
	;CHECK-LABEL: vst1i32:			;CHECK-LABEL: vst1i32:
	;CHECK: vst1.32			;CHECK: vst1.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <2 x i32>, <2 x i32>* %B			%tmp1 = load <2 x i32>, <2 x i32>* %B
	call void @llvm.arm.neon.vst1.v2i32(i8* %tmp0, <2 x i32> %tmp1, i32 1)			call void @llvm.arm.neon.vst1.p0i8.v2i32(i8* %tmp0, <2 x i32> %tmp1, i32 1)
	ret void			ret void
	}			}

	define void @vst1f(float* %A, <2 x float>* %B) nounwind {			define void @vst1f(float* %A, <2 x float>* %B) nounwind {
	;CHECK-LABEL: vst1f:			;CHECK-LABEL: vst1f:
	;CHECK: vst1.32			;CHECK: vst1.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <2 x float>, <2 x float>* %B			%tmp1 = load <2 x float>, <2 x float>* %B
	call void @llvm.arm.neon.vst1.v2f32(i8* %tmp0, <2 x float> %tmp1, i32 1)			call void @llvm.arm.neon.vst1.p0i8.v2f32(i8* %tmp0, <2 x float> %tmp1, i32 1)
	ret void			ret void
	}			}

	;Check for a post-increment updating store.			;Check for a post-increment updating store.
	define void @vst1f_update(float** %ptr, <2 x float>* %B) nounwind {			define void @vst1f_update(float** %ptr, <2 x float>* %B) nounwind {
	;CHECK-LABEL: vst1f_update:			;CHECK-LABEL: vst1f_update:
	;CHECK: vst1.32 {d16}, [r1]!			;CHECK: vst1.32 {d16}, [r1]!
	%A = load float*, float** %ptr			%A = load float*, float** %ptr
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <2 x float>, <2 x float>* %B			%tmp1 = load <2 x float>, <2 x float>* %B
	call void @llvm.arm.neon.vst1.v2f32(i8* %tmp0, <2 x float> %tmp1, i32 1)			call void @llvm.arm.neon.vst1.p0i8.v2f32(i8* %tmp0, <2 x float> %tmp1, i32 1)
	%tmp2 = getelementptr float, float* %A, i32 2			%tmp2 = getelementptr float, float* %A, i32 2
	store float* %tmp2, float** %ptr			store float* %tmp2, float** %ptr
	ret void			ret void
	}			}

	define void @vst1i64(i64* %A, <1 x i64>* %B) nounwind {			define void @vst1i64(i64* %A, <1 x i64>* %B) nounwind {
	;CHECK-LABEL: vst1i64:			;CHECK-LABEL: vst1i64:
	;CHECK: vst1.64			;CHECK: vst1.64
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = load <1 x i64>, <1 x i64>* %B			%tmp1 = load <1 x i64>, <1 x i64>* %B
	call void @llvm.arm.neon.vst1.v1i64(i8* %tmp0, <1 x i64> %tmp1, i32 1)			call void @llvm.arm.neon.vst1.p0i8.v1i64(i8* %tmp0, <1 x i64> %tmp1, i32 1)
	ret void			ret void
	}			}

	define void @vst1Qi8(i8* %A, <16 x i8>* %B) nounwind {			define void @vst1Qi8(i8* %A, <16 x i8>* %B) nounwind {
	;CHECK-LABEL: vst1Qi8:			;CHECK-LABEL: vst1Qi8:
	;Check the alignment value. Max for this instruction is 128 bits:			;Check the alignment value. Max for this instruction is 128 bits:
	;CHECK: vst1.8 {d16, d17}, [r0:64]			;CHECK: vst1.8 {d16, d17}, [r0:64]
	%tmp1 = load <16 x i8>, <16 x i8>* %B			%tmp1 = load <16 x i8>, <16 x i8>* %B
	call void @llvm.arm.neon.vst1.v16i8(i8* %A, <16 x i8> %tmp1, i32 8)			call void @llvm.arm.neon.vst1.p0i8.v16i8(i8* %A, <16 x i8> %tmp1, i32 8)
	ret void			ret void
	}			}

	define void @vst1Qi16(i16* %A, <8 x i16>* %B) nounwind {			define void @vst1Qi16(i16* %A, <8 x i16>* %B) nounwind {
	;CHECK-LABEL: vst1Qi16:			;CHECK-LABEL: vst1Qi16:
	;Check the alignment value. Max for this instruction is 128 bits:			;Check the alignment value. Max for this instruction is 128 bits:
	;CHECK: vst1.16 {d16, d17}, [r0:128]			;CHECK: vst1.16 {d16, d17}, [r0:128]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <8 x i16>, <8 x i16>* %B			%tmp1 = load <8 x i16>, <8 x i16>* %B
	call void @llvm.arm.neon.vst1.v8i16(i8* %tmp0, <8 x i16> %tmp1, i32 32)			call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %tmp0, <8 x i16> %tmp1, i32 32)
	ret void			ret void
	}			}

	;Check for a post-increment updating store with register increment.			;Check for a post-increment updating store with register increment.
	define void @vst1Qi16_update(i16** %ptr, <8 x i16>* %B, i32 %inc) nounwind {			define void @vst1Qi16_update(i16** %ptr, <8 x i16>* %B, i32 %inc) nounwind {
	;CHECK-LABEL: vst1Qi16_update:			;CHECK-LABEL: vst1Qi16_update:
	;CHECK: vst1.16 {d16, d17}, [r1:64], r2			;CHECK: vst1.16 {d16, d17}, [r1:64], r2
	%A = load i16*, i16** %ptr			%A = load i16*, i16** %ptr
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <8 x i16>, <8 x i16>* %B			%tmp1 = load <8 x i16>, <8 x i16>* %B
	call void @llvm.arm.neon.vst1.v8i16(i8* %tmp0, <8 x i16> %tmp1, i32 8)			call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* %tmp0, <8 x i16> %tmp1, i32 8)
	%tmp2 = getelementptr i16, i16* %A, i32 %inc			%tmp2 = getelementptr i16, i16* %A, i32 %inc
	store i16* %tmp2, i16** %ptr			store i16* %tmp2, i16** %ptr
	ret void			ret void
	}			}

	define void @vst1Qi32(i32* %A, <4 x i32>* %B) nounwind {			define void @vst1Qi32(i32* %A, <4 x i32>* %B) nounwind {
	;CHECK-LABEL: vst1Qi32:			;CHECK-LABEL: vst1Qi32:
	;CHECK: vst1.32			;CHECK: vst1.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <4 x i32>, <4 x i32>* %B			%tmp1 = load <4 x i32>, <4 x i32>* %B
	call void @llvm.arm.neon.vst1.v4i32(i8* %tmp0, <4 x i32> %tmp1, i32 1)			call void @llvm.arm.neon.vst1.p0i8.v4i32(i8* %tmp0, <4 x i32> %tmp1, i32 1)
	ret void			ret void
	}			}

	define void @vst1Qf(float* %A, <4 x float>* %B) nounwind {			define void @vst1Qf(float* %A, <4 x float>* %B) nounwind {
	;CHECK-LABEL: vst1Qf:			;CHECK-LABEL: vst1Qf:
	;CHECK: vst1.32			;CHECK: vst1.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <4 x float>, <4 x float>* %B			%tmp1 = load <4 x float>, <4 x float>* %B
	call void @llvm.arm.neon.vst1.v4f32(i8* %tmp0, <4 x float> %tmp1, i32 1)			call void @llvm.arm.neon.vst1.p0i8.v4f32(i8* %tmp0, <4 x float> %tmp1, i32 1)
	ret void			ret void
	}			}

	define void @vst1Qi64(i64* %A, <2 x i64>* %B) nounwind {			define void @vst1Qi64(i64* %A, <2 x i64>* %B) nounwind {
	;CHECK-LABEL: vst1Qi64:			;CHECK-LABEL: vst1Qi64:
	;CHECK: vst1.64			;CHECK: vst1.64
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = load <2 x i64>, <2 x i64>* %B			%tmp1 = load <2 x i64>, <2 x i64>* %B
	call void @llvm.arm.neon.vst1.v2i64(i8* %tmp0, <2 x i64> %tmp1, i32 1)			call void @llvm.arm.neon.vst1.p0i8.v2i64(i8* %tmp0, <2 x i64> %tmp1, i32 1)
	ret void			ret void
	}			}

	define void @vst1Qf64(double* %A, <2 x double>* %B) nounwind {			define void @vst1Qf64(double* %A, <2 x double>* %B) nounwind {
	;CHECK-LABEL: vst1Qf64:			;CHECK-LABEL: vst1Qf64:
	;CHECK: vst1.64			;CHECK: vst1.64
	%tmp0 = bitcast double* %A to i8*			%tmp0 = bitcast double* %A to i8*
	%tmp1 = load <2 x double>, <2 x double>* %B			%tmp1 = load <2 x double>, <2 x double>* %B
	call void @llvm.arm.neon.vst1.v2f64(i8* %tmp0, <2 x double> %tmp1, i32 1)			call void @llvm.arm.neon.vst1.p0i8.v2f64(i8* %tmp0, <2 x double> %tmp1, i32 1)
	ret void			ret void
	}			}

	declare void @llvm.arm.neon.vst1.v8i8(i8*, <8 x i8>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v8i8(i8*, <8 x i8>, i32) nounwind
	declare void @llvm.arm.neon.vst1.v4i16(i8*, <4 x i16>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v4i16(i8*, <4 x i16>, i32) nounwind
	declare void @llvm.arm.neon.vst1.v2i32(i8*, <2 x i32>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v2i32(i8*, <2 x i32>, i32) nounwind
	declare void @llvm.arm.neon.vst1.v2f32(i8*, <2 x float>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v2f32(i8*, <2 x float>, i32) nounwind
	declare void @llvm.arm.neon.vst1.v1i64(i8*, <1 x i64>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v1i64(i8*, <1 x i64>, i32) nounwind

	declare void @llvm.arm.neon.vst1.v16i8(i8*, <16 x i8>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v16i8(i8*, <16 x i8>, i32) nounwind
	declare void @llvm.arm.neon.vst1.v8i16(i8*, <8 x i16>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v8i16(i8*, <8 x i16>, i32) nounwind
	declare void @llvm.arm.neon.vst1.v4i32(i8*, <4 x i32>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v4i32(i8*, <4 x i32>, i32) nounwind
	declare void @llvm.arm.neon.vst1.v4f32(i8*, <4 x float>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v4f32(i8*, <4 x float>, i32) nounwind
	declare void @llvm.arm.neon.vst1.v2i64(i8*, <2 x i64>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v2i64(i8*, <2 x i64>, i32) nounwind
	declare void @llvm.arm.neon.vst1.v2f64(i8*, <2 x double>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v2f64(i8*, <2 x double>, i32) nounwind

llvm/trunk/test/CodeGen/ARM/vst2.ll

	; RUN: llc -mtriple=arm-eabi -mattr=+neon %s -o - | FileCheck %s			; RUN: llc -mtriple=arm-eabi -mattr=+neon %s -o - | FileCheck %s

	define void @vst2i8(i8* %A, <8 x i8>* %B) nounwind {			define void @vst2i8(i8* %A, <8 x i8>* %B) nounwind {
	;CHECK-LABEL: vst2i8:			;CHECK-LABEL: vst2i8:
	;Check the alignment value. Max for this instruction is 128 bits:			;Check the alignment value. Max for this instruction is 128 bits:
	;CHECK: vst2.8 {d16, d17}, [r0:64]			;CHECK: vst2.8 {d16, d17}, [r0:64]
	%tmp1 = load <8 x i8>, <8 x i8>* %B			%tmp1 = load <8 x i8>, <8 x i8>* %B
	call void @llvm.arm.neon.vst2.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 8)			call void @llvm.arm.neon.vst2.p0i8.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 8)
	ret void			ret void
	}			}

	;Check for a post-increment updating store with register increment.			;Check for a post-increment updating store with register increment.
	define void @vst2i8_update(i8** %ptr, <8 x i8>* %B, i32 %inc) nounwind {			define void @vst2i8_update(i8** %ptr, <8 x i8>* %B, i32 %inc) nounwind {
	;CHECK-LABEL: vst2i8_update:			;CHECK-LABEL: vst2i8_update:
	;CHECK: vst2.8 {d16, d17}, [r1], r2			;CHECK: vst2.8 {d16, d17}, [r1], r2
	%A = load i8*, i8** %ptr			%A = load i8*, i8** %ptr
	%tmp1 = load <8 x i8>, <8 x i8>* %B			%tmp1 = load <8 x i8>, <8 x i8>* %B
	call void @llvm.arm.neon.vst2.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 4)			call void @llvm.arm.neon.vst2.p0i8.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 4)
	%tmp2 = getelementptr i8, i8* %A, i32 %inc			%tmp2 = getelementptr i8, i8* %A, i32 %inc
	store i8* %tmp2, i8** %ptr			store i8* %tmp2, i8** %ptr
	ret void			ret void
	}			}

	define void @vst2i16(i16* %A, <4 x i16>* %B) nounwind {			define void @vst2i16(i16* %A, <4 x i16>* %B) nounwind {
	;CHECK-LABEL: vst2i16:			;CHECK-LABEL: vst2i16:
	;Check the alignment value. Max for this instruction is 128 bits:			;Check the alignment value. Max for this instruction is 128 bits:
	;CHECK: vst2.16 {d16, d17}, [r0:128]			;CHECK: vst2.16 {d16, d17}, [r0:128]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <4 x i16>, <4 x i16>* %B			%tmp1 = load <4 x i16>, <4 x i16>* %B
	call void @llvm.arm.neon.vst2.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 32)			call void @llvm.arm.neon.vst2.p0i8.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 32)
	ret void			ret void
	}			}

	define void @vst2i32(i32* %A, <2 x i32>* %B) nounwind {			define void @vst2i32(i32* %A, <2 x i32>* %B) nounwind {
	;CHECK-LABEL: vst2i32:			;CHECK-LABEL: vst2i32:
	;CHECK: vst2.32			;CHECK: vst2.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <2 x i32>, <2 x i32>* %B			%tmp1 = load <2 x i32>, <2 x i32>* %B
	call void @llvm.arm.neon.vst2.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1)			call void @llvm.arm.neon.vst2.p0i8.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1)
	ret void			ret void
	}			}

	define void @vst2f(float* %A, <2 x float>* %B) nounwind {			define void @vst2f(float* %A, <2 x float>* %B) nounwind {
	;CHECK-LABEL: vst2f:			;CHECK-LABEL: vst2f:
	;CHECK: vst2.32			;CHECK: vst2.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <2 x float>, <2 x float>* %B			%tmp1 = load <2 x float>, <2 x float>* %B
	call void @llvm.arm.neon.vst2.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, i32 1)			call void @llvm.arm.neon.vst2.p0i8.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, i32 1)
	ret void			ret void
	}			}

	define void @vst2i64(i64* %A, <1 x i64>* %B) nounwind {			define void @vst2i64(i64* %A, <1 x i64>* %B) nounwind {
	;CHECK-LABEL: vst2i64:			;CHECK-LABEL: vst2i64:
	;Check the alignment value. Max for this instruction is 128 bits:			;Check the alignment value. Max for this instruction is 128 bits:
	;CHECK: vst1.64 {d16, d17}, [r0:128]			;CHECK: vst1.64 {d16, d17}, [r0:128]
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = load <1 x i64>, <1 x i64>* %B			%tmp1 = load <1 x i64>, <1 x i64>* %B
	call void @llvm.arm.neon.vst2.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 32)			call void @llvm.arm.neon.vst2.p0i8.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 32)
	ret void			ret void
	}			}

	;Check for a post-increment updating store.			;Check for a post-increment updating store.
	define void @vst2i64_update(i64** %ptr, <1 x i64>* %B) nounwind {			define void @vst2i64_update(i64** %ptr, <1 x i64>* %B) nounwind {
	;CHECK-LABEL: vst2i64_update:			;CHECK-LABEL: vst2i64_update:
	;CHECK: vst1.64 {d16, d17}, [r1:64]!			;CHECK: vst1.64 {d16, d17}, [r1:64]!
	%A = load i64*, i64** %ptr			%A = load i64*, i64** %ptr
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = load <1 x i64>, <1 x i64>* %B			%tmp1 = load <1 x i64>, <1 x i64>* %B
	call void @llvm.arm.neon.vst2.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 8)			call void @llvm.arm.neon.vst2.p0i8.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 8)
	%tmp2 = getelementptr i64, i64* %A, i32 2			%tmp2 = getelementptr i64, i64* %A, i32 2
	store i64* %tmp2, i64** %ptr			store i64* %tmp2, i64** %ptr
	ret void			ret void
	}			}

	define void @vst2Qi8(i8* %A, <16 x i8>* %B) nounwind {			define void @vst2Qi8(i8* %A, <16 x i8>* %B) nounwind {
	;CHECK-LABEL: vst2Qi8:			;CHECK-LABEL: vst2Qi8:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vst2.8 {d16, d17, d18, d19}, [r0:64]			;CHECK: vst2.8 {d16, d17, d18, d19}, [r0:64]
	%tmp1 = load <16 x i8>, <16 x i8>* %B			%tmp1 = load <16 x i8>, <16 x i8>* %B
	call void @llvm.arm.neon.vst2.v16i8(i8* %A, <16 x i8> %tmp1, <16 x i8> %tmp1, i32 8)			call void @llvm.arm.neon.vst2.p0i8.v16i8(i8* %A, <16 x i8> %tmp1, <16 x i8> %tmp1, i32 8)
	ret void			ret void
	}			}

	define void @vst2Qi16(i16* %A, <8 x i16>* %B) nounwind {			define void @vst2Qi16(i16* %A, <8 x i16>* %B) nounwind {
	;CHECK-LABEL: vst2Qi16:			;CHECK-LABEL: vst2Qi16:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vst2.16 {d16, d17, d18, d19}, [r0:128]			;CHECK: vst2.16 {d16, d17, d18, d19}, [r0:128]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <8 x i16>, <8 x i16>* %B			%tmp1 = load <8 x i16>, <8 x i16>* %B
	call void @llvm.arm.neon.vst2.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 16)			call void @llvm.arm.neon.vst2.p0i8.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 16)
	ret void			ret void
	}			}

	define void @vst2Qi32(i32* %A, <4 x i32>* %B) nounwind {			define void @vst2Qi32(i32* %A, <4 x i32>* %B) nounwind {
	;CHECK-LABEL: vst2Qi32:			;CHECK-LABEL: vst2Qi32:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vst2.32 {d16, d17, d18, d19}, [r0:256]			;CHECK: vst2.32 {d16, d17, d18, d19}, [r0:256]
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <4 x i32>, <4 x i32>* %B			%tmp1 = load <4 x i32>, <4 x i32>* %B
	call void @llvm.arm.neon.vst2.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 64)			call void @llvm.arm.neon.vst2.p0i8.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 64)
	ret void			ret void
	}			}

	define void @vst2Qf(float* %A, <4 x float>* %B) nounwind {			define void @vst2Qf(float* %A, <4 x float>* %B) nounwind {
	;CHECK-LABEL: vst2Qf:			;CHECK-LABEL: vst2Qf:
	;CHECK: vst2.32			;CHECK: vst2.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <4 x float>, <4 x float>* %B			%tmp1 = load <4 x float>, <4 x float>* %B
	call void @llvm.arm.neon.vst2.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, i32 1)			call void @llvm.arm.neon.vst2.p0i8.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, i32 1)
	ret void			ret void
	}			}

	define i8* @vst2update(i8* %out, <4 x i16>* %B) nounwind {			define i8* @vst2update(i8* %out, <4 x i16>* %B) nounwind {
	;CHECK-LABEL: vst2update:			;CHECK-LABEL: vst2update:
	;CHECK: vst2.16 {d16, d17}, [r0]!			;CHECK: vst2.16 {d16, d17}, [r0]!
	%tmp1 = load <4 x i16>, <4 x i16>* %B			%tmp1 = load <4 x i16>, <4 x i16>* %B
	tail call void @llvm.arm.neon.vst2.v4i16(i8* %out, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 2)			tail call void @llvm.arm.neon.vst2.p0i8.v4i16(i8* %out, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 2)
	%t5 = getelementptr inbounds i8, i8* %out, i32 16			%t5 = getelementptr inbounds i8, i8* %out, i32 16
	ret i8* %t5			ret i8* %t5
	}			}

	define i8* @vst2update2(i8 * %out, <4 x float> * %this) nounwind optsize ssp align 2 {			define i8* @vst2update2(i8 * %out, <4 x float> * %this) nounwind optsize ssp align 2 {
	;CHECK-LABEL: vst2update2:			;CHECK-LABEL: vst2update2:
	;CHECK: vst2.32 {d16, d17, d18, d19}, [r0]!			;CHECK: vst2.32 {d16, d17, d18, d19}, [r0]!
	%tmp1 = load <4 x float>, <4 x float>* %this			%tmp1 = load <4 x float>, <4 x float>* %this
	call void @llvm.arm.neon.vst2.v4f32(i8* %out, <4 x float> %tmp1, <4 x float> %tmp1, i32 4) nounwind			call void @llvm.arm.neon.vst2.p0i8.v4f32(i8* %out, <4 x float> %tmp1, <4 x float> %tmp1, i32 4) nounwind
	%tmp2 = getelementptr inbounds i8, i8* %out, i32 32			%tmp2 = getelementptr inbounds i8, i8* %out, i32 32
	ret i8* %tmp2			ret i8* %tmp2
	}			}

	declare void @llvm.arm.neon.vst2.v8i8(i8*, <8 x i8>, <8 x i8>, i32) nounwind			declare void @llvm.arm.neon.vst2.p0i8.v8i8(i8*, <8 x i8>, <8 x i8>, i32) nounwind
	declare void @llvm.arm.neon.vst2.v4i16(i8*, <4 x i16>, <4 x i16>, i32) nounwind			declare void @llvm.arm.neon.vst2.p0i8.v4i16(i8*, <4 x i16>, <4 x i16>, i32) nounwind
	declare void @llvm.arm.neon.vst2.v2i32(i8*, <2 x i32>, <2 x i32>, i32) nounwind			declare void @llvm.arm.neon.vst2.p0i8.v2i32(i8*, <2 x i32>, <2 x i32>, i32) nounwind
	declare void @llvm.arm.neon.vst2.v2f32(i8*, <2 x float>, <2 x float>, i32) nounwind			declare void @llvm.arm.neon.vst2.p0i8.v2f32(i8*, <2 x float>, <2 x float>, i32) nounwind
	declare void @llvm.arm.neon.vst2.v1i64(i8*, <1 x i64>, <1 x i64>, i32) nounwind			declare void @llvm.arm.neon.vst2.p0i8.v1i64(i8*, <1 x i64>, <1 x i64>, i32) nounwind

	declare void @llvm.arm.neon.vst2.v16i8(i8*, <16 x i8>, <16 x i8>, i32) nounwind			declare void @llvm.arm.neon.vst2.p0i8.v16i8(i8*, <16 x i8>, <16 x i8>, i32) nounwind
	declare void @llvm.arm.neon.vst2.v8i16(i8*, <8 x i16>, <8 x i16>, i32) nounwind			declare void @llvm.arm.neon.vst2.p0i8.v8i16(i8*, <8 x i16>, <8 x i16>, i32) nounwind
	declare void @llvm.arm.neon.vst2.v4i32(i8*, <4 x i32>, <4 x i32>, i32) nounwind			declare void @llvm.arm.neon.vst2.p0i8.v4i32(i8*, <4 x i32>, <4 x i32>, i32) nounwind
	declare void @llvm.arm.neon.vst2.v4f32(i8*, <4 x float>, <4 x float>, i32) nounwind			declare void @llvm.arm.neon.vst2.p0i8.v4f32(i8*, <4 x float>, <4 x float>, i32) nounwind

llvm/trunk/test/CodeGen/ARM/vst3.ll

	; RUN: llc -mtriple=arm-eabi -mattr=+neon -fast-isel=0 -O0 %s -o - | FileCheck %s			; RUN: llc -mtriple=arm-eabi -mattr=+neon -fast-isel=0 -O0 %s -o - | FileCheck %s

	define void @vst3i8(i8* %A, <8 x i8>* %B) nounwind {			define void @vst3i8(i8* %A, <8 x i8>* %B) nounwind {
	;CHECK-LABEL: vst3i8:			;CHECK-LABEL: vst3i8:
	;Check the alignment value. Max for this instruction is 64 bits:			;Check the alignment value. Max for this instruction is 64 bits:
	;This test runs at -O0 so do not check for specific register numbers.			;This test runs at -O0 so do not check for specific register numbers.
	;CHECK: vst3.8 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}:64]			;CHECK: vst3.8 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}:64]
	%tmp1 = load <8 x i8>, <8 x i8>* %B			%tmp1 = load <8 x i8>, <8 x i8>* %B
	call void @llvm.arm.neon.vst3.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 32)			call void @llvm.arm.neon.vst3.p0i8.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 32)
	ret void			ret void
	}			}

	define void @vst3i16(i16* %A, <4 x i16>* %B) nounwind {			define void @vst3i16(i16* %A, <4 x i16>* %B) nounwind {
	;CHECK-LABEL: vst3i16:			;CHECK-LABEL: vst3i16:
	;CHECK: vst3.16			;CHECK: vst3.16
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <4 x i16>, <4 x i16>* %B			%tmp1 = load <4 x i16>, <4 x i16>* %B
	call void @llvm.arm.neon.vst3.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1)			call void @llvm.arm.neon.vst3.p0i8.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1)
	ret void			ret void
	}			}

	define void @vst3i32(i32* %A, <2 x i32>* %B) nounwind {			define void @vst3i32(i32* %A, <2 x i32>* %B) nounwind {
	;CHECK-LABEL: vst3i32:			;CHECK-LABEL: vst3i32:
	;CHECK: vst3.32			;CHECK: vst3.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <2 x i32>, <2 x i32>* %B			%tmp1 = load <2 x i32>, <2 x i32>* %B
	call void @llvm.arm.neon.vst3.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1)			call void @llvm.arm.neon.vst3.p0i8.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1)
	ret void			ret void
	}			}

	;Check for a post-increment updating store.			;Check for a post-increment updating store.
	define void @vst3i32_update(i32** %ptr, <2 x i32>* %B) nounwind {			define void @vst3i32_update(i32** %ptr, <2 x i32>* %B) nounwind {
	;CHECK-LABEL: vst3i32_update:			;CHECK-LABEL: vst3i32_update:
	;CHECK: vst3.32 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}]!			;CHECK: vst3.32 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}]!
	%A = load i32*, i32** %ptr			%A = load i32*, i32** %ptr
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <2 x i32>, <2 x i32>* %B			%tmp1 = load <2 x i32>, <2 x i32>* %B
	call void @llvm.arm.neon.vst3.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1)			call void @llvm.arm.neon.vst3.p0i8.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1)
	%tmp2 = getelementptr i32, i32* %A, i32 6			%tmp2 = getelementptr i32, i32* %A, i32 6
	store i32* %tmp2, i32** %ptr			store i32* %tmp2, i32** %ptr
	ret void			ret void
	}			}

	define void @vst3f(float* %A, <2 x float>* %B) nounwind {			define void @vst3f(float* %A, <2 x float>* %B) nounwind {
	;CHECK-LABEL: vst3f:			;CHECK-LABEL: vst3f:
	;CHECK: vst3.32			;CHECK: vst3.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <2 x float>, <2 x float>* %B			%tmp1 = load <2 x float>, <2 x float>* %B
	call void @llvm.arm.neon.vst3.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, i32 1)			call void @llvm.arm.neon.vst3.p0i8.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, i32 1)
	ret void			ret void
	}			}

	define void @vst3i64(i64* %A, <1 x i64>* %B) nounwind {			define void @vst3i64(i64* %A, <1 x i64>* %B) nounwind {
	;CHECK-LABEL: vst3i64:			;CHECK-LABEL: vst3i64:
	;Check the alignment value. Max for this instruction is 64 bits:			;Check the alignment value. Max for this instruction is 64 bits:
	;This test runs at -O0 so do not check for specific register numbers.			;This test runs at -O0 so do not check for specific register numbers.
	;CHECK: vst1.64 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}:64]			;CHECK: vst1.64 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}:64]
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = load <1 x i64>, <1 x i64>* %B			%tmp1 = load <1 x i64>, <1 x i64>* %B
	call void @llvm.arm.neon.vst3.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 16)			call void @llvm.arm.neon.vst3.p0i8.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 16)
	ret void			ret void
	}			}

	define void @vst3i64_update(i64** %ptr, <1 x i64>* %B) nounwind {			define void @vst3i64_update(i64** %ptr, <1 x i64>* %B) nounwind {
	;CHECK-LABEL: vst3i64_update			;CHECK-LABEL: vst3i64_update
	;CHECK: vst1.64 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}]!			;CHECK: vst1.64 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}]!
	%A = load i64*, i64** %ptr			%A = load i64*, i64** %ptr
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = load <1 x i64>, <1 x i64>* %B			%tmp1 = load <1 x i64>, <1 x i64>* %B
	call void @llvm.arm.neon.vst3.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 1)			call void @llvm.arm.neon.vst3.p0i8.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 1)
	%tmp2 = getelementptr i64, i64* %A, i32 3			%tmp2 = getelementptr i64, i64* %A, i32 3
	store i64* %tmp2, i64** %ptr			store i64* %tmp2, i64** %ptr
	ret void			ret void
	}			}

	define void @vst3Qi8(i8* %A, <16 x i8>* %B) nounwind {			define void @vst3Qi8(i8* %A, <16 x i8>* %B) nounwind {
	;CHECK-LABEL: vst3Qi8:			;CHECK-LABEL: vst3Qi8:
	;Check the alignment value. Max for this instruction is 64 bits:			;Check the alignment value. Max for this instruction is 64 bits:
	;This test runs at -O0 so do not check for specific register numbers.			;This test runs at -O0 so do not check for specific register numbers.
	;CHECK: vst3.8 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}:64]!			;CHECK: vst3.8 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}:64]!
	;CHECK: vst3.8 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}:64]			;CHECK: vst3.8 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}:64]
	%tmp1 = load <16 x i8>, <16 x i8>* %B			%tmp1 = load <16 x i8>, <16 x i8>* %B
	call void @llvm.arm.neon.vst3.v16i8(i8* %A, <16 x i8> %tmp1, <16 x i8> %tmp1, <16 x i8> %tmp1, i32 32)			call void @llvm.arm.neon.vst3.p0i8.v16i8(i8* %A, <16 x i8> %tmp1, <16 x i8> %tmp1, <16 x i8> %tmp1, i32 32)
	ret void			ret void
	}			}

	define void @vst3Qi16(i16* %A, <8 x i16>* %B) nounwind {			define void @vst3Qi16(i16* %A, <8 x i16>* %B) nounwind {
	;CHECK-LABEL: vst3Qi16:			;CHECK-LABEL: vst3Qi16:
	;CHECK: vst3.16			;CHECK: vst3.16
	;CHECK: vst3.16			;CHECK: vst3.16
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <8 x i16>, <8 x i16>* %B			%tmp1 = load <8 x i16>, <8 x i16>* %B
	call void @llvm.arm.neon.vst3.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 1)			call void @llvm.arm.neon.vst3.p0i8.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 1)
	ret void			ret void
	}			}

	;Check for a post-increment updating store.			;Check for a post-increment updating store.
	define void @vst3Qi16_update(i16** %ptr, <8 x i16>* %B) nounwind {			define void @vst3Qi16_update(i16** %ptr, <8 x i16>* %B) nounwind {
	;CHECK-LABEL: vst3Qi16_update:			;CHECK-LABEL: vst3Qi16_update:
	;CHECK: vst3.16 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}]!			;CHECK: vst3.16 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}]!
	;CHECK: vst3.16 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}]!			;CHECK: vst3.16 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}]!
	%A = load i16*, i16** %ptr			%A = load i16*, i16** %ptr
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <8 x i16>, <8 x i16>* %B			%tmp1 = load <8 x i16>, <8 x i16>* %B
	call void @llvm.arm.neon.vst3.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 1)			call void @llvm.arm.neon.vst3.p0i8.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 1)
	%tmp2 = getelementptr i16, i16* %A, i32 24			%tmp2 = getelementptr i16, i16* %A, i32 24
	store i16* %tmp2, i16** %ptr			store i16* %tmp2, i16** %ptr
	ret void			ret void
	}			}

	define void @vst3Qi32(i32* %A, <4 x i32>* %B) nounwind {			define void @vst3Qi32(i32* %A, <4 x i32>* %B) nounwind {
	;CHECK-LABEL: vst3Qi32:			;CHECK-LABEL: vst3Qi32:
	;CHECK: vst3.32			;CHECK: vst3.32
	;CHECK: vst3.32			;CHECK: vst3.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <4 x i32>, <4 x i32>* %B			%tmp1 = load <4 x i32>, <4 x i32>* %B
	call void @llvm.arm.neon.vst3.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 1)			call void @llvm.arm.neon.vst3.p0i8.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 1)
	ret void			ret void
	}			}

	define void @vst3Qf(float* %A, <4 x float>* %B) nounwind {			define void @vst3Qf(float* %A, <4 x float>* %B) nounwind {
	;CHECK-LABEL: vst3Qf:			;CHECK-LABEL: vst3Qf:
	;CHECK: vst3.32			;CHECK: vst3.32
	;CHECK: vst3.32			;CHECK: vst3.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <4 x float>, <4 x float>* %B			%tmp1 = load <4 x float>, <4 x float>* %B
	call void @llvm.arm.neon.vst3.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, i32 1)			call void @llvm.arm.neon.vst3.p0i8.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, i32 1)
	ret void			ret void
	}			}

	declare void @llvm.arm.neon.vst3.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, i32) nounwind			declare void @llvm.arm.neon.vst3.p0i8.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, i32) nounwind
	declare void @llvm.arm.neon.vst3.v4i16(i8*, <4 x i16>, <4 x i16>, <4 x i16>, i32) nounwind			declare void @llvm.arm.neon.vst3.p0i8.v4i16(i8*, <4 x i16>, <4 x i16>, <4 x i16>, i32) nounwind
	declare void @llvm.arm.neon.vst3.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, i32) nounwind			declare void @llvm.arm.neon.vst3.p0i8.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, i32) nounwind
	declare void @llvm.arm.neon.vst3.v2f32(i8*, <2 x float>, <2 x float>, <2 x float>, i32) nounwind			declare void @llvm.arm.neon.vst3.p0i8.v2f32(i8*, <2 x float>, <2 x float>, <2 x float>, i32) nounwind
	declare void @llvm.arm.neon.vst3.v1i64(i8*, <1 x i64>, <1 x i64>, <1 x i64>, i32) nounwind			declare void @llvm.arm.neon.vst3.p0i8.v1i64(i8*, <1 x i64>, <1 x i64>, <1 x i64>, i32) nounwind

	declare void @llvm.arm.neon.vst3.v16i8(i8*, <16 x i8>, <16 x i8>, <16 x i8>, i32) nounwind			declare void @llvm.arm.neon.vst3.p0i8.v16i8(i8*, <16 x i8>, <16 x i8>, <16 x i8>, i32) nounwind
	declare void @llvm.arm.neon.vst3.v8i16(i8*, <8 x i16>, <8 x i16>, <8 x i16>, i32) nounwind			declare void @llvm.arm.neon.vst3.p0i8.v8i16(i8*, <8 x i16>, <8 x i16>, <8 x i16>, i32) nounwind
	declare void @llvm.arm.neon.vst3.v4i32(i8*, <4 x i32>, <4 x i32>, <4 x i32>, i32) nounwind			declare void @llvm.arm.neon.vst3.p0i8.v4i32(i8*, <4 x i32>, <4 x i32>, <4 x i32>, i32) nounwind
	declare void @llvm.arm.neon.vst3.v4f32(i8*, <4 x float>, <4 x float>, <4 x float>, i32) nounwind			declare void @llvm.arm.neon.vst3.p0i8.v4f32(i8*, <4 x float>, <4 x float>, <4 x float>, i32) nounwind

llvm/trunk/test/CodeGen/ARM/vst4.ll

	; RUN: llc -mtriple=arm-eabi -mattr=+neon %s -o - | FileCheck %s			; RUN: llc -mtriple=arm-eabi -mattr=+neon %s -o - | FileCheck %s

	define void @vst4i8(i8* %A, <8 x i8>* %B) nounwind {			define void @vst4i8(i8* %A, <8 x i8>* %B) nounwind {
	;CHECK-LABEL: vst4i8:			;CHECK-LABEL: vst4i8:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vst4.8 {d16, d17, d18, d19}, [r0:64]			;CHECK: vst4.8 {d16, d17, d18, d19}, [r0:64]
	%tmp1 = load <8 x i8>, <8 x i8>* %B			%tmp1 = load <8 x i8>, <8 x i8>* %B
	call void @llvm.arm.neon.vst4.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 8)			call void @llvm.arm.neon.vst4.p0i8.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 8)
	ret void			ret void
	}			}

	;Check for a post-increment updating store with register increment.			;Check for a post-increment updating store with register increment.
	define void @vst4i8_update(i8** %ptr, <8 x i8>* %B, i32 %inc) nounwind {			define void @vst4i8_update(i8** %ptr, <8 x i8>* %B, i32 %inc) nounwind {
	;CHECK-LABEL: vst4i8_update:			;CHECK-LABEL: vst4i8_update:
	;CHECK: vst4.8 {d16, d17, d18, d19}, [r1:128], r2			;CHECK: vst4.8 {d16, d17, d18, d19}, [r1:128], r2
	%A = load i8*, i8** %ptr			%A = load i8*, i8** %ptr
	%tmp1 = load <8 x i8>, <8 x i8>* %B			%tmp1 = load <8 x i8>, <8 x i8>* %B
	call void @llvm.arm.neon.vst4.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 16)			call void @llvm.arm.neon.vst4.p0i8.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 16)
	%tmp2 = getelementptr i8, i8* %A, i32 %inc			%tmp2 = getelementptr i8, i8* %A, i32 %inc
	store i8* %tmp2, i8** %ptr			store i8* %tmp2, i8** %ptr
	ret void			ret void
	}			}

	define void @vst4i16(i16* %A, <4 x i16>* %B) nounwind {			define void @vst4i16(i16* %A, <4 x i16>* %B) nounwind {
	;CHECK-LABEL: vst4i16:			;CHECK-LABEL: vst4i16:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vst4.16 {d16, d17, d18, d19}, [r0:128]			;CHECK: vst4.16 {d16, d17, d18, d19}, [r0:128]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <4 x i16>, <4 x i16>* %B			%tmp1 = load <4 x i16>, <4 x i16>* %B
	call void @llvm.arm.neon.vst4.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 16)			call void @llvm.arm.neon.vst4.p0i8.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 16)
	ret void			ret void
	}			}

	define void @vst4i32(i32* %A, <2 x i32>* %B) nounwind {			define void @vst4i32(i32* %A, <2 x i32>* %B) nounwind {
	;CHECK-LABEL: vst4i32:			;CHECK-LABEL: vst4i32:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vst4.32 {d16, d17, d18, d19}, [r0:256]			;CHECK: vst4.32 {d16, d17, d18, d19}, [r0:256]
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <2 x i32>, <2 x i32>* %B			%tmp1 = load <2 x i32>, <2 x i32>* %B
	call void @llvm.arm.neon.vst4.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 32)			call void @llvm.arm.neon.vst4.p0i8.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 32)
	ret void			ret void
	}			}

	define void @vst4f(float* %A, <2 x float>* %B) nounwind {			define void @vst4f(float* %A, <2 x float>* %B) nounwind {
	;CHECK-LABEL: vst4f:			;CHECK-LABEL: vst4f:
	;CHECK: vst4.32			;CHECK: vst4.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <2 x float>, <2 x float>* %B			%tmp1 = load <2 x float>, <2 x float>* %B
	call void @llvm.arm.neon.vst4.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, i32 1)			call void @llvm.arm.neon.vst4.p0i8.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, i32 1)
	ret void			ret void
	}			}

	define void @vst4i64(i64* %A, <1 x i64>* %B) nounwind {			define void @vst4i64(i64* %A, <1 x i64>* %B) nounwind {
	;CHECK-LABEL: vst4i64:			;CHECK-LABEL: vst4i64:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vst1.64 {d16, d17, d18, d19}, [r0:256]			;CHECK: vst1.64 {d16, d17, d18, d19}, [r0:256]
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = load <1 x i64>, <1 x i64>* %B			%tmp1 = load <1 x i64>, <1 x i64>* %B
	call void @llvm.arm.neon.vst4.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 64)			call void @llvm.arm.neon.vst4.p0i8.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 64)
	ret void			ret void
	}			}

	define void @vst4i64_update(i64** %ptr, <1 x i64>* %B) nounwind {			define void @vst4i64_update(i64** %ptr, <1 x i64>* %B) nounwind {
	;CHECK-LABEL: vst4i64_update:			;CHECK-LABEL: vst4i64_update:
	;CHECK: vst1.64 {d16, d17, d18, d19}, [r1]!			;CHECK: vst1.64 {d16, d17, d18, d19}, [r1]!
	%A = load i64*, i64** %ptr			%A = load i64*, i64** %ptr
	%tmp0 = bitcast i64* %A to i8*			%tmp0 = bitcast i64* %A to i8*
	%tmp1 = load <1 x i64>, <1 x i64>* %B			%tmp1 = load <1 x i64>, <1 x i64>* %B
	call void @llvm.arm.neon.vst4.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 1)			call void @llvm.arm.neon.vst4.p0i8.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 1)
	%tmp2 = getelementptr i64, i64* %A, i32 4			%tmp2 = getelementptr i64, i64* %A, i32 4
	store i64* %tmp2, i64** %ptr			store i64* %tmp2, i64** %ptr
	ret void			ret void
	}			}

	define void @vst4Qi8(i8* %A, <16 x i8>* %B) nounwind {			define void @vst4Qi8(i8* %A, <16 x i8>* %B) nounwind {
	;CHECK-LABEL: vst4Qi8:			;CHECK-LABEL: vst4Qi8:
	;Check the alignment value. Max for this instruction is 256 bits:			;Check the alignment value. Max for this instruction is 256 bits:
	;CHECK: vst4.8 {d16, d18, d20, d22}, [r0:256]!			;CHECK: vst4.8 {d16, d18, d20, d22}, [r0:256]!
	;CHECK: vst4.8 {d17, d19, d21, d23}, [r0:256]			;CHECK: vst4.8 {d17, d19, d21, d23}, [r0:256]
	%tmp1 = load <16 x i8>, <16 x i8>* %B			%tmp1 = load <16 x i8>, <16 x i8>* %B
	call void @llvm.arm.neon.vst4.v16i8(i8* %A, <16 x i8> %tmp1, <16 x i8> %tmp1, <16 x i8> %tmp1, <16 x i8> %tmp1, i32 64)			call void @llvm.arm.neon.vst4.p0i8.v16i8(i8* %A, <16 x i8> %tmp1, <16 x i8> %tmp1, <16 x i8> %tmp1, <16 x i8> %tmp1, i32 64)
	ret void			ret void
	}			}

	define void @vst4Qi16(i16* %A, <8 x i16>* %B) nounwind {			define void @vst4Qi16(i16* %A, <8 x i16>* %B) nounwind {
	;CHECK-LABEL: vst4Qi16:			;CHECK-LABEL: vst4Qi16:
	;Check for no alignment specifier.			;Check for no alignment specifier.
	;CHECK: vst4.16 {d16, d18, d20, d22}, [r0]!			;CHECK: vst4.16 {d16, d18, d20, d22}, [r0]!
	;CHECK: vst4.16 {d17, d19, d21, d23}, [r0]			;CHECK: vst4.16 {d17, d19, d21, d23}, [r0]
	%tmp0 = bitcast i16* %A to i8*			%tmp0 = bitcast i16* %A to i8*
	%tmp1 = load <8 x i16>, <8 x i16>* %B			%tmp1 = load <8 x i16>, <8 x i16>* %B
	call void @llvm.arm.neon.vst4.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 1)			call void @llvm.arm.neon.vst4.p0i8.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 1)
	ret void			ret void
	}			}

	define void @vst4Qi32(i32* %A, <4 x i32>* %B) nounwind {			define void @vst4Qi32(i32* %A, <4 x i32>* %B) nounwind {
	;CHECK-LABEL: vst4Qi32:			;CHECK-LABEL: vst4Qi32:
	;CHECK: vst4.32			;CHECK: vst4.32
	;CHECK: vst4.32			;CHECK: vst4.32
	%tmp0 = bitcast i32* %A to i8*			%tmp0 = bitcast i32* %A to i8*
	%tmp1 = load <4 x i32>, <4 x i32>* %B			%tmp1 = load <4 x i32>, <4 x i32>* %B
	call void @llvm.arm.neon.vst4.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 1)			call void @llvm.arm.neon.vst4.p0i8.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 1)
	ret void			ret void
	}			}

	define void @vst4Qf(float* %A, <4 x float>* %B) nounwind {			define void @vst4Qf(float* %A, <4 x float>* %B) nounwind {
	;CHECK-LABEL: vst4Qf:			;CHECK-LABEL: vst4Qf:
	;CHECK: vst4.32			;CHECK: vst4.32
	;CHECK: vst4.32			;CHECK: vst4.32
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <4 x float>, <4 x float>* %B			%tmp1 = load <4 x float>, <4 x float>* %B
	call void @llvm.arm.neon.vst4.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, i32 1)			call void @llvm.arm.neon.vst4.p0i8.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, i32 1)
	ret void			ret void
	}			}

	;Check for a post-increment updating store.			;Check for a post-increment updating store.
	define void @vst4Qf_update(float** %ptr, <4 x float>* %B) nounwind {			define void @vst4Qf_update(float** %ptr, <4 x float>* %B) nounwind {
	;CHECK-LABEL: vst4Qf_update:			;CHECK-LABEL: vst4Qf_update:
	;CHECK: vst4.32 {d16, d18, d20, d22}, [r1]!			;CHECK: vst4.32 {d16, d18, d20, d22}, [r1]!
	;CHECK: vst4.32 {d17, d19, d21, d23}, [r1]!			;CHECK: vst4.32 {d17, d19, d21, d23}, [r1]!
	%A = load float*, float** %ptr			%A = load float*, float** %ptr
	%tmp0 = bitcast float* %A to i8*			%tmp0 = bitcast float* %A to i8*
	%tmp1 = load <4 x float>, <4 x float>* %B			%tmp1 = load <4 x float>, <4 x float>* %B
	call void @llvm.arm.neon.vst4.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, i32 1)			call void @llvm.arm.neon.vst4.p0i8.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, i32 1)
	%tmp2 = getelementptr float, float* %A, i32 16			%tmp2 = getelementptr float, float* %A, i32 16
	store float* %tmp2, float** %ptr			store float* %tmp2, float** %ptr
	ret void			ret void
	}			}

	declare void @llvm.arm.neon.vst4.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, i32) nounwind			declare void @llvm.arm.neon.vst4.p0i8.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, i32) nounwind
	declare void @llvm.arm.neon.vst4.v4i16(i8*, <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16>, i32) nounwind			declare void @llvm.arm.neon.vst4.p0i8.v4i16(i8*, <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16>, i32) nounwind
	declare void @llvm.arm.neon.vst4.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, i32) nounwind			declare void @llvm.arm.neon.vst4.p0i8.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, i32) nounwind
	declare void @llvm.arm.neon.vst4.v2f32(i8*, <2 x float>, <2 x float>, <2 x float>, <2 x float>, i32) nounwind			declare void @llvm.arm.neon.vst4.p0i8.v2f32(i8*, <2 x float>, <2 x float>, <2 x float>, <2 x float>, i32) nounwind
	declare void @llvm.arm.neon.vst4.v1i64(i8*, <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64>, i32) nounwind			declare void @llvm.arm.neon.vst4.p0i8.v1i64(i8*, <1 x i64>, <1 x i64>, <1 x i64>, <1 x i64>, i32) nounwind

	declare void @llvm.arm.neon.vst4.v16i8(i8*, <16 x i8>, <16 x i8>, <16 x i8>, <16 x i8>, i32) nounwind			declare void @llvm.arm.neon.vst4.p0i8.v16i8(i8*, <16 x i8>, <16 x i8>, <16 x i8>, <16 x i8>, i32) nounwind
	declare void @llvm.arm.neon.vst4.v8i16(i8*, <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16>, i32) nounwind			declare void @llvm.arm.neon.vst4.p0i8.v8i16(i8*, <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16>, i32) nounwind
	declare void @llvm.arm.neon.vst4.v4i32(i8*, <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32>, i32) nounwind			declare void @llvm.arm.neon.vst4.p0i8.v4i32(i8*, <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32>, i32) nounwind
	declare void @llvm.arm.neon.vst4.v4f32(i8*, <4 x float>, <4 x float>, <4 x float>, <4 x float>, i32) nounwind			declare void @llvm.arm.neon.vst4.p0i8.v4f32(i8*, <4 x float>, <4 x float>, <4 x float>, <4 x float>, i32) nounwind

llvm/trunk/test/CodeGen/ARM/vstlane.ll

[104 lines not shown]	; // CHE-CK: vst1.32 {d17[1]}, [r0]
ret void		ret void
}		}

define void @vst2lanei8(i8* %A, <8 x i8>* %B) nounwind {		define void @vst2lanei8(i8* %A, <8 x i8>* %B) nounwind {
;CHECK-LABEL: vst2lanei8:		;CHECK-LABEL: vst2lanei8:
;Check the alignment value. Max for this instruction is 16 bits:		;Check the alignment value. Max for this instruction is 16 bits:
;CHECK: vst2.8 {d16[1], d17[1]}, [r0:16]		;CHECK: vst2.8 {d16[1], d17[1]}, [r0:16]
%tmp1 = load <8 x i8>, <8 x i8>* %B		%tmp1 = load <8 x i8>, <8 x i8>* %B
call void @llvm.arm.neon.vst2lane.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 4)		call void @llvm.arm.neon.vst2lane.p0i8.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 4)
ret void		ret void
}		}

define void @vst2lanei16(i16* %A, <4 x i16>* %B) nounwind {		define void @vst2lanei16(i16* %A, <4 x i16>* %B) nounwind {
;CHECK-LABEL: vst2lanei16:		;CHECK-LABEL: vst2lanei16:
;Check the alignment value. Max for this instruction is 32 bits:		;Check the alignment value. Max for this instruction is 32 bits:
;CHECK: vst2.16 {d16[1], d17[1]}, [r0:32]		;CHECK: vst2.16 {d16[1], d17[1]}, [r0:32]
%tmp0 = bitcast i16* %A to i8*		%tmp0 = bitcast i16* %A to i8*
%tmp1 = load <4 x i16>, <4 x i16>* %B		%tmp1 = load <4 x i16>, <4 x i16>* %B
call void @llvm.arm.neon.vst2lane.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1, i32 8)		call void @llvm.arm.neon.vst2lane.p0i8.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1, i32 8)
ret void		ret void
}		}

;Check for a post-increment updating store with register increment.		;Check for a post-increment updating store with register increment.
define void @vst2lanei16_update(i16** %ptr, <4 x i16>* %B, i32 %inc) nounwind {		define void @vst2lanei16_update(i16** %ptr, <4 x i16>* %B, i32 %inc) nounwind {
;CHECK-LABEL: vst2lanei16_update:		;CHECK-LABEL: vst2lanei16_update:
;CHECK: vst2.16 {d16[1], d17[1]}, [r1], r2		;CHECK: vst2.16 {d16[1], d17[1]}, [r1], r2
%A = load i16*, i16** %ptr		%A = load i16*, i16** %ptr
%tmp0 = bitcast i16* %A to i8*		%tmp0 = bitcast i16* %A to i8*
%tmp1 = load <4 x i16>, <4 x i16>* %B		%tmp1 = load <4 x i16>, <4 x i16>* %B
call void @llvm.arm.neon.vst2lane.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1, i32 2)		call void @llvm.arm.neon.vst2lane.p0i8.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1, i32 2)
%tmp2 = getelementptr i16, i16* %A, i32 %inc		%tmp2 = getelementptr i16, i16* %A, i32 %inc
store i16* %tmp2, i16** %ptr		store i16* %tmp2, i16** %ptr
ret void		ret void
}		}

define void @vst2lanei32(i32* %A, <2 x i32>* %B) nounwind {		define void @vst2lanei32(i32* %A, <2 x i32>* %B) nounwind {
;CHECK-LABEL: vst2lanei32:		;CHECK-LABEL: vst2lanei32:
;CHECK: vst2.32		;CHECK: vst2.32
%tmp0 = bitcast i32* %A to i8*		%tmp0 = bitcast i32* %A to i8*
%tmp1 = load <2 x i32>, <2 x i32>* %B		%tmp1 = load <2 x i32>, <2 x i32>* %B
call void @llvm.arm.neon.vst2lane.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1, i32 1)		call void @llvm.arm.neon.vst2lane.p0i8.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1, i32 1)
ret void		ret void
}		}

define void @vst2lanef(float* %A, <2 x float>* %B) nounwind {		define void @vst2lanef(float* %A, <2 x float>* %B) nounwind {
;CHECK-LABEL: vst2lanef:		;CHECK-LABEL: vst2lanef:
;CHECK: vst2.32		;CHECK: vst2.32
%tmp0 = bitcast float* %A to i8*		%tmp0 = bitcast float* %A to i8*
%tmp1 = load <2 x float>, <2 x float>* %B		%tmp1 = load <2 x float>, <2 x float>* %B
call void @llvm.arm.neon.vst2lane.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, i32 1, i32 1)		call void @llvm.arm.neon.vst2lane.p0i8.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, i32 1, i32 1)
ret void		ret void
}		}

define void @vst2laneQi16(i16* %A, <8 x i16>* %B) nounwind {		define void @vst2laneQi16(i16* %A, <8 x i16>* %B) nounwind {
;CHECK-LABEL: vst2laneQi16:		;CHECK-LABEL: vst2laneQi16:
;Check the (default) alignment.		;Check the (default) alignment.
;CHECK: vst2.16 {d17[1], d19[1]}, [r0]		;CHECK: vst2.16 {d17[1], d19[1]}, [r0]
%tmp0 = bitcast i16* %A to i8*		%tmp0 = bitcast i16* %A to i8*
%tmp1 = load <8 x i16>, <8 x i16>* %B		%tmp1 = load <8 x i16>, <8 x i16>* %B
call void @llvm.arm.neon.vst2lane.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 5, i32 1)		call void @llvm.arm.neon.vst2lane.p0i8.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 5, i32 1)
ret void		ret void
}		}

define void @vst2laneQi32(i32* %A, <4 x i32>* %B) nounwind {		define void @vst2laneQi32(i32* %A, <4 x i32>* %B) nounwind {
;CHECK-LABEL: vst2laneQi32:		;CHECK-LABEL: vst2laneQi32:
;Check the alignment value. Max for this instruction is 64 bits:		;Check the alignment value. Max for this instruction is 64 bits:
;CHECK: vst2.32 {d17[0], d19[0]}, [r0:64]		;CHECK: vst2.32 {d17[0], d19[0]}, [r0:64]
%tmp0 = bitcast i32* %A to i8*		%tmp0 = bitcast i32* %A to i8*
%tmp1 = load <4 x i32>, <4 x i32>* %B		%tmp1 = load <4 x i32>, <4 x i32>* %B
call void @llvm.arm.neon.vst2lane.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 2, i32 16)		call void @llvm.arm.neon.vst2lane.p0i8.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 2, i32 16)
ret void		ret void
}		}

define void @vst2laneQf(float* %A, <4 x float>* %B) nounwind {		define void @vst2laneQf(float* %A, <4 x float>* %B) nounwind {
;CHECK-LABEL: vst2laneQf:		;CHECK-LABEL: vst2laneQf:
;CHECK: vst2.32		;CHECK: vst2.32
%tmp0 = bitcast float* %A to i8*		%tmp0 = bitcast float* %A to i8*
%tmp1 = load <4 x float>, <4 x float>* %B		%tmp1 = load <4 x float>, <4 x float>* %B
call void @llvm.arm.neon.vst2lane.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, i32 3, i32 1)		call void @llvm.arm.neon.vst2lane.p0i8.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, i32 3, i32 1)
ret void		ret void
}		}

declare void @llvm.arm.neon.vst2lane.v8i8(i8*, <8 x i8>, <8 x i8>, i32, i32) nounwind		declare void @llvm.arm.neon.vst2lane.p0i8.v8i8(i8*, <8 x i8>, <8 x i8>, i32, i32) nounwind
declare void @llvm.arm.neon.vst2lane.v4i16(i8*, <4 x i16>, <4 x i16>, i32, i32) nounwind		declare void @llvm.arm.neon.vst2lane.p0i8.v4i16(i8*, <4 x i16>, <4 x i16>, i32, i32) nounwind
declare void @llvm.arm.neon.vst2lane.v2i32(i8*, <2 x i32>, <2 x i32>, i32, i32) nounwind		declare void @llvm.arm.neon.vst2lane.p0i8.v2i32(i8*, <2 x i32>, <2 x i32>, i32, i32) nounwind
declare void @llvm.arm.neon.vst2lane.v2f32(i8*, <2 x float>, <2 x float>, i32, i32) nounwind		declare void @llvm.arm.neon.vst2lane.p0i8.v2f32(i8*, <2 x float>, <2 x float>, i32, i32) nounwind

declare void @llvm.arm.neon.vst2lane.v8i16(i8*, <8 x i16>, <8 x i16>, i32, i32) nounwind		declare void @llvm.arm.neon.vst2lane.p0i8.v8i16(i8*, <8 x i16>, <8 x i16>, i32, i32) nounwind
declare void @llvm.arm.neon.vst2lane.v4i32(i8*, <4 x i32>, <4 x i32>, i32, i32) nounwind		declare void @llvm.arm.neon.vst2lane.p0i8.v4i32(i8*, <4 x i32>, <4 x i32>, i32, i32) nounwind
declare void @llvm.arm.neon.vst2lane.v4f32(i8*, <4 x float>, <4 x float>, i32, i32) nounwind		declare void @llvm.arm.neon.vst2lane.p0i8.v4f32(i8*, <4 x float>, <4 x float>, i32, i32) nounwind

define void @vst3lanei8(i8* %A, <8 x i8>* %B) nounwind {		define void @vst3lanei8(i8* %A, <8 x i8>* %B) nounwind {
;CHECK-LABEL: vst3lanei8:		;CHECK-LABEL: vst3lanei8:
;CHECK: vst3.8		;CHECK: vst3.8
%tmp1 = load <8 x i8>, <8 x i8>* %B		%tmp1 = load <8 x i8>, <8 x i8>* %B
call void @llvm.arm.neon.vst3lane.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 1)		call void @llvm.arm.neon.vst3lane.p0i8.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 1)
ret void		ret void
}		}

define void @vst3lanei16(i16* %A, <4 x i16>* %B) nounwind {		define void @vst3lanei16(i16* %A, <4 x i16>* %B) nounwind {
;CHECK-LABEL: vst3lanei16:		;CHECK-LABEL: vst3lanei16:
;Check the (default) alignment value. VST3 does not support alignment.		;Check the (default) alignment value. VST3 does not support alignment.
;CHECK: vst3.16 {d16[1], d17[1], d18[1]}, [r0]		;CHECK: vst3.16 {d16[1], d17[1], d18[1]}, [r0]
%tmp0 = bitcast i16* %A to i8*		%tmp0 = bitcast i16* %A to i8*
%tmp1 = load <4 x i16>, <4 x i16>* %B		%tmp1 = load <4 x i16>, <4 x i16>* %B
call void @llvm.arm.neon.vst3lane.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1, i32 8)		call void @llvm.arm.neon.vst3lane.p0i8.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1, i32 8)
ret void		ret void
}		}

define void @vst3lanei32(i32* %A, <2 x i32>* %B) nounwind {		define void @vst3lanei32(i32* %A, <2 x i32>* %B) nounwind {
;CHECK-LABEL: vst3lanei32:		;CHECK-LABEL: vst3lanei32:
;CHECK: vst3.32		;CHECK: vst3.32
%tmp0 = bitcast i32* %A to i8*		%tmp0 = bitcast i32* %A to i8*
%tmp1 = load <2 x i32>, <2 x i32>* %B		%tmp1 = load <2 x i32>, <2 x i32>* %B
call void @llvm.arm.neon.vst3lane.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1, i32 1)		call void @llvm.arm.neon.vst3lane.p0i8.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1, i32 1)
ret void		ret void
}		}

define void @vst3lanef(float* %A, <2 x float>* %B) nounwind {		define void @vst3lanef(float* %A, <2 x float>* %B) nounwind {
;CHECK-LABEL: vst3lanef:		;CHECK-LABEL: vst3lanef:
;CHECK: vst3.32		;CHECK: vst3.32
%tmp0 = bitcast float* %A to i8*		%tmp0 = bitcast float* %A to i8*
%tmp1 = load <2 x float>, <2 x float>* %B		%tmp1 = load <2 x float>, <2 x float>* %B
call void @llvm.arm.neon.vst3lane.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, i32 1, i32 1)		call void @llvm.arm.neon.vst3lane.p0i8.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, i32 1, i32 1)
ret void		ret void
}		}

define void @vst3laneQi16(i16* %A, <8 x i16>* %B) nounwind {		define void @vst3laneQi16(i16* %A, <8 x i16>* %B) nounwind {
;CHECK-LABEL: vst3laneQi16:		;CHECK-LABEL: vst3laneQi16:
;Check the (default) alignment value. VST3 does not support alignment.		;Check the (default) alignment value. VST3 does not support alignment.
;CHECK: vst3.16 {d17[2], d19[2], d21[2]}, [r0]		;CHECK: vst3.16 {d17[2], d19[2], d21[2]}, [r0]
%tmp0 = bitcast i16* %A to i8*		%tmp0 = bitcast i16* %A to i8*
%tmp1 = load <8 x i16>, <8 x i16>* %B		%tmp1 = load <8 x i16>, <8 x i16>* %B
call void @llvm.arm.neon.vst3lane.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 6, i32 8)		call void @llvm.arm.neon.vst3lane.p0i8.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 6, i32 8)
ret void		ret void
}		}

define void @vst3laneQi32(i32* %A, <4 x i32>* %B) nounwind {		define void @vst3laneQi32(i32* %A, <4 x i32>* %B) nounwind {
;CHECK-LABEL: vst3laneQi32:		;CHECK-LABEL: vst3laneQi32:
;CHECK: vst3.32		;CHECK: vst3.32
%tmp0 = bitcast i32* %A to i8*		%tmp0 = bitcast i32* %A to i8*
%tmp1 = load <4 x i32>, <4 x i32>* %B		%tmp1 = load <4 x i32>, <4 x i32>* %B
call void @llvm.arm.neon.vst3lane.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 0, i32 1)		call void @llvm.arm.neon.vst3lane.p0i8.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 0, i32 1)
ret void		ret void
}		}

;Check for a post-increment updating store.		;Check for a post-increment updating store.
define void @vst3laneQi32_update(i32** %ptr, <4 x i32>* %B) nounwind {		define void @vst3laneQi32_update(i32** %ptr, <4 x i32>* %B) nounwind {
;CHECK-LABEL: vst3laneQi32_update:		;CHECK-LABEL: vst3laneQi32_update:
;CHECK: vst3.32 {d16[0], d18[0], d20[0]}, [r1]!		;CHECK: vst3.32 {d16[0], d18[0], d20[0]}, [r1]!
%A = load i32*, i32** %ptr		%A = load i32*, i32** %ptr
%tmp0 = bitcast i32* %A to i8*		%tmp0 = bitcast i32* %A to i8*
%tmp1 = load <4 x i32>, <4 x i32>* %B		%tmp1 = load <4 x i32>, <4 x i32>* %B
call void @llvm.arm.neon.vst3lane.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 0, i32 1)		call void @llvm.arm.neon.vst3lane.p0i8.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 0, i32 1)
%tmp2 = getelementptr i32, i32* %A, i32 3		%tmp2 = getelementptr i32, i32* %A, i32 3
store i32* %tmp2, i32** %ptr		store i32* %tmp2, i32** %ptr
ret void		ret void
}		}

define void @vst3laneQf(float* %A, <4 x float>* %B) nounwind {		define void @vst3laneQf(float* %A, <4 x float>* %B) nounwind {
;CHECK-LABEL: vst3laneQf:		;CHECK-LABEL: vst3laneQf:
;CHECK: vst3.32		;CHECK: vst3.32
%tmp0 = bitcast float* %A to i8*		%tmp0 = bitcast float* %A to i8*
%tmp1 = load <4 x float>, <4 x float>* %B		%tmp1 = load <4 x float>, <4 x float>* %B
call void @llvm.arm.neon.vst3lane.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, i32 1, i32 1)		call void @llvm.arm.neon.vst3lane.p0i8.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, i32 1, i32 1)
ret void		ret void
}		}

declare void @llvm.arm.neon.vst3lane.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, i32, i32) nounwind		declare void @llvm.arm.neon.vst3lane.p0i8.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, i32, i32) nounwind
declare void @llvm.arm.neon.vst3lane.v4i16(i8*, <4 x i16>, <4 x i16>, <4 x i16>, i32, i32) nounwind		declare void @llvm.arm.neon.vst3lane.p0i8.v4i16(i8*, <4 x i16>, <4 x i16>, <4 x i16>, i32, i32) nounwind
declare void @llvm.arm.neon.vst3lane.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, i32, i32) nounwind		declare void @llvm.arm.neon.vst3lane.p0i8.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, i32, i32) nounwind
declare void @llvm.arm.neon.vst3lane.v2f32(i8*, <2 x float>, <2 x float>, <2 x float>, i32, i32) nounwind		declare void @llvm.arm.neon.vst3lane.p0i8.v2f32(i8*, <2 x float>, <2 x float>, <2 x float>, i32, i32) nounwind

declare void @llvm.arm.neon.vst3lane.v8i16(i8*, <8 x i16>, <8 x i16>, <8 x i16>, i32, i32) nounwind		declare void @llvm.arm.neon.vst3lane.p0i8.v8i16(i8*, <8 x i16>, <8 x i16>, <8 x i16>, i32, i32) nounwind
declare void @llvm.arm.neon.vst3lane.v4i32(i8*, <4 x i32>, <4 x i32>, <4 x i32>, i32, i32) nounwind		declare void @llvm.arm.neon.vst3lane.p0i8.v4i32(i8*, <4 x i32>, <4 x i32>, <4 x i32>, i32, i32) nounwind
declare void @llvm.arm.neon.vst3lane.v4f32(i8*, <4 x float>, <4 x float>, <4 x float>, i32, i32) nounwind		declare void @llvm.arm.neon.vst3lane.p0i8.v4f32(i8*, <4 x float>, <4 x float>, <4 x float>, i32, i32) nounwind


define void @vst4lanei8(i8* %A, <8 x i8>* %B) nounwind {		define void @vst4lanei8(i8* %A, <8 x i8>* %B) nounwind {
;CHECK-LABEL: vst4lanei8:		;CHECK-LABEL: vst4lanei8:
;Check the alignment value. Max for this instruction is 32 bits:		;Check the alignment value. Max for this instruction is 32 bits:
;CHECK: vst4.8 {d16[1], d17[1], d18[1], d19[1]}, [r0:32]		;CHECK: vst4.8 {d16[1], d17[1], d18[1], d19[1]}, [r0:32]
%tmp1 = load <8 x i8>, <8 x i8>* %B		%tmp1 = load <8 x i8>, <8 x i8>* %B
call void @llvm.arm.neon.vst4lane.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 8)		call void @llvm.arm.neon.vst4lane.p0i8.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 8)
ret void		ret void
}		}

;Check for a post-increment updating store.		;Check for a post-increment updating store.
define void @vst4lanei8_update(i8** %ptr, <8 x i8>* %B) nounwind {		define void @vst4lanei8_update(i8** %ptr, <8 x i8>* %B) nounwind {
;CHECK-LABEL: vst4lanei8_update:		;CHECK-LABEL: vst4lanei8_update:
;CHECK: vst4.8 {d16[1], d17[1], d18[1], d19[1]}, [r1:32]!		;CHECK: vst4.8 {d16[1], d17[1], d18[1], d19[1]}, [r1:32]!
%A = load i8*, i8** %ptr		%A = load i8*, i8** %ptr
%tmp1 = load <8 x i8>, <8 x i8>* %B		%tmp1 = load <8 x i8>, <8 x i8>* %B
call void @llvm.arm.neon.vst4lane.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 8)		call void @llvm.arm.neon.vst4lane.p0i8.v8i8(i8* %A, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, <8 x i8> %tmp1, i32 1, i32 8)
%tmp2 = getelementptr i8, i8* %A, i32 4		%tmp2 = getelementptr i8, i8* %A, i32 4
store i8* %tmp2, i8** %ptr		store i8* %tmp2, i8** %ptr
ret void		ret void
}		}

define void @vst4lanei16(i16* %A, <4 x i16>* %B) nounwind {		define void @vst4lanei16(i16* %A, <4 x i16>* %B) nounwind {
;CHECK-LABEL: vst4lanei16:		;CHECK-LABEL: vst4lanei16:
;CHECK: vst4.16		;CHECK: vst4.16
%tmp0 = bitcast i16* %A to i8*		%tmp0 = bitcast i16* %A to i8*
%tmp1 = load <4 x i16>, <4 x i16>* %B		%tmp1 = load <4 x i16>, <4 x i16>* %B
call void @llvm.arm.neon.vst4lane.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1, i32 1)		call void @llvm.arm.neon.vst4lane.p0i8.v4i16(i8* %tmp0, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, <4 x i16> %tmp1, i32 1, i32 1)
ret void		ret void
}		}

define void @vst4lanei32(i32* %A, <2 x i32>* %B) nounwind {		define void @vst4lanei32(i32* %A, <2 x i32>* %B) nounwind {
;CHECK-LABEL: vst4lanei32:		;CHECK-LABEL: vst4lanei32:
;Check the alignment value. Max for this instruction is 128 bits:		;Check the alignment value. Max for this instruction is 128 bits:
;CHECK: vst4.32 {d16[1], d17[1], d18[1], d19[1]}, [r0:128]		;CHECK: vst4.32 {d16[1], d17[1], d18[1], d19[1]}, [r0:128]
%tmp0 = bitcast i32* %A to i8*		%tmp0 = bitcast i32* %A to i8*
%tmp1 = load <2 x i32>, <2 x i32>* %B		%tmp1 = load <2 x i32>, <2 x i32>* %B
call void @llvm.arm.neon.vst4lane.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1, i32 16)		call void @llvm.arm.neon.vst4lane.p0i8.v2i32(i8* %tmp0, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, <2 x i32> %tmp1, i32 1, i32 16)
ret void		ret void
}		}

define void @vst4lanef(float* %A, <2 x float>* %B) nounwind {		define void @vst4lanef(float* %A, <2 x float>* %B) nounwind {
;CHECK-LABEL: vst4lanef:		;CHECK-LABEL: vst4lanef:
;CHECK: vst4.32		;CHECK: vst4.32
%tmp0 = bitcast float* %A to i8*		%tmp0 = bitcast float* %A to i8*
%tmp1 = load <2 x float>, <2 x float>* %B		%tmp1 = load <2 x float>, <2 x float>* %B
call void @llvm.arm.neon.vst4lane.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, i32 1, i32 1)		call void @llvm.arm.neon.vst4lane.p0i8.v2f32(i8* %tmp0, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, <2 x float> %tmp1, i32 1, i32 1)
ret void		ret void
}		}

define void @vst4laneQi16(i16* %A, <8 x i16>* %B) nounwind {		define void @vst4laneQi16(i16* %A, <8 x i16>* %B) nounwind {
;CHECK-LABEL: vst4laneQi16:		;CHECK-LABEL: vst4laneQi16:
;Check the alignment value. Max for this instruction is 64 bits:		;Check the alignment value. Max for this instruction is 64 bits:
;CHECK: vst4.16 {d17[3], d19[3], d21[3], d23[3]}, [r0:64]		;CHECK: vst4.16 {d17[3], d19[3], d21[3], d23[3]}, [r0:64]
%tmp0 = bitcast i16* %A to i8*		%tmp0 = bitcast i16* %A to i8*
%tmp1 = load <8 x i16>, <8 x i16>* %B		%tmp1 = load <8 x i16>, <8 x i16>* %B
call void @llvm.arm.neon.vst4lane.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 7, i32 16)		call void @llvm.arm.neon.vst4lane.p0i8.v8i16(i8* %tmp0, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, <8 x i16> %tmp1, i32 7, i32 16)
ret void		ret void
}		}

define void @vst4laneQi32(i32* %A, <4 x i32>* %B) nounwind {		define void @vst4laneQi32(i32* %A, <4 x i32>* %B) nounwind {
;CHECK-LABEL: vst4laneQi32:		;CHECK-LABEL: vst4laneQi32:
;Check the (default) alignment.		;Check the (default) alignment.
;CHECK: vst4.32 {d17[0], d19[0], d21[0], d23[0]}, [r0]		;CHECK: vst4.32 {d17[0], d19[0], d21[0], d23[0]}, [r0]
%tmp0 = bitcast i32* %A to i8*		%tmp0 = bitcast i32* %A to i8*
%tmp1 = load <4 x i32>, <4 x i32>* %B		%tmp1 = load <4 x i32>, <4 x i32>* %B
call void @llvm.arm.neon.vst4lane.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 2, i32 1)		call void @llvm.arm.neon.vst4lane.p0i8.v4i32(i8* %tmp0, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, <4 x i32> %tmp1, i32 2, i32 1)
ret void		ret void
}		}

define void @vst4laneQf(float* %A, <4 x float>* %B) nounwind {		define void @vst4laneQf(float* %A, <4 x float>* %B) nounwind {
;CHECK-LABEL: vst4laneQf:		;CHECK-LABEL: vst4laneQf:
;CHECK: vst4.32		;CHECK: vst4.32
%tmp0 = bitcast float* %A to i8*		%tmp0 = bitcast float* %A to i8*
%tmp1 = load <4 x float>, <4 x float>* %B		%tmp1 = load <4 x float>, <4 x float>* %B
call void @llvm.arm.neon.vst4lane.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, i32 1, i32 1)		call void @llvm.arm.neon.vst4lane.p0i8.v4f32(i8* %tmp0, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, <4 x float> %tmp1, i32 1, i32 1)
ret void		ret void
}		}

; Make sure this doesn't crash; PR10258		; Make sure this doesn't crash; PR10258
define <8 x i16> @variable_insertelement(<8 x i16> %a, i16 %b, i32 %c) nounwind readnone {		define <8 x i16> @variable_insertelement(<8 x i16> %a, i16 %b, i32 %c) nounwind readnone {
;CHECK-LABEL: variable_insertelement:		;CHECK-LABEL: variable_insertelement:
%r = insertelement <8 x i16> %a, i16 %b, i32 %c		%r = insertelement <8 x i16> %a, i16 %b, i32 %c
ret <8 x i16> %r		ret <8 x i16> %r
}		}

declare void @llvm.arm.neon.vst4lane.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, i32, i32) nounwind		declare void @llvm.arm.neon.vst4lane.p0i8.v8i8(i8*, <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8>, i32, i32) nounwind
declare void @llvm.arm.neon.vst4lane.v4i16(i8*, <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16>, i32, i32) nounwind		declare void @llvm.arm.neon.vst4lane.p0i8.v4i16(i8*, <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16>, i32, i32) nounwind
declare void @llvm.arm.neon.vst4lane.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, i32, i32) nounwind		declare void @llvm.arm.neon.vst4lane.p0i8.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, i32, i32) nounwind
declare void @llvm.arm.neon.vst4lane.v2f32(i8*, <2 x float>, <2 x float>, <2 x float>, <2 x float>, i32, i32) nounwind		declare void @llvm.arm.neon.vst4lane.p0i8.v2f32(i8*, <2 x float>, <2 x float>, <2 x float>, <2 x float>, i32, i32) nounwind

declare void @llvm.arm.neon.vst4lane.v8i16(i8*, <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16>, i32, i32) nounwind		declare void @llvm.arm.neon.vst4lane.p0i8.v8i16(i8*, <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16>, i32, i32) nounwind
declare void @llvm.arm.neon.vst4lane.v4i32(i8*, <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32>, i32, i32) nounwind		declare void @llvm.arm.neon.vst4lane.p0i8.v4i32(i8*, <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32>, i32, i32) nounwind
declare void @llvm.arm.neon.vst4lane.v4f32(i8*, <4 x float>, <4 x float>, <4 x float>, <4 x float>, i32, i32) nounwind		declare void @llvm.arm.neon.vst4lane.p0i8.v4f32(i8*, <4 x float>, <4 x float>, <4 x float>, <4 x float>, i32, i32) nounwind

llvm/trunk/test/CodeGen/Thumb2/crash.ll

[9 lines not shown]	entry:
%1 = load <4 x i32>, <4 x i32>* %0, align 16 ; <<4 x i32>> [#uses=1]		%1 = load <4 x i32>, <4 x i32>* %0, align 16 ; <<4 x i32>> [#uses=1]
%2 = bitcast i32* %sp1 to <4 x i32>* ; <<4 x i32>*> [#uses=1]		%2 = bitcast i32* %sp1 to <4 x i32>* ; <<4 x i32>*> [#uses=1]
%3 = load <4 x i32>, <4 x i32>* %2, align 16 ; <<4 x i32>> [#uses=1]		%3 = load <4 x i32>, <4 x i32>* %2, align 16 ; <<4 x i32>> [#uses=1]
%4 = bitcast i32* %sp2 to <4 x i32>* ; <<4 x i32>*> [#uses=1]		%4 = bitcast i32* %sp2 to <4 x i32>* ; <<4 x i32>*> [#uses=1]
%5 = load <4 x i32>, <4 x i32>* %4, align 16 ; <<4 x i32>> [#uses=1]		%5 = load <4 x i32>, <4 x i32>* %4, align 16 ; <<4 x i32>> [#uses=1]
%6 = bitcast i32* %sp3 to <4 x i32>* ; <<4 x i32>*> [#uses=1]		%6 = bitcast i32* %sp3 to <4 x i32>* ; <<4 x i32>*> [#uses=1]
%7 = load <4 x i32>, <4 x i32>* %6, align 16 ; <<4 x i32>> [#uses=1]		%7 = load <4 x i32>, <4 x i32>* %6, align 16 ; <<4 x i32>> [#uses=1]
%8 = bitcast i32* %dp to i8* ; <i8*> [#uses=1]		%8 = bitcast i32* %dp to i8* ; <i8*> [#uses=1]
tail call void @llvm.arm.neon.vst4.v4i32(i8* %8, <4 x i32> %1, <4 x i32> %3, <4 x i32> %5, <4 x i32> %7, i32 1)		tail call void @llvm.arm.neon.vst4.p0i8.v4i32(i8* %8, <4 x i32> %1, <4 x i32> %3, <4 x i32> %5, <4 x i32> %7, i32 1)
ret void		ret void
}		}

declare void @llvm.arm.neon.vst4.v4i32(i8*, <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32>, i32) nounwind		declare void @llvm.arm.neon.vst4.p0i8.v4i32(i8*, <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32>, i32) nounwind

@sbuf = common global [16 x i32] zeroinitializer, align 16 ; <[16 x i32]*> [#uses=5]		@sbuf = common global [16 x i32] zeroinitializer, align 16 ; <[16 x i32]*> [#uses=5]
@dbuf = common global [16 x i32] zeroinitializer ; <[16 x i32]*> [#uses=2]		@dbuf = common global [16 x i32] zeroinitializer ; <[16 x i32]*> [#uses=2]

; This function creates 4 chained INSERT_SUBREGS and then invokes the register scavenger.		; This function creates 4 chained INSERT_SUBREGS and then invokes the register scavenger.
; The first INSERT_SUBREG needs an <undef> use operand for that to work.		; The first INSERT_SUBREG needs an <undef> use operand for that to work.
define arm_apcscc i32 @main() nounwind {		define arm_apcscc i32 @main() nounwind {
bb.nph:		bb.nph:
[9 lines not shown]	bb: ; preds = %bb, %bb.nph
%exitcond = icmp eq i32 %1, 16 ; <i1> [#uses=1]		%exitcond = icmp eq i32 %1, 16 ; <i1> [#uses=1]
br i1 %exitcond, label %bb2, label %bb		br i1 %exitcond, label %bb2, label %bb

bb2: ; preds = %bb		bb2: ; preds = %bb
%2 = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @sbuf to <4 x i32>*), align 16 ; <<4 x i32>> [#uses=1]		%2 = load <4 x i32>, <4 x i32>* bitcast ([16 x i32]* @sbuf to <4 x i32>*), align 16 ; <<4 x i32>> [#uses=1]
%3 = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @sbuf, i32 0, i32 4) to <4 x i32>*), align 16 ; <<4 x i32>> [#uses=1]		%3 = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @sbuf, i32 0, i32 4) to <4 x i32>*), align 16 ; <<4 x i32>> [#uses=1]
%4 = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @sbuf, i32 0, i32 8) to <4 x i32>*), align 16 ; <<4 x i32>> [#uses=1]		%4 = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @sbuf, i32 0, i32 8) to <4 x i32>*), align 16 ; <<4 x i32>> [#uses=1]
%5 = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @sbuf, i32 0, i32 12) to <4 x i32>*), align 16 ; <<4 x i32>> [#uses=1]		%5 = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @sbuf, i32 0, i32 12) to <4 x i32>*), align 16 ; <<4 x i32>> [#uses=1]
tail call void @llvm.arm.neon.vst4.v4i32(i8* bitcast ([16 x i32]* @dbuf to i8*), <4 x i32> %2, <4 x i32> %3, <4 x i32> %4, <4 x i32> %5, i32 1) nounwind		tail call void @llvm.arm.neon.vst4.p0i8.v4i32(i8* bitcast ([16 x i32]* @dbuf to i8*), <4 x i32> %2, <4 x i32> %3, <4 x i32> %4, <4 x i32> %5, i32 1) nounwind
ret i32 0		ret i32 0
}		}

; PR12389		; PR12389
; Make sure the DPair register class can spill.		; Make sure the DPair register class can spill.
define void @pr12389(i8* %p) nounwind ssp {		define void @pr12389(i8* %p) nounwind ssp {
entry:		entry:
%vld1 = tail call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* %p, i32 1)		%vld1 = tail call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* %p, i32 1)
tail call void asm sideeffect "", "~{q0},~{q1},~{q2},~{q3},~{q4},~{q5},~{q6},~{q7},~{q8},~{q9},~{q10},~{q11},~{q12},~{q13},~{q14},~{q15}"() nounwind		tail call void asm sideeffect "", "~{q0},~{q1},~{q2},~{q3},~{q4},~{q5},~{q6},~{q7},~{q8},~{q9},~{q10},~{q11},~{q12},~{q13},~{q14},~{q15}"() nounwind
tail call void @llvm.arm.neon.vst1.v4f32(i8* %p, <4 x float> %vld1, i32 1)		tail call void @llvm.arm.neon.vst1.p0i8.v4f32(i8* %p, <4 x float> %vld1, i32 1)
ret void		ret void
}		}

declare <4 x float> @llvm.arm.neon.vld1.v4f32(i8*, i32) nounwind readonly		declare <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8*, i32) nounwind readonly

declare void @llvm.arm.neon.vst1.v4f32(i8*, <4 x float>, i32) nounwind		declare void @llvm.arm.neon.vst1.p0i8.v4f32(i8*, <4 x float>, i32) nounwind

; <rdar://problem/11101911>		; <rdar://problem/11101911>
; When an strd is expanded into two str instructions, make sure the first str		; When an strd is expanded into two str instructions, make sure the first str
; doesn't kill the base register. This can happen if the base register is the		; doesn't kill the base register. This can happen if the base register is the
; same as the data register.		; same as the data register.
%class = type { i8*, %class*, i32 }		%class = type { i8*, %class*, i32 }
define void @f11101911(%class* %this, i32 %num) ssp align 2 {		define void @f11101911(%class* %this, i32 %num) ssp align 2 {
entry:		entry:
[15 lines not shown]

llvm/trunk/test/CodeGen/Thumb2/machine-licm.ll

	[53 lines not shown]
	; CHECK: vmov.f32 q{{.*}}, #1.000000e+00			; CHECK: vmov.f32 q{{.*}}, #1.000000e+00
	br i1 undef, label %bb1, label %bb2			br i1 undef, label %bb1, label %bb2

	bb1:			bb1:
	; CHECK: %bb1			; CHECK: %bb1
	%indvar = phi i32 [ %indvar.next, %bb1 ], [ 0, %entry ]			%indvar = phi i32 [ %indvar.next, %bb1 ], [ 0, %entry ]
	%tmp1 = shl i32 %indvar, 2			%tmp1 = shl i32 %indvar, 2
	%gep1 = getelementptr i8, i8* %ptr1, i32 %tmp1			%gep1 = getelementptr i8, i8* %ptr1, i32 %tmp1
	%tmp2 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* %gep1, i32 1)			%tmp2 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* %gep1, i32 1)
	%tmp3 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, <4 x float> %tmp2)			%tmp3 = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, <4 x float> %tmp2)
	%gep2 = getelementptr i8, i8* %ptr2, i32 %tmp1			%gep2 = getelementptr i8, i8* %ptr2, i32 %tmp1
	call void @llvm.arm.neon.vst1.v4f32(i8* %gep2, <4 x float> %tmp3, i32 1)			call void @llvm.arm.neon.vst1.p0i8.v4f32(i8* %gep2, <4 x float> %tmp3, i32 1)
	%indvar.next = add i32 %indvar, 1			%indvar.next = add i32 %indvar, 1
	%cond = icmp eq i32 %indvar.next, 10			%cond = icmp eq i32 %indvar.next, 10
	br i1 %cond, label %bb2, label %bb1			br i1 %cond, label %bb2, label %bb1

	bb2:			bb2:
	ret void			ret void
	}			}

	; CHECK-NOT: LCPI1_0:			; CHECK-NOT: LCPI1_0:

	declare <4 x float> @llvm.arm.neon.vld1.v4f32(i8*, i32) nounwind readonly			declare <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8*, i32) nounwind readonly

	declare void @llvm.arm.neon.vst1.v4f32(i8*, <4 x float>, i32) nounwind			declare void @llvm.arm.neon.vst1.p0i8.v4f32(i8*, <4 x float>, i32) nounwind

	declare <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>, <4 x float>) nounwind readnone			declare <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>, <4 x float>) nounwind readnone

	; rdar://8241368			; rdar://8241368
	; isel should not fold immediate into eor's which would have prevented LICM.			; isel should not fold immediate into eor's which would have prevented LICM.
	define zeroext i16 @t3(i8 zeroext %data, i16 zeroext %crc) nounwind readnone {			define zeroext i16 @t3(i8 zeroext %data, i16 zeroext %crc) nounwind readnone {
	; CHECK-LABEL: t3:			; CHECK-LABEL: t3:
	bb.nph:			bb.nph:
	(34 lines not shown)

llvm/trunk/test/CodeGen/Thumb2/thumb2-spill-q.ll

	; RUN: llc < %s -mtriple=thumbv7-elf -mattr=+neon -arm-atomic-cfg-tidy=0 | FileCheck %s			; RUN: llc < %s -mtriple=thumbv7-elf -mattr=+neon -arm-atomic-cfg-tidy=0 | FileCheck %s
	; PR4789			; PR4789

	%bar = type { float, float, float }			%bar = type { float, float, float }
	%baz = type { i32, [16 x %bar], [16 x float], [16 x i32], i8 }			%baz = type { i32, [16 x %bar], [16 x float], [16 x i32], i8 }
	%foo = type { <4 x float> }			%foo = type { <4 x float> }
	%quux = type { i32 (...)*, %baz, i32 }			%quux = type { i32 (...)*, %baz, i32 }
	%quuz = type { %quux, i32, %bar, [128 x i8], [16 x %foo], %foo, %foo, %foo }			%quuz = type { %quux, i32, %bar, [128 x i8], [16 x %foo], %foo, %foo, %foo }

	declare <4 x float> @llvm.arm.neon.vld1.v4f32(i8*, i32) nounwind readonly			declare <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8*, i32) nounwind readonly

	define void @aaa(%quuz* %this, i8* %block) {			define void @aaa(%quuz* %this, i8* %block) {
	; CHECK-LABEL: aaa:			; CHECK-LABEL: aaa:
	; CHECK: bfc r4, #0, #4			; CHECK: bfc r4, #0, #4
	; CHECK: vst1.64 {{.*}}[{{.*}}:128]			; CHECK: vst1.64 {{.*}}[{{.*}}:128]
	; CHECK: vld1.64 {{.*}}[{{.*}}:128]			; CHECK: vld1.64 {{.*}}[{{.*}}:128]
	entry:			entry:
	%aligned_vec = alloca <4 x float>, align 16			%aligned_vec = alloca <4 x float>, align 16
	%"alloca point" = bitcast i32 0 to i32			%"alloca point" = bitcast i32 0 to i32
	%vecptr = bitcast <4 x float>* %aligned_vec to i8*			%vecptr = bitcast <4 x float>* %aligned_vec to i8*
	%0 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* %vecptr, i32 1) nounwind			%0 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* %vecptr, i32 1) nounwind
	store float 6.300000e+01, float* undef, align 4			store float 6.300000e+01, float* undef, align 4
	%1 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind ; <<4 x float>> [#uses=1]			%1 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind ; <<4 x float>> [#uses=1]
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%2 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind ; <<4 x float>> [#uses=1]			%2 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind ; <<4 x float>> [#uses=1]
	%ld3 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld3 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld4 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld4 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld5 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld5 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld6 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld6 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld7 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld7 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld8 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld8 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld9 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld9 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld10 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld10 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld11 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld11 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%ld12 = call <4 x float> @llvm.arm.neon.vld1.v4f32(i8* undef, i32 1) nounwind			%ld12 = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* undef, i32 1) nounwind
	store float 0.000000e+00, float* undef, align 4			store float 0.000000e+00, float* undef, align 4
	%val173 = load <4 x float>, <4 x float>* undef ; <<4 x float>> [#uses=1]			%val173 = load <4 x float>, <4 x float>* undef ; <<4 x float>> [#uses=1]
	br label %bb4			br label %bb4

	bb4: ; preds = %bb193, %entry			bb4: ; preds = %bb193, %entry
	%besterror.0.2264 = phi <4 x float> [ undef, %entry ], [ %besterror.0.0, %bb193 ] ; <<4 x float>> [#uses=2]			%besterror.0.2264 = phi <4 x float> [ undef, %entry ], [ %besterror.0.0, %bb193 ] ; <<4 x float>> [#uses=2]
	%part0.0.0261 = phi <4 x float> [ zeroinitializer, %entry ], [ %23, %bb193 ] ; <<4 x float>> [#uses=2]			%part0.0.0261 = phi <4 x float> [ zeroinitializer, %entry ], [ %23, %bb193 ] ; <<4 x float>> [#uses=2]
	%3 = fmul <4 x float> zeroinitializer, %0 ; <<4 x float>> [#uses=2]			%3 = fmul <4 x float> zeroinitializer, %0 ; <<4 x float>> [#uses=2]
	(39 lines not shown)

llvm/trunk/test/CodeGen/Thumb2/v8_IT_1.ll

	; RUN: llc < %s -mtriple=thumbv8 -mattr=+neon | FileCheck %s			; RUN: llc < %s -mtriple=thumbv8 -mattr=+neon | FileCheck %s
	; RUN: llc < %s -mtriple=thumbv7 -mattr=+neon -arm-restrict-it | FileCheck %s			; RUN: llc < %s -mtriple=thumbv7 -mattr=+neon -arm-restrict-it | FileCheck %s

	;CHECK-LABEL: select_s_v_v:			;CHECK-LABEL: select_s_v_v:
	;CHECK-NOT: it			;CHECK-NOT: it
	;CHECK: bx			;CHECK: bx
	define <16 x i8> @select_s_v_v(i32 %avail, i8* %bar) {			define <16 x i8> @select_s_v_v(i32 %avail, i8* %bar) {
	entry:			entry:
	%vld1 = call <16 x i8> @llvm.arm.neon.vld1.v16i8(i8* %bar, i32 1)			%vld1 = call <16 x i8> @llvm.arm.neon.vld1.v16i8.p0i8(i8* %bar, i32 1)
	%and = and i32 %avail, 1			%and = and i32 %avail, 1
	%tobool = icmp eq i32 %and, 0			%tobool = icmp eq i32 %and, 0
	%vld1. = select i1 %tobool, <16 x i8> %vld1, <16 x i8> zeroinitializer			%vld1. = select i1 %tobool, <16 x i8> %vld1, <16 x i8> zeroinitializer
	ret <16 x i8> %vld1.			ret <16 x i8> %vld1.
	}			}

	declare <16 x i8> @llvm.arm.neon.vld1.v16i8(i8* , i32 )			declare <16 x i8> @llvm.arm.neon.vld1.v16i8.p0i8(i8* , i32 )

llvm/trunk/test/Transforms/InstCombine/neon-intrinsics.ll

	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -S | FileCheck %s

	; The alignment arguments for NEON load/store intrinsics can be increased			; The alignment arguments for NEON load/store intrinsics can be increased
	; by instcombine. Check for this.			; by instcombine. Check for this.

	; CHECK: vld4.v2i32({{.*}}, i32 32)			; CHECK: vld4.v2i32.p0i8({{.*}}, i32 32)
	; CHECK: vst4.v2i32({{.*}}, i32 16)			; CHECK: vst4.p0i8.v2i32({{.*}}, i32 16)

	@x = common global [8 x i32] zeroinitializer, align 32			@x = common global [8 x i32] zeroinitializer, align 32
	@y = common global [8 x i32] zeroinitializer, align 16			@y = common global [8 x i32] zeroinitializer, align 16

	%struct.__neon_int32x2x4_t = type { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> }			%struct.__neon_int32x2x4_t = type { <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32> }

	define void @test() nounwind ssp {			define void @test() nounwind ssp {
	%tmp1 = call %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4.v2i32(i8* bitcast ([8 x i32]* @x to i8*), i32 1)			%tmp1 = call %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4.v2i32.p0i8(i8* bitcast ([8 x i32]* @x to i8*), i32 1)
	%tmp2 = extractvalue %struct.__neon_int32x2x4_t %tmp1, 0			%tmp2 = extractvalue %struct.__neon_int32x2x4_t %tmp1, 0
	%tmp3 = extractvalue %struct.__neon_int32x2x4_t %tmp1, 1			%tmp3 = extractvalue %struct.__neon_int32x2x4_t %tmp1, 1
	%tmp4 = extractvalue %struct.__neon_int32x2x4_t %tmp1, 2			%tmp4 = extractvalue %struct.__neon_int32x2x4_t %tmp1, 2
	%tmp5 = extractvalue %struct.__neon_int32x2x4_t %tmp1, 3			%tmp5 = extractvalue %struct.__neon_int32x2x4_t %tmp1, 3
	call void @llvm.arm.neon.vst4.v2i32(i8* bitcast ([8 x i32]* @y to i8*), <2 x i32> %tmp2, <2 x i32> %tmp3, <2 x i32> %tmp4, <2 x i32> %tmp5, i32 1)			call void @llvm.arm.neon.vst4.p0i8.v2i32(i8* bitcast ([8 x i32]* @y to i8*), <2 x i32> %tmp2, <2 x i32> %tmp3, <2 x i32> %tmp4, <2 x i32> %tmp5, i32 1)
	ret void			ret void
	}			}

	declare %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4.v2i32(i8*, i32) nounwind readonly			declare %struct.__neon_int32x2x4_t @llvm.arm.neon.vld4.v2i32.p0i8(i8*, i32) nounwind readonly
	declare void @llvm.arm.neon.vst4.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, i32) nounwind			declare void @llvm.arm.neon.vst4.p0i8.v2i32(i8*, <2 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, i32) nounwind

llvm/trunk/test/Transforms/LoopStrengthReduce/ARM/ivchain-ARM.ll

(233 lines not shown)
.lr.ph: ; preds = %0		.lr.ph: ; preds = %0
%10 = mul i32 %limit, -64		%10 = mul i32 %limit, -64
br label %11		br label %11

; <label>:11 ; preds = %11, %.lr.ph		; <label>:11 ; preds = %11, %.lr.ph
%.05 = phi i8* [ %ref_data, %.lr.ph ], [ %42, %11 ]		%.05 = phi i8* [ %ref_data, %.lr.ph ], [ %42, %11 ]
%counter.04 = phi i32 [ 0, %.lr.ph ], [ %44, %11 ]		%counter.04 = phi i32 [ 0, %.lr.ph ], [ %44, %11 ]
%result.03 = phi <16 x i8> [ zeroinitializer, %.lr.ph ], [ %41, %11 ]		%result.03 = phi <16 x i8> [ zeroinitializer, %.lr.ph ], [ %41, %11 ]
%.012 = phi <16 x i8>* [ %data, %.lr.ph ], [ %43, %11 ]		%.012 = phi <16 x i8>* [ %data, %.lr.ph ], [ %43, %11 ]
%12 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64(i8* %.05, i32 1) nounwind		%12 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64.p0i8(i8* %.05, i32 1) nounwind
%13 = getelementptr inbounds i8, i8* %.05, i32 %ref_stride		%13 = getelementptr inbounds i8, i8* %.05, i32 %ref_stride
%14 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64(i8* %13, i32 1) nounwind		%14 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64.p0i8(i8* %13, i32 1) nounwind
%15 = shufflevector <1 x i64> %12, <1 x i64> %14, <2 x i32> <i32 0, i32 1>		%15 = shufflevector <1 x i64> %12, <1 x i64> %14, <2 x i32> <i32 0, i32 1>
%16 = bitcast <2 x i64> %15 to <16 x i8>		%16 = bitcast <2 x i64> %15 to <16 x i8>
%17 = getelementptr inbounds <16 x i8>, <16 x i8>* %.012, i32 1		%17 = getelementptr inbounds <16 x i8>, <16 x i8>* %.012, i32 1
store <16 x i8> %16, <16 x i8>* %.012, align 4		store <16 x i8> %16, <16 x i8>* %.012, align 4
%18 = getelementptr inbounds i8, i8* %.05, i32 %2		%18 = getelementptr inbounds i8, i8* %.05, i32 %2
%19 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64(i8* %18, i32 1) nounwind		%19 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64.p0i8(i8* %18, i32 1) nounwind
%20 = getelementptr inbounds i8, i8* %.05, i32 %3		%20 = getelementptr inbounds i8, i8* %.05, i32 %3
%21 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64(i8* %20, i32 1) nounwind		%21 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64.p0i8(i8* %20, i32 1) nounwind
%22 = shufflevector <1 x i64> %19, <1 x i64> %21, <2 x i32> <i32 0, i32 1>		%22 = shufflevector <1 x i64> %19, <1 x i64> %21, <2 x i32> <i32 0, i32 1>
%23 = bitcast <2 x i64> %22 to <16 x i8>		%23 = bitcast <2 x i64> %22 to <16 x i8>
%24 = getelementptr inbounds <16 x i8>, <16 x i8>* %.012, i32 2		%24 = getelementptr inbounds <16 x i8>, <16 x i8>* %.012, i32 2
store <16 x i8> %23, <16 x i8>* %17, align 4		store <16 x i8> %23, <16 x i8>* %17, align 4
%25 = getelementptr inbounds i8, i8* %.05, i32 %4		%25 = getelementptr inbounds i8, i8* %.05, i32 %4
%26 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64(i8* %25, i32 1) nounwind		%26 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64.p0i8(i8* %25, i32 1) nounwind
%27 = getelementptr inbounds i8, i8* %.05, i32 %5		%27 = getelementptr inbounds i8, i8* %.05, i32 %5
%28 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64(i8* %27, i32 1) nounwind		%28 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64.p0i8(i8* %27, i32 1) nounwind
%29 = shufflevector <1 x i64> %26, <1 x i64> %28, <2 x i32> <i32 0, i32 1>		%29 = shufflevector <1 x i64> %26, <1 x i64> %28, <2 x i32> <i32 0, i32 1>
%30 = bitcast <2 x i64> %29 to <16 x i8>		%30 = bitcast <2 x i64> %29 to <16 x i8>
%31 = getelementptr inbounds <16 x i8>, <16 x i8>* %.012, i32 3		%31 = getelementptr inbounds <16 x i8>, <16 x i8>* %.012, i32 3
store <16 x i8> %30, <16 x i8>* %24, align 4		store <16 x i8> %30, <16 x i8>* %24, align 4
%32 = getelementptr inbounds i8, i8* %.05, i32 %6		%32 = getelementptr inbounds i8, i8* %.05, i32 %6
%33 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64(i8* %32, i32 1) nounwind		%33 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64.p0i8(i8* %32, i32 1) nounwind
%34 = getelementptr inbounds i8, i8* %.05, i32 %7		%34 = getelementptr inbounds i8, i8* %.05, i32 %7
%35 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64(i8* %34, i32 1) nounwind		%35 = tail call <1 x i64> @llvm.arm.neon.vld1.v1i64.p0i8(i8* %34, i32 1) nounwind
%36 = shufflevector <1 x i64> %33, <1 x i64> %35, <2 x i32> <i32 0, i32 1>		%36 = shufflevector <1 x i64> %33, <1 x i64> %35, <2 x i32> <i32 0, i32 1>
%37 = bitcast <2 x i64> %36 to <16 x i8>		%37 = bitcast <2 x i64> %36 to <16 x i8>
store <16 x i8> %37, <16 x i8>* %31, align 4		store <16 x i8> %37, <16 x i8>* %31, align 4
%38 = add <16 x i8> %16, %23		%38 = add <16 x i8> %16, %23
%39 = add <16 x i8> %38, %30		%39 = add <16 x i8> %38, %30
%40 = add <16 x i8> %39, %37		%40 = add <16 x i8> %39, %37
%41 = add <16 x i8> %result.03, %40		%41 = add <16 x i8> %result.03, %40
%42 = getelementptr i8, i8* %.05, i32 %9		%42 = getelementptr i8, i8* %.05, i32 %9
%43 = getelementptr inbounds <16 x i8>, <16 x i8>* %.012, i32 -64		%43 = getelementptr inbounds <16 x i8>, <16 x i8>* %.012, i32 -64
%44 = add nsw i32 %counter.04, 1		%44 = add nsw i32 %counter.04, 1
%exitcond = icmp eq i32 %44, %limit		%exitcond = icmp eq i32 %44, %limit
br i1 %exitcond, label %._crit_edge, label %11		br i1 %exitcond, label %._crit_edge, label %11

._crit_edge: ; preds = %11		._crit_edge: ; preds = %11
%scevgep = getelementptr <16 x i8>, <16 x i8>* %data, i32 %10		%scevgep = getelementptr <16 x i8>, <16 x i8>* %data, i32 %10
br label %45		br label %45

; <label>:45 ; preds = %._crit_edge, %0		; <label>:45 ; preds = %._crit_edge, %0
%result.0.lcssa = phi <16 x i8> [ %41, %._crit_edge ], [ zeroinitializer, %0 ]		%result.0.lcssa = phi <16 x i8> [ %41, %._crit_edge ], [ zeroinitializer, %0 ]
%.01.lcssa = phi <16 x i8>* [ %scevgep, %._crit_edge ], [ %data, %0 ]		%.01.lcssa = phi <16 x i8>* [ %scevgep, %._crit_edge ], [ %data, %0 ]
store <16 x i8> %result.0.lcssa, <16 x i8>* %.01.lcssa, align 4		store <16 x i8> %result.0.lcssa, <16 x i8>* %.01.lcssa, align 4
ret void		ret void
}		}

declare <1 x i64> @llvm.arm.neon.vld1.v1i64(i8*, i32) nounwind readonly		declare <1 x i64> @llvm.arm.neon.vld1.v1i64.p0i8(i8*, i32) nounwind readonly

; Handle chains in which the same offset is used for both loads and		; Handle chains in which the same offset is used for both loads and
; stores to the same array.		; stores to the same array.
; rdar://11410078.		; rdar://11410078.
;		;
; A9: @testReuse		; A9: @testReuse
; A9: %for.body		; A9: %for.body
; A9: vld1.8 {d{{[0-9]+}}}, [[BASE:[r[0-9]+]]], [[INC:r[0-9]]]		; A9: vld1.8 {d{{[0-9]+}}}, [[BASE:[r[0-9]+]]], [[INC:r[0-9]]]
(21 lines not shown)
entry:		entry:
%idx.neg6 = sub i32 0, %mul5		%idx.neg6 = sub i32 0, %mul5
%idx.neg10 = sub i32 0, %stride		%idx.neg10 = sub i32 0, %stride
br label %for.body		br label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%i.0110 = phi i32 [ 0, %entry ], [ %inc, %for.body ]		%i.0110 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
%src.addr = phi i8* [ %src, %entry ], [ %add.ptr45, %for.body ]		%src.addr = phi i8* [ %src, %entry ], [ %add.ptr45, %for.body ]
%add.ptr = getelementptr inbounds i8, i8* %src.addr, i32 %idx.neg		%add.ptr = getelementptr inbounds i8, i8* %src.addr, i32 %idx.neg
%vld1 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8(i8* %add.ptr, i32 1)		%vld1 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8.p0i8(i8* %add.ptr, i32 1)
%add.ptr3 = getelementptr inbounds i8, i8* %src.addr, i32 %idx.neg2		%add.ptr3 = getelementptr inbounds i8, i8* %src.addr, i32 %idx.neg2
%vld2 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8(i8* %add.ptr3, i32 1)		%vld2 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8.p0i8(i8* %add.ptr3, i32 1)
%add.ptr7 = getelementptr inbounds i8, i8* %src.addr, i32 %idx.neg6		%add.ptr7 = getelementptr inbounds i8, i8* %src.addr, i32 %idx.neg6
%vld3 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8(i8* %add.ptr7, i32 1)		%vld3 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8.p0i8(i8* %add.ptr7, i32 1)
%add.ptr11 = getelementptr inbounds i8, i8* %src.addr, i32 %idx.neg10		%add.ptr11 = getelementptr inbounds i8, i8* %src.addr, i32 %idx.neg10
%vld4 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8(i8* %add.ptr11, i32 1)		%vld4 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8.p0i8(i8* %add.ptr11, i32 1)
%vld5 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8(i8* %src.addr, i32 1)		%vld5 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8.p0i8(i8* %src.addr, i32 1)
%add.ptr17 = getelementptr inbounds i8, i8* %src.addr, i32 %stride		%add.ptr17 = getelementptr inbounds i8, i8* %src.addr, i32 %stride
%vld6 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8(i8* %add.ptr17, i32 1)		%vld6 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8.p0i8(i8* %add.ptr17, i32 1)
%add.ptr20 = getelementptr inbounds i8, i8* %src.addr, i32 %mul5		%add.ptr20 = getelementptr inbounds i8, i8* %src.addr, i32 %mul5
%vld7 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8(i8* %add.ptr20, i32 1)		%vld7 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8.p0i8(i8* %add.ptr20, i32 1)
%add.ptr23 = getelementptr inbounds i8, i8* %src.addr, i32 %mul1		%add.ptr23 = getelementptr inbounds i8, i8* %src.addr, i32 %mul1
%vld8 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8(i8* %add.ptr23, i32 1)		%vld8 = tail call <8 x i8> @llvm.arm.neon.vld1.v8i8.p0i8(i8* %add.ptr23, i32 1)
%vadd1 = tail call <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8> %vld1, <8 x i8> %vld2) nounwind		%vadd1 = tail call <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8> %vld1, <8 x i8> %vld2) nounwind
%vadd2 = tail call <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8> %vld2, <8 x i8> %vld3) nounwind		%vadd2 = tail call <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8> %vld2, <8 x i8> %vld3) nounwind
%vadd3 = tail call <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8> %vld3, <8 x i8> %vld4) nounwind		%vadd3 = tail call <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8> %vld3, <8 x i8> %vld4) nounwind
%vadd4 = tail call <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8> %vld4, <8 x i8> %vld5) nounwind		%vadd4 = tail call <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8> %vld4, <8 x i8> %vld5) nounwind
%vadd5 = tail call <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8> %vld5, <8 x i8> %vld6) nounwind		%vadd5 = tail call <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8> %vld5, <8 x i8> %vld6) nounwind
%vadd6 = tail call <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8> %vld6, <8 x i8> %vld7) nounwind		%vadd6 = tail call <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8> %vld6, <8 x i8> %vld7) nounwind
tail call void @llvm.arm.neon.vst1.v8i8(i8* %add.ptr3, <8 x i8> %vadd1, i32 1)		tail call void @llvm.arm.neon.vst1.p0i8.v8i8(i8* %add.ptr3, <8 x i8> %vadd1, i32 1)
tail call void @llvm.arm.neon.vst1.v8i8(i8* %add.ptr7, <8 x i8> %vadd2, i32 1)		tail call void @llvm.arm.neon.vst1.p0i8.v8i8(i8* %add.ptr7, <8 x i8> %vadd2, i32 1)
tail call void @llvm.arm.neon.vst1.v8i8(i8* %add.ptr11, <8 x i8> %vadd3, i32 1)		tail call void @llvm.arm.neon.vst1.p0i8.v8i8(i8* %add.ptr11, <8 x i8> %vadd3, i32 1)
tail call void @llvm.arm.neon.vst1.v8i8(i8* %src.addr, <8 x i8> %vadd4, i32 1)		tail call void @llvm.arm.neon.vst1.p0i8.v8i8(i8* %src.addr, <8 x i8> %vadd4, i32 1)
tail call void @llvm.arm.neon.vst1.v8i8(i8* %add.ptr17, <8 x i8> %vadd5, i32 1)		tail call void @llvm.arm.neon.vst1.p0i8.v8i8(i8* %add.ptr17, <8 x i8> %vadd5, i32 1)
tail call void @llvm.arm.neon.vst1.v8i8(i8* %add.ptr20, <8 x i8> %vadd6, i32 1)		tail call void @llvm.arm.neon.vst1.p0i8.v8i8(i8* %add.ptr20, <8 x i8> %vadd6, i32 1)
%inc = add nsw i32 %i.0110, 1		%inc = add nsw i32 %i.0110, 1
%add.ptr45 = getelementptr inbounds i8, i8* %src.addr, i32 8		%add.ptr45 = getelementptr inbounds i8, i8* %src.addr, i32 8
%exitcond = icmp eq i32 %inc, 4		%exitcond = icmp eq i32 %inc, 4
br i1 %exitcond, label %for.end, label %for.body		br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret void		ret void
}		}

declare <8 x i8> @llvm.arm.neon.vld1.v8i8(i8*, i32) nounwind readonly		declare <8 x i8> @llvm.arm.neon.vld1.v8i8.p0i8(i8*, i32) nounwind readonly

declare void @llvm.arm.neon.vst1.v8i8(i8*, <8 x i8>, i32) nounwind		declare void @llvm.arm.neon.vst1.p0i8.v8i8(i8*, <8 x i8>, i32) nounwind

declare <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8>, <8 x i8>) nounwind readnone		declare <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8>, <8 x i8>) nounwind readnone

Revision Contents

Diff 36081
