This is an archive of the discontinued LLVM Phabricator instance.

Not sure if I am missing something, but the new regression tests arm-v8.2a-neon-intrinsics.c and aarch64-neon-intrinsics.c are failing for me (after first applying patch D32511 and then this one here in D34161). For example, the very first test in arm-v8.2a-neon-intrinsics.c checks that the function argument is <4 x half> %a, but <4 x half> %1 is generated. With anything other than -O0 you will get the %a (and then it becomes a tailcall).

In D34161#781106, @SjoerdMeijer wrote:

Not sure if I am missing something, but the new regression tests arm-v8.2a-neon-intrinsics.c and aarch64-neon-intrinsics.c are failing for me (after first applying patch D32511 and then this one here in D34161). For example, the very first test in arm-v8.2a-neon-intrinsics.c checks that the function argument is <4 x half> %a, but <4 x half> %1 is generated. With anything other than -O0 you will get the %a (and then it becomes a tailcall).

I renamed arm-v8.2a-neon-intrinsics.c to aarch64-v8.2a-neon-intrinsics.c and modified the +fp16 flag. Could it be that you are still running the old file from the old patch with the old flag? If that is not the case, then I will add this:

The newly added file aarch64-v8.2a-neon-intrinsics.c passed for me both in this patch and in the original one (D32511). However, it failed for other people in the original patch (in the same way you describe: %1 vs. %a), and the changes were reverted. That led me to run this test individually in D32511. It gave me a warning about the +fp16 flag, and I replaced it with the correct +fullfp16 flag after consulting with you about it. I was hoping that fixing the flag and the warning would make the test pass for everyone. Note that the conversion of <4 x half> %1 to <4 x half> %a is done by the mem2reg pass (opt -S -mem2reg). I am not sure why this optimization does not work for you on this test. Either 1) I can replace occurrences of variables such as %a with * in the test, because we are not really testing the optimization after all, or 2) you can send me your cmake command so that I can build and test in the same way, reproduce the problem, and investigate it on my side.

I did not run the old test arm-neon-intrinsics.c in D32511 because it has REQUIRES: long-tests and our testing configuration did not run those. Once I ran it, I could reproduce the problem, and I fixed it in this patch. Does it still fail for you?

Just to avoid any confusion/mistakes, can you upload the final patch that you intend to commit?
My understanding is that it will include: D32511 + D34161 - arm-v8.2a-neon-intrinsics.c.
I was indeed accidentally running the old test, but its replacement aarch64-neon-intrinsics.c, which
I was also running, is giving me similar problems. This looks like a very easy fix to me: if you add -O2,
you don't need to run it separately through opt and mem2reg (with and without gives exactly the same result for me anyway),
and the only thing you need to change is e.g. this expected string:

[[ABS:%.*]] = call <4 x half> @llvm.fabs.v4f16(<4 x half> %a)

to match this:

%vabs1.i = tail call <4 x half> @llvm.fabs.v4f16(<4 x half> %a)

where the only difference is "tail".

So I agree that is a straightforward bugfix and also the correct thing to do (i.e. using option +fullfp16).
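
To illustrate why the original check string stops matching at -O2, here is a minimal C++ sketch. It assumes that a FileCheck variable pattern such as [[ABS:%.*]] behaves like the regex %.* matched as a substring of one output line. The helper lineMatches and the three constants are hypothetical names introduced only for this illustration; they are not part of FileCheck or clang.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` (an ECMAScript regex) matches
// somewhere inside `line`. This only mimics substring matching; it is
// not how FileCheck itself is implemented.
bool lineMatches(const std::string &line, const std::string &pattern) {
  return std::regex_search(line, std::regex(pattern));
}

// Assumed regex form of the original check string:
//   [[ABS:%.*]] = call <4 x half> @llvm.fabs.v4f16(<4 x half> %a)
const std::string kCallPattern =
    R"re(%.* = call <4 x half> @llvm\.fabs\.v4f16\(<4 x half> %a\))re";

// The same pattern with "tail" added, as suggested in the review.
const std::string kTailCallPattern =
    R"re(%.* = tail call <4 x half> @llvm\.fabs\.v4f16\(<4 x half> %a\))re";

// The IR line quoted in the review for -O2.
const std::string kO2Line =
    "%vabs1.i = tail call <4 x half> @llvm.fabs.v4f16(<4 x half> %a)";
```

With these definitions, lineMatches(kO2Line, kCallPattern) is false because " = call" never appears in the -O2 line, while lineMatches(kO2Line, kTailCallPattern) is true.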

Ok, I will upload the combination of the two patches here.

I am undecided between your suggestion of using -O2 and my previous solution of removing 'opt -S -mem2reg'. The former makes the test more readable but less stable, since future changes to the -O2 optimizations may require updating the tests (though I do not expect many changes, given that they are small tests).

I am resubmitting the combination of D32511 and D34161 in here to avoid any confusion.

Also, I experimented with our build system and figured out a way to reproduce the problem we were discussing. It looks like adding the flag -disable-O0-optnone fixes the issue, as it keeps the "opt -mem2reg" optimization from being disabled after running clang.

Hi Abderrazek,
Thanks! I've run the patch through some testing, and it looks all good now!
Inlined is one nit that I missed earlier, but no need for another review.
Cheers,
Sjoerd.

clang/lib/Basic/Targets.cpp
Line 6176 (On Diff #103097): Renaming this to HasFullFP16 is more consistent.

Closed by commit rL305820: [AArch64] ADD ARMv.2-A FP16 vector intrinsics (authored by az). · Jun 20 2017, 11:55 AM

This revision was automatically updated to reflect the committed changes.

az marked an inline comment as done.

Unfortunately this is causing problems in testing:

SplitVectorOperand Op #3: t8: ch = llvm.arm.neon.vst4lane<ST32[%0](align=2)> t3, TargetConstant:i32<699>, FrameIndex:i32<0>, undef:v4f16, undef:v4f16, undef:v4f16, undef:v4f16, Constant:i32<1>, Constant:i32<2>
fatal error: error in backend: Do not know how to split this operator's operand!

This assert is triggered in lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:1457

Reproducer:

typedef __attribute__((neon_vector_type(4))) short uint16x4_t;
void dotests_1082() {
  __fp16 result[1];
  uint16x4_t __s1_0_3;
  uint16x4_t __s1_0_2;
  uint16x4_t __s1_0_1;
  uint16x4_t __s1_0_0;
  __builtin_neon_vst4_lane_v(result, __s1_0_0, __s1_0_1, __s1_0_2, __s1_0_3,
                             1, 8);
}

Compile with:

clang -O2 -target armv8-linux-gnueabihf -mcpu=cortex-a57 -D__ARM_NEON_FP16_INTRINSICS -S t.c

Before this patch the IR looked like this:

call void @llvm.arm.neon.vst4lane.p0i8.v4i16(i8* nonnull %0, <4 x i16> undef, <4 x i16> undef, <4 x i16> undef, <4 x i16> undef, i32 1, i32 2)

And now after:

call void @llvm.arm.neon.vst4lane.p0i8.v4f16(i8* nonnull %0, <4 x half> undef, <4 x half> undef, <4 x half> undef, <4 x half> undef, i32 1, i32 2)

Note that the i16 types have changed into half and the intrinsic from i16 to f16.
The backend has problems legalising the type v4f16.
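
The change in the intrinsic name follows from the change in the vector element type. The sketch below is a simplified model of that relationship, not clang's actual code: Elt, vectorSuffix, and vst4LaneName are hypothetical names, and the float16AsHalf switch stands in for "before the patch" (false) versus "after the patch" (true).

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the element kinds relevant to this report.
enum class Elt { Int16, Float16 };

// Hypothetical model: type suffix used in an overloaded intrinsic name.
// float16AsHalf == false models the old mapping (Float16 lowered to i16),
// float16AsHalf == true models the new mapping (Float16 lowered to half).
std::string vectorSuffix(Elt elt, bool isQuad, bool float16AsHalf) {
  const int lanes = 4 << (isQuad ? 1 : 0);  // 4 lanes (64-bit) or 8 (128-bit)
  const bool isHalf = (elt == Elt::Float16) && float16AsHalf;
  return "v" + std::to_string(lanes) + (isHalf ? "f16" : "i16");
}

// Builds the vst4lane intrinsic name seen in the IR snippets above.
std::string vst4LaneName(Elt elt, bool isQuad, bool float16AsHalf) {
  return "llvm.arm.neon.vst4lane.p0i8." +
         vectorSuffix(elt, isQuad, float16AsHalf);
}
```

Under these assumptions, vst4LaneName(Elt::Float16, false, false) gives the v4i16 name from the "before" IR, and vst4LaneName(Elt::Float16, false, true) gives the v4f16 name that the ARM backend could not legalise.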

SjoerdMeijer mentioned this in D35011: [ARM] add v4f16 and v8f16 as legal types. · Jul 5 2017, 6:44 AM

SjoerdMeijer mentioned this in rL307277: This reverts r305820 (ARMv.2-A FP16 vector intrinsics) because it shows. · Jul 6 2017, 9:38 AM

Revision Contents

Path	Size
cfe/trunk/include/clang/Basic/arm_neon.td	185 lines
cfe/trunk/lib/Basic/Targets.cpp	10 lines
cfe/trunk/lib/CodeGen/CGBuiltin.cpp	183 lines
cfe/trunk/lib/CodeGen/CodeGenModule.cpp	1 line
cfe/trunk/lib/CodeGen/CodeGenTypeCache.h	2 lines
cfe/trunk/test/CodeGen/aarch64-neon-intrinsics.c	230 lines
cfe/trunk/test/CodeGen/aarch64-neon-ldst-one.c	228 lines
cfe/trunk/test/CodeGen/aarch64-v8.2a-neon-intrinsics.c	1633 lines
cfe/trunk/test/CodeGen/arm_neon_intrinsics.c	240 lines
cfe/trunk/utils/TableGen/NeonEmitter.cpp	6 lines

Diff 103247

cfe/trunk/include/clang/Basic/arm_neon.td

	[... 221 lines not shown ...]
	// prototype: return (arg, arg, ...)			// prototype: return (arg, arg, ...)
	//			//
	// v: void			// v: void
	// t: best-fit integer (int/poly args)			// t: best-fit integer (int/poly args)
	// x: signed integer (int/float args)			// x: signed integer (int/float args)
	// u: unsigned integer (int/float args)			// u: unsigned integer (int/float args)
	// f: float (int args)			// f: float (int args)
	// F: double (int args)			// F: double (int args)
				// H: half (int args)
	// d: default			// d: default
	// g: default, ignore 'Q' size modifier.			// g: default, ignore 'Q' size modifier.
	// j: default, force 'Q' size modifier.			// j: default, force 'Q' size modifier.
	// w: double width elements, same num elts			// w: double width elements, same num elts
	// n: double width elements, half num elts			// n: double width elements, half num elts
	// h: half width elements, double num elts			// h: half width elements, double num elts
	// q: half width elements, quad num elts			// q: half width elements, quad num elts
	// e: half width elements, double num elts, unsigned			// e: half width elements, double num elts, unsigned
	[... 102 lines not shown ...]
	def OP_MLALHi_N : Op<(call "vmlal_n", $p0, (call "vget_high", $p1), $p2)>;			def OP_MLALHi_N : Op<(call "vmlal_n", $p0, (call "vget_high", $p1), $p2)>;
	def OP_MLS : Op<(op "-", $p0, (op "*", $p1, $p2))>;			def OP_MLS : Op<(op "-", $p0, (op "*", $p1, $p2))>;
	def OP_FMLS : Op<(call "vfma", $p0, (op "-", $p1), $p2)>;			def OP_FMLS : Op<(call "vfma", $p0, (op "-", $p1), $p2)>;
	def OP_MLSL : Op<(op "-", $p0, (call "vmull", $p1, $p2))>;			def OP_MLSL : Op<(op "-", $p0, (call "vmull", $p1, $p2))>;
	def OP_MLSLHi : Op<(call "vmlsl", $p0, (call "vget_high", $p1),			def OP_MLSLHi : Op<(call "vmlsl", $p0, (call "vget_high", $p1),
	(call "vget_high", $p2))>;			(call "vget_high", $p2))>;
	def OP_MLSLHi_N : Op<(call "vmlsl_n", $p0, (call "vget_high", $p1), $p2)>;			def OP_MLSLHi_N : Op<(call "vmlsl_n", $p0, (call "vget_high", $p1), $p2)>;
	def OP_MUL_N : Op<(op "*", $p0, (dup $p1))>;			def OP_MUL_N : Op<(op "*", $p0, (dup $p1))>;
				def OP_MULX_N : Op<(call "vmulx", $p0, (dup $p1))>;
	def OP_MLA_N : Op<(op "+", $p0, (op "*", $p1, (dup $p2)))>;			def OP_MLA_N : Op<(op "+", $p0, (op "*", $p1, (dup $p2)))>;
	def OP_MLS_N : Op<(op "-", $p0, (op "*", $p1, (dup $p2)))>;			def OP_MLS_N : Op<(op "-", $p0, (op "*", $p1, (dup $p2)))>;
	def OP_FMLA_N : Op<(call "vfma", $p0, $p1, (dup $p2))>;			def OP_FMLA_N : Op<(call "vfma", $p0, $p1, (dup $p2))>;
	def OP_FMLS_N : Op<(call "vfma", $p0, (op "-", $p1), (dup $p2))>;			def OP_FMLS_N : Op<(call "vfma", $p0, (op "-", $p1), (dup $p2))>;
	def OP_MLAL_N : Op<(op "+", $p0, (call "vmull", $p1, (dup $p2)))>;			def OP_MLAL_N : Op<(op "+", $p0, (call "vmull", $p1, (dup $p2)))>;
	def OP_MLSL_N : Op<(op "-", $p0, (call "vmull", $p1, (dup $p2)))>;			def OP_MLSL_N : Op<(op "-", $p0, (call "vmull", $p1, (dup $p2)))>;
	def OP_MUL_LN : Op<(op "*", $p0, (splat $p1, $p2))>;			def OP_MUL_LN : Op<(op "*", $p0, (splat $p1, $p2))>;
	def OP_MULX_LN : Op<(call "vmulx", $p0, (splat $p1, $p2))>;			def OP_MULX_LN : Op<(call "vmulx", $p0, (splat $p1, $p2))>;
	[... 1,300 lines not shown ...]
	// Signed Saturating Rounding Doubling Multiply Subtract Returning High Half			// Signed Saturating Rounding Doubling Multiply Subtract Returning High Half
	def SCALAR_SQRDMLSH_LANE : SOpInst<"vqrdmlsh_lane", "sssdi", "SsSi", OP_SCALAR_QRDMLSH_LN>;			def SCALAR_SQRDMLSH_LANE : SOpInst<"vqrdmlsh_lane", "sssdi", "SsSi", OP_SCALAR_QRDMLSH_LN>;
	def SCALAR_SQRDMLSH_LANEQ : SOpInst<"vqrdmlsh_laneq", "sssji", "SsSi", OP_SCALAR_QRDMLSH_LN>;			def SCALAR_SQRDMLSH_LANEQ : SOpInst<"vqrdmlsh_laneq", "sssji", "SsSi", OP_SCALAR_QRDMLSH_LN>;
	}			}

	def SCALAR_VDUP_LANE : IInst<"vdup_lane", "sdi", "ScSsSiSlSfSdSUcSUsSUiSUlSPcSPs">;			def SCALAR_VDUP_LANE : IInst<"vdup_lane", "sdi", "ScSsSiSlSfSdSUcSUsSUiSUlSPcSPs">;
	def SCALAR_VDUP_LANEQ : IInst<"vdup_laneq", "sji", "ScSsSiSlSfSdSUcSUsSUiSUlSPcSPs">;			def SCALAR_VDUP_LANEQ : IInst<"vdup_laneq", "sji", "ScSsSiSlSfSdSUcSUsSUiSUlSPcSPs">;
	}			}

				// ARMv8.2-A FP16 intrinsics.
				let ArchGuard = "defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC) && defined(__aarch64__)" in {

				// ARMv8.2-A FP16 one-operand vector intrinsics.

				// Comparison
				def CMEQH : SInst<"vceqz", "ud", "hQh">;
				def CMGEH : SInst<"vcgez", "ud", "hQh">;
				def CMGTH : SInst<"vcgtz", "ud", "hQh">;
				def CMLEH : SInst<"vclez", "ud", "hQh">;
				def CMLTH : SInst<"vcltz", "ud", "hQh">;

				// Vector conversion
				def VCVT_F16 : SInst<"vcvt_f16", "Hd", "sUsQsQUs">;
				def VCVT_S16 : SInst<"vcvt_s16", "xd", "hQh">;
				def VCVT_U16 : SInst<"vcvt_u16", "ud", "hQh">;
				def VCVTA_S16 : SInst<"vcvta_s16", "xd", "hQh">;
				def VCVTA_U16 : SInst<"vcvta_u16", "ud", "hQh">;
				def VCVTM_S16 : SInst<"vcvtm_s16", "xd", "hQh">;
				def VCVTM_U16 : SInst<"vcvtm_u16", "ud", "hQh">;
				def VCVTN_S16 : SInst<"vcvtn_s16", "xd", "hQh">;
				def VCVTN_U16 : SInst<"vcvtn_u16", "ud", "hQh">;
				def VCVTP_S16 : SInst<"vcvtp_s16", "xd", "hQh">;
				def VCVTP_U16 : SInst<"vcvtp_u16", "ud", "hQh">;

				// Vector rounding
				def FRINTZH : SInst<"vrnd", "dd", "hQh">;
				def FRINTNH : SInst<"vrndn", "dd", "hQh">;
				def FRINTAH : SInst<"vrnda", "dd", "hQh">;
				def FRINTPH : SInst<"vrndp", "dd", "hQh">;
				def FRINTMH : SInst<"vrndm", "dd", "hQh">;
				def FRINTXH : SInst<"vrndx", "dd", "hQh">;
				def FRINTIH : SInst<"vrndi", "dd", "hQh">;

				// Misc.
				def VABSH : SInst<"vabs", "dd", "hQh">;
				def VNEGH : SOpInst<"vneg", "dd", "hQh", OP_NEG>;
				def VRECPEH : SInst<"vrecpe", "dd", "hQh">;
				def FRSQRTEH : SInst<"vrsqrte", "dd", "hQh">;
				def FSQRTH : SInst<"vsqrt", "dd", "hQh">;

				// ARMv8.2-A FP16 two-operands vector intrinsics.

				// Misc.
				def VADDH : SOpInst<"vadd", "ddd", "hQh", OP_ADD>;
				def VABDH : SInst<"vabd", "ddd", "hQh">;
				def VSUBH : SOpInst<"vsub", "ddd", "hQh", OP_SUB>;

				// Comparison
				let InstName = "vacge" in {
				def VCAGEH : SInst<"vcage", "udd", "hQh">;
				def VCALEH : SInst<"vcale", "udd", "hQh">;
				}
				let InstName = "vacgt" in {
				def VCAGTH : SInst<"vcagt", "udd", "hQh">;
				def VCALTH : SInst<"vcalt", "udd", "hQh">;
				}
				def VCEQH : SOpInst<"vceq", "udd", "hQh", OP_EQ>;
				def VCGEH : SOpInst<"vcge", "udd", "hQh", OP_GE>;
				def VCGTH : SOpInst<"vcgt", "udd", "hQh", OP_GT>;
				let InstName = "vcge" in
				def VCLEH : SOpInst<"vcle", "udd", "hQh", OP_LE>;
				let InstName = "vcgt" in
				def VCLTH : SOpInst<"vclt", "udd", "hQh", OP_LT>;

				// Vector conversion
				let isVCVT_N = 1 in {
				def VCVT_N_F16 : SInst<"vcvt_n_f16", "Hdi", "sUsQsQUs">;
				def VCVT_N_S16 : SInst<"vcvt_n_s16", "xdi", "hQh">;
				def VCVT_N_U16 : SInst<"vcvt_n_u16", "udi", "hQh">;
				}

				// Max/Min
				def VMAXH : SInst<"vmax", "ddd", "hQh">;
				def VMINH : SInst<"vmin", "ddd", "hQh">;
				def FMAXNMH : SInst<"vmaxnm", "ddd", "hQh">;
				def FMINNMH : SInst<"vminnm", "ddd", "hQh">;

				// Multiplication/Division
				def VMULH : SOpInst<"vmul", "ddd", "hQh", OP_MUL>;
				def MULXH : SInst<"vmulx", "ddd", "hQh">;
				def FDIVH : IOpInst<"vdiv", "ddd", "hQh", OP_DIV>;

				// Pairwise addition
				def VPADDH : SInst<"vpadd", "ddd", "hQh">;

				// Pairwise Max/Min
				def VPMAXH : SInst<"vpmax", "ddd", "hQh">;
				def VPMINH : SInst<"vpmin", "ddd", "hQh">;
				// Pairwise MaxNum/MinNum
				def FMAXNMPH : SInst<"vpmaxnm", "ddd", "hQh">;
				def FMINNMPH : SInst<"vpminnm", "ddd", "hQh">;

				// Reciprocal/Sqrt
				def VRECPSH : SInst<"vrecps", "ddd", "hQh">;
				def VRSQRTSH : SInst<"vrsqrts", "ddd", "hQh">;

				// ARMv8.2-A FP16 three-operands vector intrinsics.

				// Vector fused multiply-add operations
				def VFMAH : SInst<"vfma", "dddd", "hQh">;
				def VFMSH : SOpInst<"vfms", "dddd", "hQh", OP_FMLS>;

				// ARMv8.2-A FP16 lane vector intrinsics.

				// FMA lane
				def VFMA_LANEH : IInst<"vfma_lane", "dddgi", "hQh">;
				def VFMA_LANEQH : IInst<"vfma_laneq", "dddji", "hQh">;

				// FMA lane with scalar argument
				def FMLA_NH : SOpInst<"vfma_n", "ddds", "hQh", OP_FMLA_N>;
				// Scalar floating point fused multiply-add (scalar, by element)
				def SCALAR_FMLA_LANEH : IInst<"vfma_lane", "sssdi", "Sh">;
				def SCALAR_FMLA_LANEQH : IInst<"vfma_laneq", "sssji", "Sh">;

				// FMS lane
				def VFMS_LANEH : IOpInst<"vfms_lane", "dddgi", "hQh", OP_FMS_LN>;
				def VFMS_LANEQH : IOpInst<"vfms_laneq", "dddji", "hQh", OP_FMS_LNQ>;
				// FMS lane with scalar argument
				def FMLS_NH : SOpInst<"vfms_n", "ddds", "hQh", OP_FMLS_N>;
				// Scalar floating point fused multiply-subtract (scalar, by element)
				def SCALAR_FMLS_LANEH : IOpInst<"vfms_lane", "sssdi", "Sh", OP_FMS_LN>;
				def SCALAR_FMLS_LANEQH : IOpInst<"vfms_laneq", "sssji", "Sh", OP_FMS_LNQ>;

				// Mul lane
				def VMUL_LANEH : IOpInst<"vmul_lane", "ddgi", "hQh", OP_MUL_LN>;
				def VMUL_LANEQH : IOpInst<"vmul_laneq", "ddji", "hQh", OP_MUL_LN>;
				def VMUL_NH : IOpInst<"vmul_n", "dds", "hQh", OP_MUL_N>;
				// Scalar floating point multiply (scalar, by element)
				def SCALAR_FMUL_LANEH : IOpInst<"vmul_lane", "ssdi", "Sh", OP_SCALAR_MUL_LN>;
				def SCALAR_FMUL_LANEQH : IOpInst<"vmul_laneq", "ssji", "Sh", OP_SCALAR_MUL_LN>;

				// Mulx lane
				def VMULX_LANEH : IOpInst<"vmulx_lane", "ddgi", "hQh", OP_MULX_LN>;
				def VMULX_LANEQH : IOpInst<"vmulx_laneq", "ddji", "hQh", OP_MULX_LN>;
				def VMULX_NH : IOpInst<"vmulx_n", "dds", "hQh", OP_MULX_N>;
				// TODO: Scalar floating point multiply extended (scalar, by element)
				// Below ones are commented out because they need vmulx_f16(float16_t, float16_t)
				// which will be implemented later with fp16 scalar intrinsic (arm_fp16.h)
				//def SCALAR_FMULX_LANEH : IOpInst<"vmulx_lane", "ssdi", "Sh", OP_SCALAR_MUL_LN>;
				//def SCALAR_FMULX_LANEQH : IOpInst<"vmulx_laneq", "ssji", "Sh", OP_SCALAR_MUL_LN>;

				// ARMv8.2-A FP16 reduction vector intrinsics.
				def VMAXVH : SInst<"vmaxv", "sd", "hQh">;
				def VMINVH : SInst<"vminv", "sd", "hQh">;
				def FMAXNMVH : SInst<"vmaxnmv", "sd", "hQh">;
				def FMINNMVH : SInst<"vminnmv", "sd", "hQh">;

				// Data processing intrinsics - section 5

				// Logical operations
				let isHiddenLInst = 1 in
				def VBSLH : SInst<"vbsl", "dudd", "hQh">;

				// Transposition operations
				def VZIPH : WInst<"vzip", "2dd", "hQh">;
				def VUZPH : WInst<"vuzp", "2dd", "hQh">;
				def VTRNH : WInst<"vtrn", "2dd", "hQh">;

				// Set all lanes to same value.
				/* Already implemented prior to ARMv8.2-A.
				def VMOV_NH : WOpInst<"vmov_n", "ds", "hQh", OP_DUP>;
				def VDUP_NH : WOpInst<"vdup_n", "ds", "hQh", OP_DUP>;
				def VDUP_LANE1H : WOpInst<"vdup_lane", "dgi", "hQh", OP_DUP_LN>;*/

				// Vector Extract
				def VEXTH : WInst<"vext", "dddi", "hQh">;

				// Reverse vector elements
				def VREV64H : WOpInst<"vrev64", "dd", "hQh", OP_REV64>;

				// Permutation
				def VTRN1H : SOpInst<"vtrn1", "ddd", "hQh", OP_TRN1>;
				def VZIP1H : SOpInst<"vzip1", "ddd", "hQh", OP_ZIP1>;
				def VUZP1H : SOpInst<"vuzp1", "ddd", "hQh", OP_UZP1>;
				def VTRN2H : SOpInst<"vtrn2", "ddd", "hQh", OP_TRN2>;
				def VZIP2H : SOpInst<"vzip2", "ddd", "hQh", OP_ZIP2>;
				def VUZP2H : SOpInst<"vuzp2", "ddd", "hQh", OP_UZP2>;

				def SCALAR_VDUP_LANEH : IInst<"vdup_lane", "sdi", "Sh">;
				def SCALAR_VDUP_LANEQH : IInst<"vdup_laneq", "sji", "Sh">;
				}
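
As a reading aid for the definitions above, the following sketch shows how a prototype string and a type string might expand into C declarations. It is a deliberately reduced model and not the NeonEmitter implementation: it only handles the 'Q' size modifier, the 'h' base type, and the 'd' and 'u' prototype letters, and it assumes the plain name + optional "q" + "_f16" naming scheme. Variant, parseTypeSpec, cTypeName, and declaration are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One expansion of a type string entry: base letter plus 'Q' (128-bit) flag.
struct Variant {
  bool quad;
  char base;
};

// Splits a type string such as "hQh" into {h, 64-bit} and {h, 128-bit}.
std::vector<Variant> parseTypeSpec(const std::string &spec) {
  std::vector<Variant> out;
  bool quad = false;
  for (char c : spec) {
    if (c == 'Q') {  // size modifier applies to the next base letter
      quad = true;
      continue;
    }
    out.push_back({quad, c});
    quad = false;
  }
  return out;
}

// Maps a prototype letter to a C type, assuming base type 'h' (float16):
// 'd' is the default type, 'u' the unsigned integer of the same shape.
std::string cTypeName(char proto, const Variant &v) {
  const std::string shape = v.quad ? "x8_t" : "x4_t";
  return (proto == 'u' ? "uint16" : "float16") + shape;
}

// First prototype letter is the return type, the rest are the arguments.
std::string declaration(const std::string &name, const std::string &proto,
                        const Variant &v) {
  std::string decl =
      cTypeName(proto[0], v) + " " + name + (v.quad ? "q" : "") + "_f16(";
  for (std::size_t i = 1; i < proto.size(); ++i) {
    if (i > 1) decl += ", ";
    decl += cTypeName(proto[i], v);
  }
  return decl + ")";
}
```

For example, under these assumptions SInst<"vceqz", "ud", "hQh"> corresponds to the two declarations uint16x4_t vceqz_f16(float16x4_t) and uint16x8_t vceqzq_f16(float16x8_t).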

cfe/trunk/lib/Basic/Targets.cpp

[... 6,166 lines not shown ...]	enum FPUModeEnum {
NeonMode		NeonMode
};		};

unsigned FPU;		unsigned FPU;
unsigned CRC;		unsigned CRC;
unsigned Crypto;		unsigned Crypto;
unsigned Unaligned;		unsigned Unaligned;
unsigned V8_1A;		unsigned V8_1A;
		unsigned V8_2A;
		unsigned HasFullFP16;

static const Builtin::Info BuiltinInfo[];		static const Builtin::Info BuiltinInfo[];

std::string ABI;		std::string ABI;

public:		public:
AArch64TargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)		AArch64TargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)
: TargetInfo(Triple), ABI("aapcs") {		: TargetInfo(Triple), ABI("aapcs") {
[... 115 lines not shown ...]	void getTargetDefines(const LangOptions &Opts,
if (Crypto)		if (Crypto)
Builder.defineMacro("__ARM_FEATURE_CRYPTO", "1");		Builder.defineMacro("__ARM_FEATURE_CRYPTO", "1");

if (Unaligned)		if (Unaligned)
Builder.defineMacro("__ARM_FEATURE_UNALIGNED", "1");		Builder.defineMacro("__ARM_FEATURE_UNALIGNED", "1");

if (V8_1A)		if (V8_1A)
Builder.defineMacro("__ARM_FEATURE_QRDMX", "1");		Builder.defineMacro("__ARM_FEATURE_QRDMX", "1");
		if (V8_2A && FPU == NeonMode && HasFullFP16)
		Builder.defineMacro("__ARM_FEATURE_FP16_VECTOR_ARITHMETIC", "1");

// All of the __sync_(bool|val)_compare_and_swap_(1|2|4|8) builtins work.		// All of the __sync_(bool|val)_compare_and_swap_(1|2|4|8) builtins work.
Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1");		Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1");
Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2");		Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2");
Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4");		Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4");
Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8");		Builder.defineMacro("__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8");
}		}

[... 11 lines not shown ...]	public:

bool handleTargetFeatures(std::vector<std::string> &Features,		bool handleTargetFeatures(std::vector<std::string> &Features,
DiagnosticsEngine &Diags) override {		DiagnosticsEngine &Diags) override {
FPU = FPUMode;		FPU = FPUMode;
CRC = 0;		CRC = 0;
Crypto = 0;		Crypto = 0;
Unaligned = 1;		Unaligned = 1;
V8_1A = 0;		V8_1A = 0;
		V8_2A = 0;
		HasFullFP16 = 0;

for (const auto &Feature : Features) {		for (const auto &Feature : Features) {
if (Feature == "+neon")		if (Feature == "+neon")
FPU = NeonMode;		FPU = NeonMode;
if (Feature == "+crc")		if (Feature == "+crc")
CRC = 1;		CRC = 1;
if (Feature == "+crypto")		if (Feature == "+crypto")
Crypto = 1;		Crypto = 1;
if (Feature == "+strict-align")		if (Feature == "+strict-align")
Unaligned = 0;		Unaligned = 0;
if (Feature == "+v8.1a")		if (Feature == "+v8.1a")
V8_1A = 1;		V8_1A = 1;
		if (Feature == "+v8.2a")
		V8_2A = 1;
		if (Feature == "+fullfp16")
		HasFullFP16 = 1;
}		}

setDataLayout();		setDataLayout();

return true;		return true;
}		}

CallingConvCheckResult checkCallingConvention(CallingConv CC) const override {		CallingConvCheckResult checkCallingConvention(CallingConv CC) const override {
[... 3,364 lines not shown ...]
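
The combined effect of the Targets.cpp changes can be sketched as a small standalone model. This is an illustration under stated assumptions, not the AArch64TargetInfo code: Fp16State, handleFeatures, and definesFp16VectorArithmetic are hypothetical names, and only the three features involved in the new macro are tracked.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, reduced model of the flags involved in the new macro.
struct Fp16State {
  bool Neon = false;
  bool V8_2A = false;
  bool HasFullFP16 = false;
};

// Mirrors the feature loop in the diff for the three relevant features.
Fp16State handleFeatures(const std::vector<std::string> &Features) {
  Fp16State S;
  for (const std::string &Feature : Features) {
    if (Feature == "+neon")
      S.Neon = true;
    if (Feature == "+v8.2a")
      S.V8_2A = true;
    if (Feature == "+fullfp16")
      S.HasFullFP16 = true;
  }
  return S;
}

// Models the condition guarding __ARM_FEATURE_FP16_VECTOR_ARITHMETIC.
bool definesFp16VectorArithmetic(const Fp16State &S) {
  return S.V8_2A && S.Neon && S.HasFullFP16;
}
```

In this model, the feature list {"+neon", "+v8.2a", "+fullfp16"} enables the macro, while replacing "+fullfp16" with "+fp16" (the flag the test originally used) leaves it undefined, since that string is not recognised by the loop.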

cfe/trunk/lib/CodeGen/CGBuiltin.cpp

[... 2,950 lines not shown ...]	static llvm::VectorType GetNeonType(CodeGenFunction CGF,
bool V1Ty=false) {		bool V1Ty=false) {
int IsQuad = TypeFlags.isQuad();		int IsQuad = TypeFlags.isQuad();
switch (TypeFlags.getEltType()) {		switch (TypeFlags.getEltType()) {
case NeonTypeFlags::Int8:		case NeonTypeFlags::Int8:
case NeonTypeFlags::Poly8:		case NeonTypeFlags::Poly8:
return llvm::VectorType::get(CGF->Int8Ty, V1Ty ? 1 : (8 << IsQuad));		return llvm::VectorType::get(CGF->Int8Ty, V1Ty ? 1 : (8 << IsQuad));
case NeonTypeFlags::Int16:		case NeonTypeFlags::Int16:
case NeonTypeFlags::Poly16:		case NeonTypeFlags::Poly16:
case NeonTypeFlags::Float16:
return llvm::VectorType::get(CGF->Int16Ty, V1Ty ? 1 : (4 << IsQuad));		return llvm::VectorType::get(CGF->Int16Ty, V1Ty ? 1 : (4 << IsQuad));
		case NeonTypeFlags::Float16:
		return llvm::VectorType::get(CGF->HalfTy, V1Ty ? 1 : (4 << IsQuad));
case NeonTypeFlags::Int32:		case NeonTypeFlags::Int32:
return llvm::VectorType::get(CGF->Int32Ty, V1Ty ? 1 : (2 << IsQuad));		return llvm::VectorType::get(CGF->Int32Ty, V1Ty ? 1 : (2 << IsQuad));
case NeonTypeFlags::Int64:		case NeonTypeFlags::Int64:
case NeonTypeFlags::Poly64:		case NeonTypeFlags::Poly64:
return llvm::VectorType::get(CGF->Int64Ty, V1Ty ? 1 : (1 << IsQuad));		return llvm::VectorType::get(CGF->Int64Ty, V1Ty ? 1 : (1 << IsQuad));
case NeonTypeFlags::Poly128:		case NeonTypeFlags::Poly128:
// FIXME: i128 and f128 doesn't get fully support in Clang and llvm.		// FIXME: i128 and f128 doesn't get fully support in Clang and llvm.
// There is a lot of i128 and f128 API missing.		// There is a lot of i128 and f128 API missing.
// so we use v16i8 to represent poly128 and get pattern matched.		// so we use v16i8 to represent poly128 and get pattern matched.
return llvm::VectorType::get(CGF->Int8Ty, 16);		return llvm::VectorType::get(CGF->Int8Ty, 16);
case NeonTypeFlags::Float32:		case NeonTypeFlags::Float32:
return llvm::VectorType::get(CGF->FloatTy, V1Ty ? 1 : (2 << IsQuad));		return llvm::VectorType::get(CGF->FloatTy, V1Ty ? 1 : (2 << IsQuad));
case NeonTypeFlags::Float64:		case NeonTypeFlags::Float64:
return llvm::VectorType::get(CGF->DoubleTy, V1Ty ? 1 : (1 << IsQuad));		return llvm::VectorType::get(CGF->DoubleTy, V1Ty ? 1 : (1 << IsQuad));
}		}
llvm_unreachable("Unknown vector element type!");		llvm_unreachable("Unknown vector element type!");
}		}

static llvm::VectorType GetFloatNeonType(CodeGenFunction CGF,		static llvm::VectorType GetFloatNeonType(CodeGenFunction CGF,
NeonTypeFlags IntTypeFlags) {		NeonTypeFlags IntTypeFlags) {
int IsQuad = IntTypeFlags.isQuad();		int IsQuad = IntTypeFlags.isQuad();
switch (IntTypeFlags.getEltType()) {		switch (IntTypeFlags.getEltType()) {
		case NeonTypeFlags::Int16:
		return llvm::VectorType::get(CGF->HalfTy, (4 << IsQuad));
case NeonTypeFlags::Int32:		case NeonTypeFlags::Int32:
return llvm::VectorType::get(CGF->FloatTy, (2 << IsQuad));		return llvm::VectorType::get(CGF->FloatTy, (2 << IsQuad));
case NeonTypeFlags::Int64:		case NeonTypeFlags::Int64:
return llvm::VectorType::get(CGF->DoubleTy, (1 << IsQuad));		return llvm::VectorType::get(CGF->DoubleTy, (1 << IsQuad));
default:		default:
llvm_unreachable("Type can't be converted to floating-point!");		llvm_unreachable("Type can't be converted to floating-point!");
}		}
}		}
[... 131 lines not shown ...]	static const NeonIntrinsicInfo ARMSIMDIntrinsicMap [] = {
NEONMAP1(vclsq_v, arm_neon_vcls, Add1ArgType),		NEONMAP1(vclsq_v, arm_neon_vcls, Add1ArgType),
NEONMAP1(vclz_v, ctlz, Add1ArgType),		NEONMAP1(vclz_v, ctlz, Add1ArgType),
NEONMAP1(vclzq_v, ctlz, Add1ArgType),		NEONMAP1(vclzq_v, ctlz, Add1ArgType),
NEONMAP1(vcnt_v, ctpop, Add1ArgType),		NEONMAP1(vcnt_v, ctpop, Add1ArgType),
NEONMAP1(vcntq_v, ctpop, Add1ArgType),		NEONMAP1(vcntq_v, ctpop, Add1ArgType),
NEONMAP1(vcvt_f16_f32, arm_neon_vcvtfp2hf, 0),		NEONMAP1(vcvt_f16_f32, arm_neon_vcvtfp2hf, 0),
NEONMAP1(vcvt_f32_f16, arm_neon_vcvthf2fp, 0),		NEONMAP1(vcvt_f32_f16, arm_neon_vcvthf2fp, 0),
NEONMAP0(vcvt_f32_v),		NEONMAP0(vcvt_f32_v),
		NEONMAP2(vcvt_n_f16_v, arm_neon_vcvtfxu2fp, arm_neon_vcvtfxs2fp, 0),
NEONMAP2(vcvt_n_f32_v, arm_neon_vcvtfxu2fp, arm_neon_vcvtfxs2fp, 0),		NEONMAP2(vcvt_n_f32_v, arm_neon_vcvtfxu2fp, arm_neon_vcvtfxs2fp, 0),
		NEONMAP1(vcvt_n_s16_v, arm_neon_vcvtfp2fxs, 0),
NEONMAP1(vcvt_n_s32_v, arm_neon_vcvtfp2fxs, 0),		NEONMAP1(vcvt_n_s32_v, arm_neon_vcvtfp2fxs, 0),
NEONMAP1(vcvt_n_s64_v, arm_neon_vcvtfp2fxs, 0),		NEONMAP1(vcvt_n_s64_v, arm_neon_vcvtfp2fxs, 0),
		NEONMAP1(vcvt_n_u16_v, arm_neon_vcvtfp2fxu, 0),
NEONMAP1(vcvt_n_u32_v, arm_neon_vcvtfp2fxu, 0),		NEONMAP1(vcvt_n_u32_v, arm_neon_vcvtfp2fxu, 0),
NEONMAP1(vcvt_n_u64_v, arm_neon_vcvtfp2fxu, 0),		NEONMAP1(vcvt_n_u64_v, arm_neon_vcvtfp2fxu, 0),
		NEONMAP0(vcvt_s16_v),
NEONMAP0(vcvt_s32_v),		NEONMAP0(vcvt_s32_v),
NEONMAP0(vcvt_s64_v),		NEONMAP0(vcvt_s64_v),
		NEONMAP0(vcvt_u16_v),
NEONMAP0(vcvt_u32_v),		NEONMAP0(vcvt_u32_v),
NEONMAP0(vcvt_u64_v),		NEONMAP0(vcvt_u64_v),
		NEONMAP1(vcvta_s16_v, arm_neon_vcvtas, 0),
NEONMAP1(vcvta_s32_v, arm_neon_vcvtas, 0),		NEONMAP1(vcvta_s32_v, arm_neon_vcvtas, 0),
NEONMAP1(vcvta_s64_v, arm_neon_vcvtas, 0),		NEONMAP1(vcvta_s64_v, arm_neon_vcvtas, 0),
NEONMAP1(vcvta_u32_v, arm_neon_vcvtau, 0),		NEONMAP1(vcvta_u32_v, arm_neon_vcvtau, 0),
NEONMAP1(vcvta_u64_v, arm_neon_vcvtau, 0),		NEONMAP1(vcvta_u64_v, arm_neon_vcvtau, 0),
		NEONMAP1(vcvtaq_s16_v, arm_neon_vcvtas, 0),
NEONMAP1(vcvtaq_s32_v, arm_neon_vcvtas, 0),		NEONMAP1(vcvtaq_s32_v, arm_neon_vcvtas, 0),
NEONMAP1(vcvtaq_s64_v, arm_neon_vcvtas, 0),		NEONMAP1(vcvtaq_s64_v, arm_neon_vcvtas, 0),
		NEONMAP1(vcvtaq_u16_v, arm_neon_vcvtau, 0),
NEONMAP1(vcvtaq_u32_v, arm_neon_vcvtau, 0),		NEONMAP1(vcvtaq_u32_v, arm_neon_vcvtau, 0),
NEONMAP1(vcvtaq_u64_v, arm_neon_vcvtau, 0),		NEONMAP1(vcvtaq_u64_v, arm_neon_vcvtau, 0),
		NEONMAP1(vcvtm_s16_v, arm_neon_vcvtms, 0),
NEONMAP1(vcvtm_s32_v, arm_neon_vcvtms, 0),		NEONMAP1(vcvtm_s32_v, arm_neon_vcvtms, 0),
NEONMAP1(vcvtm_s64_v, arm_neon_vcvtms, 0),		NEONMAP1(vcvtm_s64_v, arm_neon_vcvtms, 0),
		NEONMAP1(vcvtm_u16_v, arm_neon_vcvtmu, 0),
NEONMAP1(vcvtm_u32_v, arm_neon_vcvtmu, 0),		NEONMAP1(vcvtm_u32_v, arm_neon_vcvtmu, 0),
NEONMAP1(vcvtm_u64_v, arm_neon_vcvtmu, 0),		NEONMAP1(vcvtm_u64_v, arm_neon_vcvtmu, 0),
		NEONMAP1(vcvtmq_s16_v, arm_neon_vcvtms, 0),
NEONMAP1(vcvtmq_s32_v, arm_neon_vcvtms, 0),		NEONMAP1(vcvtmq_s32_v, arm_neon_vcvtms, 0),
NEONMAP1(vcvtmq_s64_v, arm_neon_vcvtms, 0),		NEONMAP1(vcvtmq_s64_v, arm_neon_vcvtms, 0),
		NEONMAP1(vcvtmq_u16_v, arm_neon_vcvtmu, 0),
NEONMAP1(vcvtmq_u32_v, arm_neon_vcvtmu, 0),		NEONMAP1(vcvtmq_u32_v, arm_neon_vcvtmu, 0),
NEONMAP1(vcvtmq_u64_v, arm_neon_vcvtmu, 0),		NEONMAP1(vcvtmq_u64_v, arm_neon_vcvtmu, 0),
		NEONMAP1(vcvtn_s16_v, arm_neon_vcvtns, 0),
NEONMAP1(vcvtn_s32_v, arm_neon_vcvtns, 0),		NEONMAP1(vcvtn_s32_v, arm_neon_vcvtns, 0),
NEONMAP1(vcvtn_s64_v, arm_neon_vcvtns, 0),		NEONMAP1(vcvtn_s64_v, arm_neon_vcvtns, 0),
		NEONMAP1(vcvtn_u16_v, arm_neon_vcvtnu, 0),
NEONMAP1(vcvtn_u32_v, arm_neon_vcvtnu, 0),		NEONMAP1(vcvtn_u32_v, arm_neon_vcvtnu, 0),
NEONMAP1(vcvtn_u64_v, arm_neon_vcvtnu, 0),		NEONMAP1(vcvtn_u64_v, arm_neon_vcvtnu, 0),
		NEONMAP1(vcvtnq_s16_v, arm_neon_vcvtns, 0),
NEONMAP1(vcvtnq_s32_v, arm_neon_vcvtns, 0),		NEONMAP1(vcvtnq_s32_v, arm_neon_vcvtns, 0),
NEONMAP1(vcvtnq_s64_v, arm_neon_vcvtns, 0),		NEONMAP1(vcvtnq_s64_v, arm_neon_vcvtns, 0),
		NEONMAP1(vcvtnq_u16_v, arm_neon_vcvtnu, 0),
NEONMAP1(vcvtnq_u32_v, arm_neon_vcvtnu, 0),		NEONMAP1(vcvtnq_u32_v, arm_neon_vcvtnu, 0),
NEONMAP1(vcvtnq_u64_v, arm_neon_vcvtnu, 0),		NEONMAP1(vcvtnq_u64_v, arm_neon_vcvtnu, 0),
		NEONMAP1(vcvtp_s16_v, arm_neon_vcvtps, 0),
NEONMAP1(vcvtp_s32_v, arm_neon_vcvtps, 0),		NEONMAP1(vcvtp_s32_v, arm_neon_vcvtps, 0),
NEONMAP1(vcvtp_s64_v, arm_neon_vcvtps, 0),		NEONMAP1(vcvtp_s64_v, arm_neon_vcvtps, 0),
		NEONMAP1(vcvtp_u16_v, arm_neon_vcvtpu, 0),
NEONMAP1(vcvtp_u32_v, arm_neon_vcvtpu, 0),		NEONMAP1(vcvtp_u32_v, arm_neon_vcvtpu, 0),
NEONMAP1(vcvtp_u64_v, arm_neon_vcvtpu, 0),		NEONMAP1(vcvtp_u64_v, arm_neon_vcvtpu, 0),
		NEONMAP1(vcvtpq_s16_v, arm_neon_vcvtps, 0),
NEONMAP1(vcvtpq_s32_v, arm_neon_vcvtps, 0),		NEONMAP1(vcvtpq_s32_v, arm_neon_vcvtps, 0),
NEONMAP1(vcvtpq_s64_v, arm_neon_vcvtps, 0),		NEONMAP1(vcvtpq_s64_v, arm_neon_vcvtps, 0),
		NEONMAP1(vcvtpq_u16_v, arm_neon_vcvtpu, 0),
NEONMAP1(vcvtpq_u32_v, arm_neon_vcvtpu, 0),		NEONMAP1(vcvtpq_u32_v, arm_neon_vcvtpu, 0),
NEONMAP1(vcvtpq_u64_v, arm_neon_vcvtpu, 0),		NEONMAP1(vcvtpq_u64_v, arm_neon_vcvtpu, 0),
NEONMAP0(vcvtq_f32_v),		NEONMAP0(vcvtq_f32_v),
		NEONMAP2(vcvtq_n_f16_v, arm_neon_vcvtfxu2fp, arm_neon_vcvtfxs2fp, 0),
NEONMAP2(vcvtq_n_f32_v, arm_neon_vcvtfxu2fp, arm_neon_vcvtfxs2fp, 0),		NEONMAP2(vcvtq_n_f32_v, arm_neon_vcvtfxu2fp, arm_neon_vcvtfxs2fp, 0),
		NEONMAP1(vcvtq_n_s16_v, arm_neon_vcvtfp2fxs, 0),
NEONMAP1(vcvtq_n_s32_v, arm_neon_vcvtfp2fxs, 0),		NEONMAP1(vcvtq_n_s32_v, arm_neon_vcvtfp2fxs, 0),
NEONMAP1(vcvtq_n_s64_v, arm_neon_vcvtfp2fxs, 0),		NEONMAP1(vcvtq_n_s64_v, arm_neon_vcvtfp2fxs, 0),
		NEONMAP1(vcvtq_n_u16_v, arm_neon_vcvtfp2fxu, 0),
NEONMAP1(vcvtq_n_u32_v, arm_neon_vcvtfp2fxu, 0),		NEONMAP1(vcvtq_n_u32_v, arm_neon_vcvtfp2fxu, 0),
NEONMAP1(vcvtq_n_u64_v, arm_neon_vcvtfp2fxu, 0),		NEONMAP1(vcvtq_n_u64_v, arm_neon_vcvtfp2fxu, 0),
		NEONMAP0(vcvtq_s16_v),
NEONMAP0(vcvtq_s32_v),		NEONMAP0(vcvtq_s32_v),
NEONMAP0(vcvtq_s64_v),		NEONMAP0(vcvtq_s64_v),
		NEONMAP0(vcvtq_u16_v),
NEONMAP0(vcvtq_u32_v),		NEONMAP0(vcvtq_u32_v),
NEONMAP0(vcvtq_u64_v),		NEONMAP0(vcvtq_u64_v),
NEONMAP0(vext_v),		NEONMAP0(vext_v),
NEONMAP0(vextq_v),		NEONMAP0(vextq_v),
NEONMAP0(vfma_v),		NEONMAP0(vfma_v),
NEONMAP0(vfmaq_v),		NEONMAP0(vfmaq_v),
NEONMAP2(vhadd_v, arm_neon_vhaddu, arm_neon_vhadds, Add1ArgType | UnsignedAlts),		NEONMAP2(vhadd_v, arm_neon_vhaddu, arm_neon_vhadds, Add1ArgType | UnsignedAlts),
NEONMAP2(vhaddq_v, arm_neon_vhaddu, arm_neon_vhadds, Add1ArgType | UnsignedAlts),		NEONMAP2(vhaddq_v, arm_neon_vhaddu, arm_neon_vhadds, Add1ArgType | UnsignedAlts),
[... 146 lines not shown ...]	static const NeonIntrinsicInfo AArch64SIMDIntrinsicMap[] = {
NEONMAP1(vcaltq_v, aarch64_neon_facgt, 0),		NEONMAP1(vcaltq_v, aarch64_neon_facgt, 0),
NEONMAP1(vcls_v, aarch64_neon_cls, Add1ArgType),		NEONMAP1(vcls_v, aarch64_neon_cls, Add1ArgType),
NEONMAP1(vclsq_v, aarch64_neon_cls, Add1ArgType),		NEONMAP1(vclsq_v, aarch64_neon_cls, Add1ArgType),
NEONMAP1(vclz_v, ctlz, Add1ArgType),		NEONMAP1(vclz_v, ctlz, Add1ArgType),
NEONMAP1(vclzq_v, ctlz, Add1ArgType),		NEONMAP1(vclzq_v, ctlz, Add1ArgType),
NEONMAP1(vcnt_v, ctpop, Add1ArgType),		NEONMAP1(vcnt_v, ctpop, Add1ArgType),
NEONMAP1(vcntq_v, ctpop, Add1ArgType),		NEONMAP1(vcntq_v, ctpop, Add1ArgType),
NEONMAP1(vcvt_f16_f32, aarch64_neon_vcvtfp2hf, 0),		NEONMAP1(vcvt_f16_f32, aarch64_neon_vcvtfp2hf, 0),
		NEONMAP0(vcvt_f16_v),
NEONMAP1(vcvt_f32_f16, aarch64_neon_vcvthf2fp, 0),		NEONMAP1(vcvt_f32_f16, aarch64_neon_vcvthf2fp, 0),
NEONMAP0(vcvt_f32_v),		NEONMAP0(vcvt_f32_v),
		NEONMAP2(vcvt_n_f16_v, aarch64_neon_vcvtfxu2fp, aarch64_neon_vcvtfxs2fp, 0),
NEONMAP2(vcvt_n_f32_v, aarch64_neon_vcvtfxu2fp, aarch64_neon_vcvtfxs2fp, 0),		NEONMAP2(vcvt_n_f32_v, aarch64_neon_vcvtfxu2fp, aarch64_neon_vcvtfxs2fp, 0),
NEONMAP2(vcvt_n_f64_v, aarch64_neon_vcvtfxu2fp, aarch64_neon_vcvtfxs2fp, 0),		NEONMAP2(vcvt_n_f64_v, aarch64_neon_vcvtfxu2fp, aarch64_neon_vcvtfxs2fp, 0),
		NEONMAP1(vcvt_n_s16_v, aarch64_neon_vcvtfp2fxs, 0),
NEONMAP1(vcvt_n_s32_v, aarch64_neon_vcvtfp2fxs, 0),		NEONMAP1(vcvt_n_s32_v, aarch64_neon_vcvtfp2fxs, 0),
NEONMAP1(vcvt_n_s64_v, aarch64_neon_vcvtfp2fxs, 0),		NEONMAP1(vcvt_n_s64_v, aarch64_neon_vcvtfp2fxs, 0),
		NEONMAP1(vcvt_n_u16_v, aarch64_neon_vcvtfp2fxu, 0),
NEONMAP1(vcvt_n_u32_v, aarch64_neon_vcvtfp2fxu, 0),		NEONMAP1(vcvt_n_u32_v, aarch64_neon_vcvtfp2fxu, 0),
NEONMAP1(vcvt_n_u64_v, aarch64_neon_vcvtfp2fxu, 0),		NEONMAP1(vcvt_n_u64_v, aarch64_neon_vcvtfp2fxu, 0),
		NEONMAP0(vcvtq_f16_v),
NEONMAP0(vcvtq_f32_v),		NEONMAP0(vcvtq_f32_v),
		NEONMAP2(vcvtq_n_f16_v, aarch64_neon_vcvtfxu2fp, aarch64_neon_vcvtfxs2fp, 0),
NEONMAP2(vcvtq_n_f32_v, aarch64_neon_vcvtfxu2fp, aarch64_neon_vcvtfxs2fp, 0),		NEONMAP2(vcvtq_n_f32_v, aarch64_neon_vcvtfxu2fp, aarch64_neon_vcvtfxs2fp, 0),
NEONMAP2(vcvtq_n_f64_v, aarch64_neon_vcvtfxu2fp, aarch64_neon_vcvtfxs2fp, 0),		NEONMAP2(vcvtq_n_f64_v, aarch64_neon_vcvtfxu2fp, aarch64_neon_vcvtfxs2fp, 0),
		NEONMAP1(vcvtq_n_s16_v, aarch64_neon_vcvtfp2fxs, 0),
NEONMAP1(vcvtq_n_s32_v, aarch64_neon_vcvtfp2fxs, 0),		NEONMAP1(vcvtq_n_s32_v, aarch64_neon_vcvtfp2fxs, 0),
NEONMAP1(vcvtq_n_s64_v, aarch64_neon_vcvtfp2fxs, 0),		NEONMAP1(vcvtq_n_s64_v, aarch64_neon_vcvtfp2fxs, 0),
		NEONMAP1(vcvtq_n_u16_v, aarch64_neon_vcvtfp2fxu, 0),
NEONMAP1(vcvtq_n_u32_v, aarch64_neon_vcvtfp2fxu, 0),		NEONMAP1(vcvtq_n_u32_v, aarch64_neon_vcvtfp2fxu, 0),
NEONMAP1(vcvtq_n_u64_v, aarch64_neon_vcvtfp2fxu, 0),		NEONMAP1(vcvtq_n_u64_v, aarch64_neon_vcvtfp2fxu, 0),
NEONMAP1(vcvtx_f32_v, aarch64_neon_fcvtxn, AddRetType | Add1ArgType),		NEONMAP1(vcvtx_f32_v, aarch64_neon_fcvtxn, AddRetType | Add1ArgType),
NEONMAP0(vext_v),		NEONMAP0(vext_v),
NEONMAP0(vextq_v),		NEONMAP0(vextq_v),
NEONMAP0(vfma_v),		NEONMAP0(vfma_v),
NEONMAP0(vfmaq_v),		NEONMAP0(vfmaq_v),
NEONMAP2(vhadd_v, aarch64_neon_uhadd, aarch64_neon_shadd, Add1ArgType | UnsignedAlts),		NEONMAP2(vhadd_v, aarch64_neon_uhadd, aarch64_neon_shadd, Add1ArgType | UnsignedAlts),
(452 lines not shown)	Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
case NEON::BI__builtin_neon_vcalt_v:		case NEON::BI__builtin_neon_vcalt_v:
case NEON::BI__builtin_neon_vcaltq_v:		case NEON::BI__builtin_neon_vcaltq_v:
std::swap(Ops[0], Ops[1]);		std::swap(Ops[0], Ops[1]);
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
case NEON::BI__builtin_neon_vcage_v:		case NEON::BI__builtin_neon_vcage_v:
case NEON::BI__builtin_neon_vcageq_v:		case NEON::BI__builtin_neon_vcageq_v:
case NEON::BI__builtin_neon_vcagt_v:		case NEON::BI__builtin_neon_vcagt_v:
case NEON::BI__builtin_neon_vcagtq_v: {		case NEON::BI__builtin_neon_vcagtq_v: {
llvm::Type *VecFlt = llvm::VectorType::get(		llvm::Type *Ty;
VTy->getScalarSizeInBits() == 32 ? FloatTy : DoubleTy,		switch (VTy->getScalarSizeInBits()) {
VTy->getNumElements());		default: llvm_unreachable("unexpected type");
		case 32:
		Ty = FloatTy;
		break;
		case 64:
		Ty = DoubleTy;
		break;
		case 16:
		Ty = HalfTy;
		break;
		}
		llvm::Type *VecFlt = llvm::VectorType::get(Ty, VTy->getNumElements());
llvm::Type *Tys[] = { VTy, VecFlt };		llvm::Type *Tys[] = { VTy, VecFlt };
Function *F = CGM.getIntrinsic(LLVMIntrinsic, Tys);		Function *F = CGM.getIntrinsic(LLVMIntrinsic, Tys);
return EmitNeonCall(F, Ops, NameHint);		return EmitNeonCall(F, Ops, NameHint);
}		}
case NEON::BI__builtin_neon_vclz_v:		case NEON::BI__builtin_neon_vclz_v:
case NEON::BI__builtin_neon_vclzq_v:		case NEON::BI__builtin_neon_vclzq_v:
// We generate target-independent intrinsic, which needs a second argument		// We generate target-independent intrinsic, which needs a second argument
// for whether or not clz of zero is undefined; on ARM it isn't.		// for whether or not clz of zero is undefined; on ARM it isn't.
Ops.push_back(Builder.getInt1(getTarget().isCLZForZeroUndef()));		Ops.push_back(Builder.getInt1(getTarget().isCLZForZeroUndef()));
break;		break;
case NEON::BI__builtin_neon_vcvt_f32_v:		case NEON::BI__builtin_neon_vcvt_f32_v:
case NEON::BI__builtin_neon_vcvtq_f32_v:		case NEON::BI__builtin_neon_vcvtq_f32_v:
Ops[0] = Builder.CreateBitCast(Ops[0], Ty);		Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
Ty = GetNeonType(this, NeonTypeFlags(NeonTypeFlags::Float32, false, Quad));		Ty = GetNeonType(this, NeonTypeFlags(NeonTypeFlags::Float32, false, Quad));
return Usgn ? Builder.CreateUIToFP(Ops[0], Ty, "vcvt")		return Usgn ? Builder.CreateUIToFP(Ops[0], Ty, "vcvt")
: Builder.CreateSIToFP(Ops[0], Ty, "vcvt");		: Builder.CreateSIToFP(Ops[0], Ty, "vcvt");
		case NEON::BI__builtin_neon_vcvt_f16_v:
		case NEON::BI__builtin_neon_vcvtq_f16_v:
		Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
		Ty = GetNeonType(this, NeonTypeFlags(NeonTypeFlags::Float16, false, Quad));
		return Usgn ? Builder.CreateUIToFP(Ops[0], Ty, "vcvt")
		: Builder.CreateSIToFP(Ops[0], Ty, "vcvt");
		case NEON::BI__builtin_neon_vcvt_n_f16_v:
case NEON::BI__builtin_neon_vcvt_n_f32_v:		case NEON::BI__builtin_neon_vcvt_n_f32_v:
case NEON::BI__builtin_neon_vcvt_n_f64_v:		case NEON::BI__builtin_neon_vcvt_n_f64_v:
		case NEON::BI__builtin_neon_vcvtq_n_f16_v:
case NEON::BI__builtin_neon_vcvtq_n_f32_v:		case NEON::BI__builtin_neon_vcvtq_n_f32_v:
case NEON::BI__builtin_neon_vcvtq_n_f64_v: {		case NEON::BI__builtin_neon_vcvtq_n_f64_v: {
llvm::Type *Tys[2] = { GetFloatNeonType(this, Type), Ty };		llvm::Type *Tys[2] = { GetFloatNeonType(this, Type), Ty };
Int = Usgn ? LLVMIntrinsic : AltLLVMIntrinsic;		Int = Usgn ? LLVMIntrinsic : AltLLVMIntrinsic;
Function *F = CGM.getIntrinsic(Int, Tys);		Function *F = CGM.getIntrinsic(Int, Tys);
return EmitNeonCall(F, Ops, "vcvt_n");		return EmitNeonCall(F, Ops, "vcvt_n");
}		}
		case NEON::BI__builtin_neon_vcvt_n_s16_v:
case NEON::BI__builtin_neon_vcvt_n_s32_v:		case NEON::BI__builtin_neon_vcvt_n_s32_v:
		case NEON::BI__builtin_neon_vcvt_n_u16_v:
case NEON::BI__builtin_neon_vcvt_n_u32_v:		case NEON::BI__builtin_neon_vcvt_n_u32_v:
case NEON::BI__builtin_neon_vcvt_n_s64_v:		case NEON::BI__builtin_neon_vcvt_n_s64_v:
case NEON::BI__builtin_neon_vcvt_n_u64_v:		case NEON::BI__builtin_neon_vcvt_n_u64_v:
		case NEON::BI__builtin_neon_vcvtq_n_s16_v:
case NEON::BI__builtin_neon_vcvtq_n_s32_v:		case NEON::BI__builtin_neon_vcvtq_n_s32_v:
		case NEON::BI__builtin_neon_vcvtq_n_u16_v:
case NEON::BI__builtin_neon_vcvtq_n_u32_v:		case NEON::BI__builtin_neon_vcvtq_n_u32_v:
case NEON::BI__builtin_neon_vcvtq_n_s64_v:		case NEON::BI__builtin_neon_vcvtq_n_s64_v:
case NEON::BI__builtin_neon_vcvtq_n_u64_v: {		case NEON::BI__builtin_neon_vcvtq_n_u64_v: {
llvm::Type *Tys[2] = { Ty, GetFloatNeonType(this, Type) };		llvm::Type *Tys[2] = { Ty, GetFloatNeonType(this, Type) };
Function *F = CGM.getIntrinsic(LLVMIntrinsic, Tys);		Function *F = CGM.getIntrinsic(LLVMIntrinsic, Tys);
return EmitNeonCall(F, Ops, "vcvt_n");		return EmitNeonCall(F, Ops, "vcvt_n");
}		}
case NEON::BI__builtin_neon_vcvt_s32_v:		case NEON::BI__builtin_neon_vcvt_s32_v:
case NEON::BI__builtin_neon_vcvt_u32_v:		case NEON::BI__builtin_neon_vcvt_u32_v:
case NEON::BI__builtin_neon_vcvt_s64_v:		case NEON::BI__builtin_neon_vcvt_s64_v:
case NEON::BI__builtin_neon_vcvt_u64_v:		case NEON::BI__builtin_neon_vcvt_u64_v:
		case NEON::BI__builtin_neon_vcvt_s16_v:
		case NEON::BI__builtin_neon_vcvt_u16_v:
case NEON::BI__builtin_neon_vcvtq_s32_v:		case NEON::BI__builtin_neon_vcvtq_s32_v:
case NEON::BI__builtin_neon_vcvtq_u32_v:		case NEON::BI__builtin_neon_vcvtq_u32_v:
case NEON::BI__builtin_neon_vcvtq_s64_v:		case NEON::BI__builtin_neon_vcvtq_s64_v:
case NEON::BI__builtin_neon_vcvtq_u64_v: {		case NEON::BI__builtin_neon_vcvtq_u64_v:
		case NEON::BI__builtin_neon_vcvtq_s16_v:
		case NEON::BI__builtin_neon_vcvtq_u16_v: {
Ops[0] = Builder.CreateBitCast(Ops[0], GetFloatNeonType(this, Type));		Ops[0] = Builder.CreateBitCast(Ops[0], GetFloatNeonType(this, Type));
return Usgn ? Builder.CreateFPToUI(Ops[0], Ty, "vcvt")		return Usgn ? Builder.CreateFPToUI(Ops[0], Ty, "vcvt")
: Builder.CreateFPToSI(Ops[0], Ty, "vcvt");		: Builder.CreateFPToSI(Ops[0], Ty, "vcvt");
}		}
		case NEON::BI__builtin_neon_vcvta_s16_v:
case NEON::BI__builtin_neon_vcvta_s32_v:		case NEON::BI__builtin_neon_vcvta_s32_v:
case NEON::BI__builtin_neon_vcvta_s64_v:		case NEON::BI__builtin_neon_vcvta_s64_v:
case NEON::BI__builtin_neon_vcvta_u32_v:		case NEON::BI__builtin_neon_vcvta_u32_v:
case NEON::BI__builtin_neon_vcvta_u64_v:		case NEON::BI__builtin_neon_vcvta_u64_v:
		case NEON::BI__builtin_neon_vcvtaq_s16_v:
case NEON::BI__builtin_neon_vcvtaq_s32_v:		case NEON::BI__builtin_neon_vcvtaq_s32_v:
case NEON::BI__builtin_neon_vcvtaq_s64_v:		case NEON::BI__builtin_neon_vcvtaq_s64_v:
		case NEON::BI__builtin_neon_vcvtaq_u16_v:
case NEON::BI__builtin_neon_vcvtaq_u32_v:		case NEON::BI__builtin_neon_vcvtaq_u32_v:
case NEON::BI__builtin_neon_vcvtaq_u64_v:		case NEON::BI__builtin_neon_vcvtaq_u64_v:
		case NEON::BI__builtin_neon_vcvtn_s16_v:
case NEON::BI__builtin_neon_vcvtn_s32_v:		case NEON::BI__builtin_neon_vcvtn_s32_v:
case NEON::BI__builtin_neon_vcvtn_s64_v:		case NEON::BI__builtin_neon_vcvtn_s64_v:
		case NEON::BI__builtin_neon_vcvtn_u16_v:
case NEON::BI__builtin_neon_vcvtn_u32_v:		case NEON::BI__builtin_neon_vcvtn_u32_v:
case NEON::BI__builtin_neon_vcvtn_u64_v:		case NEON::BI__builtin_neon_vcvtn_u64_v:
		case NEON::BI__builtin_neon_vcvtnq_s16_v:
case NEON::BI__builtin_neon_vcvtnq_s32_v:		case NEON::BI__builtin_neon_vcvtnq_s32_v:
case NEON::BI__builtin_neon_vcvtnq_s64_v:		case NEON::BI__builtin_neon_vcvtnq_s64_v:
		case NEON::BI__builtin_neon_vcvtnq_u16_v:
case NEON::BI__builtin_neon_vcvtnq_u32_v:		case NEON::BI__builtin_neon_vcvtnq_u32_v:
case NEON::BI__builtin_neon_vcvtnq_u64_v:		case NEON::BI__builtin_neon_vcvtnq_u64_v:
		case NEON::BI__builtin_neon_vcvtp_s16_v:
case NEON::BI__builtin_neon_vcvtp_s32_v:		case NEON::BI__builtin_neon_vcvtp_s32_v:
case NEON::BI__builtin_neon_vcvtp_s64_v:		case NEON::BI__builtin_neon_vcvtp_s64_v:
		case NEON::BI__builtin_neon_vcvtp_u16_v:
case NEON::BI__builtin_neon_vcvtp_u32_v:		case NEON::BI__builtin_neon_vcvtp_u32_v:
case NEON::BI__builtin_neon_vcvtp_u64_v:		case NEON::BI__builtin_neon_vcvtp_u64_v:
		case NEON::BI__builtin_neon_vcvtpq_s16_v:
case NEON::BI__builtin_neon_vcvtpq_s32_v:		case NEON::BI__builtin_neon_vcvtpq_s32_v:
case NEON::BI__builtin_neon_vcvtpq_s64_v:		case NEON::BI__builtin_neon_vcvtpq_s64_v:
		case NEON::BI__builtin_neon_vcvtpq_u16_v:
case NEON::BI__builtin_neon_vcvtpq_u32_v:		case NEON::BI__builtin_neon_vcvtpq_u32_v:
case NEON::BI__builtin_neon_vcvtpq_u64_v:		case NEON::BI__builtin_neon_vcvtpq_u64_v:
		case NEON::BI__builtin_neon_vcvtm_s16_v:
case NEON::BI__builtin_neon_vcvtm_s32_v:		case NEON::BI__builtin_neon_vcvtm_s32_v:
case NEON::BI__builtin_neon_vcvtm_s64_v:		case NEON::BI__builtin_neon_vcvtm_s64_v:
		case NEON::BI__builtin_neon_vcvtm_u16_v:
case NEON::BI__builtin_neon_vcvtm_u32_v:		case NEON::BI__builtin_neon_vcvtm_u32_v:
case NEON::BI__builtin_neon_vcvtm_u64_v:		case NEON::BI__builtin_neon_vcvtm_u64_v:
		case NEON::BI__builtin_neon_vcvtmq_s16_v:
case NEON::BI__builtin_neon_vcvtmq_s32_v:		case NEON::BI__builtin_neon_vcvtmq_s32_v:
case NEON::BI__builtin_neon_vcvtmq_s64_v:		case NEON::BI__builtin_neon_vcvtmq_s64_v:
		case NEON::BI__builtin_neon_vcvtmq_u16_v:
case NEON::BI__builtin_neon_vcvtmq_u32_v:		case NEON::BI__builtin_neon_vcvtmq_u32_v:
case NEON::BI__builtin_neon_vcvtmq_u64_v: {		case NEON::BI__builtin_neon_vcvtmq_u64_v: {
llvm::Type *Tys[2] = { Ty, GetFloatNeonType(this, Type) };		llvm::Type *Tys[2] = { Ty, GetFloatNeonType(this, Type) };
return EmitNeonCall(CGM.getIntrinsic(LLVMIntrinsic, Tys), Ops, NameHint);		return EmitNeonCall(CGM.getIntrinsic(LLVMIntrinsic, Tys), Ops, NameHint);
}		}
case NEON::BI__builtin_neon_vext_v:		case NEON::BI__builtin_neon_vext_v:
case NEON::BI__builtin_neon_vextq_v: {		case NEON::BI__builtin_neon_vextq_v: {
int CV = cast<ConstantInt>(Ops[2])->getSExtValue();		int CV = cast<ConstantInt>(Ops[2])->getSExtValue();
(2,193 lines not shown)	case NEON::BI__builtin_neon_vfmaq_laneq_v: {
Value *F = CGM.getIntrinsic(Intrinsic::fma, Ty);		Value *F = CGM.getIntrinsic(Intrinsic::fma, Ty);
Ops[0] = Builder.CreateBitCast(Ops[0], Ty);		Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
Ops[1] = Builder.CreateBitCast(Ops[1], Ty);		Ops[1] = Builder.CreateBitCast(Ops[1], Ty);

Ops[2] = Builder.CreateBitCast(Ops[2], Ty);		Ops[2] = Builder.CreateBitCast(Ops[2], Ty);
Ops[2] = EmitNeonSplat(Ops[2], cast<ConstantInt>(Ops[3]));		Ops[2] = EmitNeonSplat(Ops[2], cast<ConstantInt>(Ops[3]));
return Builder.CreateCall(F, {Ops[2], Ops[1], Ops[0]});		return Builder.CreateCall(F, {Ops[2], Ops[1], Ops[0]});
}		}
		case NEON::BI__builtin_neon_vfmah_lane_f16:
case NEON::BI__builtin_neon_vfmas_lane_f32:		case NEON::BI__builtin_neon_vfmas_lane_f32:
		case NEON::BI__builtin_neon_vfmah_laneq_f16:
case NEON::BI__builtin_neon_vfmas_laneq_f32:		case NEON::BI__builtin_neon_vfmas_laneq_f32:
case NEON::BI__builtin_neon_vfmad_lane_f64:		case NEON::BI__builtin_neon_vfmad_lane_f64:
case NEON::BI__builtin_neon_vfmad_laneq_f64: {		case NEON::BI__builtin_neon_vfmad_laneq_f64: {
Ops.push_back(EmitScalarExpr(E->getArg(3)));		Ops.push_back(EmitScalarExpr(E->getArg(3)));
llvm::Type *Ty = ConvertType(E->getCallReturnType(getContext()));		llvm::Type *Ty = ConvertType(E->getCallReturnType(getContext()));
Value *F = CGM.getIntrinsic(Intrinsic::fma, Ty);		Value *F = CGM.getIntrinsic(Intrinsic::fma, Ty);
Ops[2] = Builder.CreateExtractElement(Ops[2], Ops[3], "extract");		Ops[2] = Builder.CreateExtractElement(Ops[2], Ops[3], "extract");
return Builder.CreateCall(F, {Ops[1], Ops[2], Ops[0]});		return Builder.CreateCall(F, {Ops[1], Ops[2], Ops[0]});
(158 lines not shown)	case NEON::BI__builtin_neon_vcvt_f32_f64: {
Ops[0] = Builder.CreateBitCast(Ops[0], GetNeonType(this, SrcFlag));		Ops[0] = Builder.CreateBitCast(Ops[0], GetNeonType(this, SrcFlag));

return Builder.CreateFPTrunc(Ops[0], Ty, "vcvt");		return Builder.CreateFPTrunc(Ops[0], Ty, "vcvt");
}		}
case NEON::BI__builtin_neon_vcvt_s32_v:		case NEON::BI__builtin_neon_vcvt_s32_v:
case NEON::BI__builtin_neon_vcvt_u32_v:		case NEON::BI__builtin_neon_vcvt_u32_v:
case NEON::BI__builtin_neon_vcvt_s64_v:		case NEON::BI__builtin_neon_vcvt_s64_v:
case NEON::BI__builtin_neon_vcvt_u64_v:		case NEON::BI__builtin_neon_vcvt_u64_v:
		case NEON::BI__builtin_neon_vcvt_s16_v:
		case NEON::BI__builtin_neon_vcvt_u16_v:
case NEON::BI__builtin_neon_vcvtq_s32_v:		case NEON::BI__builtin_neon_vcvtq_s32_v:
case NEON::BI__builtin_neon_vcvtq_u32_v:		case NEON::BI__builtin_neon_vcvtq_u32_v:
case NEON::BI__builtin_neon_vcvtq_s64_v:		case NEON::BI__builtin_neon_vcvtq_s64_v:
case NEON::BI__builtin_neon_vcvtq_u64_v: {		case NEON::BI__builtin_neon_vcvtq_u64_v:
		case NEON::BI__builtin_neon_vcvtq_s16_v:
		case NEON::BI__builtin_neon_vcvtq_u16_v: {
Ops[0] = Builder.CreateBitCast(Ops[0], GetFloatNeonType(this, Type));		Ops[0] = Builder.CreateBitCast(Ops[0], GetFloatNeonType(this, Type));
if (usgn)		if (usgn)
return Builder.CreateFPToUI(Ops[0], Ty);		return Builder.CreateFPToUI(Ops[0], Ty);
return Builder.CreateFPToSI(Ops[0], Ty);		return Builder.CreateFPToSI(Ops[0], Ty);
}		}
		case NEON::BI__builtin_neon_vcvta_s16_v:
case NEON::BI__builtin_neon_vcvta_s32_v:		case NEON::BI__builtin_neon_vcvta_s32_v:
		case NEON::BI__builtin_neon_vcvtaq_s16_v:
case NEON::BI__builtin_neon_vcvtaq_s32_v:		case NEON::BI__builtin_neon_vcvtaq_s32_v:
case NEON::BI__builtin_neon_vcvta_u32_v:		case NEON::BI__builtin_neon_vcvta_u32_v:
		case NEON::BI__builtin_neon_vcvtaq_u16_v:
case NEON::BI__builtin_neon_vcvtaq_u32_v:		case NEON::BI__builtin_neon_vcvtaq_u32_v:
case NEON::BI__builtin_neon_vcvta_s64_v:		case NEON::BI__builtin_neon_vcvta_s64_v:
case NEON::BI__builtin_neon_vcvtaq_s64_v:		case NEON::BI__builtin_neon_vcvtaq_s64_v:
case NEON::BI__builtin_neon_vcvta_u64_v:		case NEON::BI__builtin_neon_vcvta_u64_v:
case NEON::BI__builtin_neon_vcvtaq_u64_v: {		case NEON::BI__builtin_neon_vcvtaq_u64_v: {
Int = usgn ? Intrinsic::aarch64_neon_fcvtau : Intrinsic::aarch64_neon_fcvtas;		Int = usgn ? Intrinsic::aarch64_neon_fcvtau : Intrinsic::aarch64_neon_fcvtas;
llvm::Type *Tys[2] = { Ty, GetFloatNeonType(this, Type) };		llvm::Type *Tys[2] = { Ty, GetFloatNeonType(this, Type) };
return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vcvta");		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vcvta");
}		}
		case NEON::BI__builtin_neon_vcvtm_s16_v:
case NEON::BI__builtin_neon_vcvtm_s32_v:		case NEON::BI__builtin_neon_vcvtm_s32_v:
		case NEON::BI__builtin_neon_vcvtmq_s16_v:
case NEON::BI__builtin_neon_vcvtmq_s32_v:		case NEON::BI__builtin_neon_vcvtmq_s32_v:
		case NEON::BI__builtin_neon_vcvtm_u16_v:
case NEON::BI__builtin_neon_vcvtm_u32_v:		case NEON::BI__builtin_neon_vcvtm_u32_v:
		case NEON::BI__builtin_neon_vcvtmq_u16_v:
case NEON::BI__builtin_neon_vcvtmq_u32_v:		case NEON::BI__builtin_neon_vcvtmq_u32_v:
case NEON::BI__builtin_neon_vcvtm_s64_v:		case NEON::BI__builtin_neon_vcvtm_s64_v:
case NEON::BI__builtin_neon_vcvtmq_s64_v:		case NEON::BI__builtin_neon_vcvtmq_s64_v:
case NEON::BI__builtin_neon_vcvtm_u64_v:		case NEON::BI__builtin_neon_vcvtm_u64_v:
case NEON::BI__builtin_neon_vcvtmq_u64_v: {		case NEON::BI__builtin_neon_vcvtmq_u64_v: {
Int = usgn ? Intrinsic::aarch64_neon_fcvtmu : Intrinsic::aarch64_neon_fcvtms;		Int = usgn ? Intrinsic::aarch64_neon_fcvtmu : Intrinsic::aarch64_neon_fcvtms;
llvm::Type *Tys[2] = { Ty, GetFloatNeonType(this, Type) };		llvm::Type *Tys[2] = { Ty, GetFloatNeonType(this, Type) };
return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vcvtm");		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vcvtm");
}		}
		case NEON::BI__builtin_neon_vcvtn_s16_v:
case NEON::BI__builtin_neon_vcvtn_s32_v:		case NEON::BI__builtin_neon_vcvtn_s32_v:
		case NEON::BI__builtin_neon_vcvtnq_s16_v:
case NEON::BI__builtin_neon_vcvtnq_s32_v:		case NEON::BI__builtin_neon_vcvtnq_s32_v:
		case NEON::BI__builtin_neon_vcvtn_u16_v:
case NEON::BI__builtin_neon_vcvtn_u32_v:		case NEON::BI__builtin_neon_vcvtn_u32_v:
		case NEON::BI__builtin_neon_vcvtnq_u16_v:
case NEON::BI__builtin_neon_vcvtnq_u32_v:		case NEON::BI__builtin_neon_vcvtnq_u32_v:
case NEON::BI__builtin_neon_vcvtn_s64_v:		case NEON::BI__builtin_neon_vcvtn_s64_v:
case NEON::BI__builtin_neon_vcvtnq_s64_v:		case NEON::BI__builtin_neon_vcvtnq_s64_v:
case NEON::BI__builtin_neon_vcvtn_u64_v:		case NEON::BI__builtin_neon_vcvtn_u64_v:
case NEON::BI__builtin_neon_vcvtnq_u64_v: {		case NEON::BI__builtin_neon_vcvtnq_u64_v: {
Int = usgn ? Intrinsic::aarch64_neon_fcvtnu : Intrinsic::aarch64_neon_fcvtns;		Int = usgn ? Intrinsic::aarch64_neon_fcvtnu : Intrinsic::aarch64_neon_fcvtns;
llvm::Type *Tys[2] = { Ty, GetFloatNeonType(this, Type) };		llvm::Type *Tys[2] = { Ty, GetFloatNeonType(this, Type) };
return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vcvtn");		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vcvtn");
}		}
		case NEON::BI__builtin_neon_vcvtp_s16_v:
case NEON::BI__builtin_neon_vcvtp_s32_v:		case NEON::BI__builtin_neon_vcvtp_s32_v:
		case NEON::BI__builtin_neon_vcvtpq_s16_v:
case NEON::BI__builtin_neon_vcvtpq_s32_v:		case NEON::BI__builtin_neon_vcvtpq_s32_v:
		case NEON::BI__builtin_neon_vcvtp_u16_v:
case NEON::BI__builtin_neon_vcvtp_u32_v:		case NEON::BI__builtin_neon_vcvtp_u32_v:
		case NEON::BI__builtin_neon_vcvtpq_u16_v:
case NEON::BI__builtin_neon_vcvtpq_u32_v:		case NEON::BI__builtin_neon_vcvtpq_u32_v:
case NEON::BI__builtin_neon_vcvtp_s64_v:		case NEON::BI__builtin_neon_vcvtp_s64_v:
case NEON::BI__builtin_neon_vcvtpq_s64_v:		case NEON::BI__builtin_neon_vcvtpq_s64_v:
case NEON::BI__builtin_neon_vcvtp_u64_v:		case NEON::BI__builtin_neon_vcvtp_u64_v:
case NEON::BI__builtin_neon_vcvtpq_u64_v: {		case NEON::BI__builtin_neon_vcvtpq_u64_v: {
Int = usgn ? Intrinsic::aarch64_neon_fcvtpu : Intrinsic::aarch64_neon_fcvtps;		Int = usgn ? Intrinsic::aarch64_neon_fcvtpu : Intrinsic::aarch64_neon_fcvtps;
llvm::Type *Tys[2] = { Ty, GetFloatNeonType(this, Type) };		llvm::Type *Tys[2] = { Ty, GetFloatNeonType(this, Type) };
return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vcvtp");		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vcvtp");
(156 lines not shown)	case NEON::BI__builtin_neon_vmaxvq_s16: {
Int = Intrinsic::aarch64_neon_smaxv;		Int = Intrinsic::aarch64_neon_smaxv;
Ty = Int32Ty;		Ty = Int32Ty;
VTy = llvm::VectorType::get(Int16Ty, 8);		VTy = llvm::VectorType::get(Int16Ty, 8);
llvm::Type *Tys[2] = { Ty, VTy };		llvm::Type *Tys[2] = { Ty, VTy };
Ops.push_back(EmitScalarExpr(E->getArg(0)));		Ops.push_back(EmitScalarExpr(E->getArg(0)));
Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxv");		Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxv");
return Builder.CreateTrunc(Ops[0], Int16Ty);		return Builder.CreateTrunc(Ops[0], Int16Ty);
}		}
		case NEON::BI__builtin_neon_vmaxv_f16: {
		Int = Intrinsic::aarch64_neon_fmaxv;
		Ty = HalfTy;
		VTy = llvm::VectorType::get(HalfTy, 4);
		llvm::Type *Tys[2] = { Ty, VTy };
		Ops.push_back(EmitScalarExpr(E->getArg(0)));
		Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxv");
		return Builder.CreateTrunc(Ops[0], HalfTy);
		}
		case NEON::BI__builtin_neon_vmaxvq_f16: {
		Int = Intrinsic::aarch64_neon_fmaxv;
		Ty = HalfTy;
		VTy = llvm::VectorType::get(HalfTy, 8);
		llvm::Type *Tys[2] = { Ty, VTy };
		Ops.push_back(EmitScalarExpr(E->getArg(0)));
		Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxv");
		return Builder.CreateTrunc(Ops[0], HalfTy);
		}
case NEON::BI__builtin_neon_vminv_u8: {		case NEON::BI__builtin_neon_vminv_u8: {
Int = Intrinsic::aarch64_neon_uminv;		Int = Intrinsic::aarch64_neon_uminv;
Ty = Int32Ty;		Ty = Int32Ty;
VTy = llvm::VectorType::get(Int8Ty, 8);		VTy = llvm::VectorType::get(Int8Ty, 8);
llvm::Type *Tys[2] = { Ty, VTy };		llvm::Type *Tys[2] = { Ty, VTy };
Ops.push_back(EmitScalarExpr(E->getArg(0)));		Ops.push_back(EmitScalarExpr(E->getArg(0)));
Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminv");		Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminv");
return Builder.CreateTrunc(Ops[0], Int8Ty);		return Builder.CreateTrunc(Ops[0], Int8Ty);
(56 lines not shown)	case NEON::BI__builtin_neon_vminvq_s16: {
Int = Intrinsic::aarch64_neon_sminv;		Int = Intrinsic::aarch64_neon_sminv;
Ty = Int32Ty;		Ty = Int32Ty;
VTy = llvm::VectorType::get(Int16Ty, 8);		VTy = llvm::VectorType::get(Int16Ty, 8);
llvm::Type *Tys[2] = { Ty, VTy };		llvm::Type *Tys[2] = { Ty, VTy };
Ops.push_back(EmitScalarExpr(E->getArg(0)));		Ops.push_back(EmitScalarExpr(E->getArg(0)));
Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminv");		Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminv");
return Builder.CreateTrunc(Ops[0], Int16Ty);		return Builder.CreateTrunc(Ops[0], Int16Ty);
}		}
		case NEON::BI__builtin_neon_vminv_f16: {
		Int = Intrinsic::aarch64_neon_fminv;
		Ty = HalfTy;
		VTy = llvm::VectorType::get(HalfTy, 4);
		llvm::Type *Tys[2] = { Ty, VTy };
		Ops.push_back(EmitScalarExpr(E->getArg(0)));
		Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminv");
		return Builder.CreateTrunc(Ops[0], HalfTy);
		}
		case NEON::BI__builtin_neon_vminvq_f16: {
		Int = Intrinsic::aarch64_neon_fminv;
		Ty = HalfTy;
		VTy = llvm::VectorType::get(HalfTy, 8);
		llvm::Type *Tys[2] = { Ty, VTy };
		Ops.push_back(EmitScalarExpr(E->getArg(0)));
		Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminv");
		return Builder.CreateTrunc(Ops[0], HalfTy);
		}
		case NEON::BI__builtin_neon_vmaxnmv_f16: {
		Int = Intrinsic::aarch64_neon_fmaxnmv;
		Ty = HalfTy;
		VTy = llvm::VectorType::get(HalfTy, 4);
		llvm::Type *Tys[2] = { Ty, VTy };
		Ops.push_back(EmitScalarExpr(E->getArg(0)));
		Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxnmv");
		return Builder.CreateTrunc(Ops[0], HalfTy);
		}
		case NEON::BI__builtin_neon_vmaxnmvq_f16: {
		Int = Intrinsic::aarch64_neon_fmaxnmv;
		Ty = HalfTy;
		VTy = llvm::VectorType::get(HalfTy, 8);
		llvm::Type *Tys[2] = { Ty, VTy };
		Ops.push_back(EmitScalarExpr(E->getArg(0)));
		Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmaxnmv");
		return Builder.CreateTrunc(Ops[0], HalfTy);
		}
		case NEON::BI__builtin_neon_vminnmv_f16: {
		Int = Intrinsic::aarch64_neon_fminnmv;
		Ty = HalfTy;
		VTy = llvm::VectorType::get(HalfTy, 4);
		llvm::Type *Tys[2] = { Ty, VTy };
		Ops.push_back(EmitScalarExpr(E->getArg(0)));
		Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminnmv");
		return Builder.CreateTrunc(Ops[0], HalfTy);
		}
		case NEON::BI__builtin_neon_vminnmvq_f16: {
		Int = Intrinsic::aarch64_neon_fminnmv;
		Ty = HalfTy;
		VTy = llvm::VectorType::get(HalfTy, 8);
		llvm::Type *Tys[2] = { Ty, VTy };
		Ops.push_back(EmitScalarExpr(E->getArg(0)));
		Ops[0] = EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vminnmv");
		return Builder.CreateTrunc(Ops[0], HalfTy);
		}
case NEON::BI__builtin_neon_vmul_n_f64: {		case NEON::BI__builtin_neon_vmul_n_f64: {
Ops[0] = Builder.CreateBitCast(Ops[0], DoubleTy);		Ops[0] = Builder.CreateBitCast(Ops[0], DoubleTy);
Value *RHS = Builder.CreateBitCast(EmitScalarExpr(E->getArg(1)), DoubleTy);		Value *RHS = Builder.CreateBitCast(EmitScalarExpr(E->getArg(1)), DoubleTy);
return Builder.CreateFMul(Ops[0], RHS);		return Builder.CreateFMul(Ops[0], RHS);
}		}
case NEON::BI__builtin_neon_vaddlv_u8: {		case NEON::BI__builtin_neon_vaddlv_u8: {
Int = Intrinsic::aarch64_neon_uaddlv;		Int = Intrinsic::aarch64_neon_uaddlv;
Ty = Int32Ty;		Ty = Int32Ty;
(2,621 lines not shown)
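
Note on the vcage/vcagt hunk in the diff above: the old code chose between FloatTy and DoubleTy with a 32-vs-64 ternary, and the new code switches on the scalar bit width so that 16-bit lanes select HalfTy. The selection logic can be sketched in isolation as follows. This is only an illustration with no LLVM dependency: `FloatKind` and `floatKindForBits` are hypothetical names that do not appear in the patch, and the sketch throws on an unexpected width where the patch uses llvm_unreachable.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the HalfTy / FloatTy / DoubleTy type-cache entries.
enum class FloatKind { Half, Float, Double };

// Pick the floating-point element kind that matches a vector's scalar width
// in bits, mirroring the 16/32/64 switch added by the patch.
inline FloatKind floatKindForBits(unsigned bits) {
  switch (bits) {
  case 16:
    return FloatKind::Half;
  case 32:
    return FloatKind::Float;
  case 64:
    return FloatKind::Double;
  default:
    // Assumption of this sketch: report the error instead of llvm_unreachable.
    throw std::invalid_argument("unexpected scalar width: " +
                                std::to_string(bits));
  }
}
```

With the old two-way ternary, any width other than 32 (including 16) would have silently selected double; the explicit switch makes the half case visible and rejects everything else.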

cfe/trunk/lib/CodeGen/CodeGenModule.cpp

(92 lines not shown)	CodeGenModule::CodeGenModule(ASTContext &C, const HeaderSearchOptions &HSO,

// Initialize the type cache.		// Initialize the type cache.
llvm::LLVMContext &LLVMContext = M.getContext();		llvm::LLVMContext &LLVMContext = M.getContext();
VoidTy = llvm::Type::getVoidTy(LLVMContext);		VoidTy = llvm::Type::getVoidTy(LLVMContext);
Int8Ty = llvm::Type::getInt8Ty(LLVMContext);		Int8Ty = llvm::Type::getInt8Ty(LLVMContext);
Int16Ty = llvm::Type::getInt16Ty(LLVMContext);		Int16Ty = llvm::Type::getInt16Ty(LLVMContext);
Int32Ty = llvm::Type::getInt32Ty(LLVMContext);		Int32Ty = llvm::Type::getInt32Ty(LLVMContext);
Int64Ty = llvm::Type::getInt64Ty(LLVMContext);		Int64Ty = llvm::Type::getInt64Ty(LLVMContext);
		HalfTy = llvm::Type::getHalfTy(LLVMContext);
FloatTy = llvm::Type::getFloatTy(LLVMContext);		FloatTy = llvm::Type::getFloatTy(LLVMContext);
DoubleTy = llvm::Type::getDoubleTy(LLVMContext);		DoubleTy = llvm::Type::getDoubleTy(LLVMContext);
PointerWidthInBits = C.getTargetInfo().getPointerWidth(0);		PointerWidthInBits = C.getTargetInfo().getPointerWidth(0);
PointerAlignInBytes =		PointerAlignInBytes =
C.toCharUnitsFromBits(C.getTargetInfo().getPointerAlign(0)).getQuantity();		C.toCharUnitsFromBits(C.getTargetInfo().getPointerAlign(0)).getQuantity();
SizeSizeInBytes =		SizeSizeInBytes =
C.toCharUnitsFromBits(C.getTargetInfo().getMaxPointerWidth()).getQuantity();		C.toCharUnitsFromBits(C.getTargetInfo().getMaxPointerWidth()).getQuantity();
IntAlignInBytes =		IntAlignInBytes =
(4,403 lines not shown)

cfe/trunk/lib/CodeGen/CodeGenTypeCache.h

	(30 lines not shown)
	/// constructor and then copied around into new CodeGenFunctions.			/// constructor and then copied around into new CodeGenFunctions.
	struct CodeGenTypeCache {			struct CodeGenTypeCache {
	/// void			/// void
	llvm::Type *VoidTy;			llvm::Type *VoidTy;

	/// i8, i16, i32, and i64			/// i8, i16, i32, and i64
	llvm::IntegerType *Int8Ty, *Int16Ty, *Int32Ty, *Int64Ty;			llvm::IntegerType *Int8Ty, *Int16Ty, *Int32Ty, *Int64Ty;
	/// float, double			/// float, double
	llvm::Type *FloatTy, *DoubleTy;			llvm::Type *HalfTy, *FloatTy, *DoubleTy;

	/// int			/// int
	llvm::IntegerType *IntTy;			llvm::IntegerType *IntTy;

	/// intptr_t, size_t, and ptrdiff_t, which we assume are the same size.			/// intptr_t, size_t, and ptrdiff_t, which we assume are the same size.
	union {			union {
	llvm::IntegerType *IntPtrTy;			llvm::IntegerType *IntPtrTy;
	llvm::IntegerType *SizeTy;			llvm::IntegerType *SizeTy;
	(76 lines not shown)

cfe/trunk/test/CodeGen/aarch64-neon-intrinsics.c

(9,031 lines not shown)
// CHECK: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]]		// CHECK: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]]
// CHECK: ret <2 x i64> [[TMP2]]		// CHECK: ret <2 x i64> [[TMP2]]
int64x2_t test_vld1q_s64(int64_t const *a) {		int64x2_t test_vld1q_s64(int64_t const *a) {
return vld1q_s64(a);		return vld1q_s64(a);
}		}

// CHECK-LABEL: @test_vld1q_f16(		// CHECK-LABEL: @test_vld1q_f16(
// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <8 x i16>*		// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <8 x half>*
// CHECK: [[TMP2:%.*]] = load <8 x i16>, <8 x i16>* [[TMP1]]		// CHECK: [[TMP2:%.*]] = load <8 x half>, <8 x half>* [[TMP1]]
// CHECK: [[TMP3:%.*]] = bitcast <8 x i16> [[TMP2]] to <8 x half>		// CHECK: ret <8 x half> [[TMP2]]
// CHECK: ret <8 x half> [[TMP3]]
float16x8_t test_vld1q_f16(float16_t const *a) {		float16x8_t test_vld1q_f16(float16_t const *a) {
return vld1q_f16(a);		return vld1q_f16(a);
}		}

// CHECK-LABEL: @test_vld1q_f32(		// CHECK-LABEL: @test_vld1q_f32(
// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <4 x float>*		// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <4 x float>*
// CHECK: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]]		// CHECK: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]]
(95 lines not shown)
// CHECK: [[TMP2:%.*]] = load <1 x i64>, <1 x i64>* [[TMP1]]		// CHECK: [[TMP2:%.*]] = load <1 x i64>, <1 x i64>* [[TMP1]]
// CHECK: ret <1 x i64> [[TMP2]]		// CHECK: ret <1 x i64> [[TMP2]]
int64x1_t test_vld1_s64(int64_t const *a) {		int64x1_t test_vld1_s64(int64_t const *a) {
return vld1_s64(a);		return vld1_s64(a);
}		}

// CHECK-LABEL: @test_vld1_f16(		// CHECK-LABEL: @test_vld1_f16(
// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <4 x i16>*		// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <4 x half>*
// CHECK: [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* [[TMP1]]		// CHECK: [[TMP2:%.*]] = load <4 x half>, <4 x half>* [[TMP1]]
// CHECK: [[TMP3:%.*]] = bitcast <4 x i16> [[TMP2]] to <4 x half>		// CHECK: ret <4 x half> [[TMP2]]
// CHECK: ret <4 x half> [[TMP3]]
float16x4_t test_vld1_f16(float16_t const *a) {		float16x4_t test_vld1_f16(float16_t const *a) {
return vld1_f16(a);		return vld1_f16(a);
}		}

// CHECK-LABEL: @test_vld1_f32(		// CHECK-LABEL: @test_vld1_f32(
// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <2 x float>*		// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <2 x float>*
// CHECK: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]]		// CHECK: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]]
int64x2x2_t test_vld2q_s64(int64_t const *a) {
return vld2q_s64(a);		return vld2q_s64(a);
}		}

// CHECK-LABEL: @test_vld2q_f16(		// CHECK-LABEL: @test_vld2q_f16(
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x2_t, align 16		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x2_t, align 16
// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x2_t, align 16		// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x2_t, align 16
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <8 x i16>*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <8 x half>*
// CHECK: [[VLD2:%.*]] = call { <8 x i16>, <8 x i16> } @llvm.aarch64.neon.ld2.v8i16.p0v8i16(<8 x i16>* [[TMP2]])		// CHECK: [[VLD2:%.*]] = call { <8 x half>, <8 x half> } @llvm.aarch64.neon.ld2.v8f16.p0v8f16(<8 x half>* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x i16>, <8 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x half>, <8 x half> }*
// CHECK: store { <8 x i16>, <8 x i16> } [[VLD2]], { <8 x i16>, <8 x i16> }* [[TMP3]]		// CHECK: store { <8 x half>, <8 x half> } [[VLD2]], { <8 x half>, <8 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x2_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x2_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 32, i32 16, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 32, i32 16, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x8x2_t, %struct.float16x8x2_t* [[RETVAL]], align 16		// CHECK: [[TMP6:%.*]] = load %struct.float16x8x2_t, %struct.float16x8x2_t* [[RETVAL]], align 16
// CHECK: ret %struct.float16x8x2_t [[TMP6]]		// CHECK: ret %struct.float16x8x2_t [[TMP6]]
float16x8x2_t test_vld2q_f16(float16_t const *a) {		float16x8x2_t test_vld2q_f16(float16_t const *a) {
return vld2q_f16(a);		return vld2q_f16(a);
}		}
int64x1x2_t test_vld2_s64(int64_t const *a) {
return vld2_s64(a);		return vld2_s64(a);
}		}

// CHECK-LABEL: @test_vld2_f16(		// CHECK-LABEL: @test_vld2_f16(
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x2_t, align 8		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x2_t, align 8
// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x2_t, align 8		// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x2_t, align 8
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <4 x i16>*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <4 x half>*
// CHECK: [[VLD2:%.*]] = call { <4 x i16>, <4 x i16> } @llvm.aarch64.neon.ld2.v4i16.p0v4i16(<4 x i16>* [[TMP2]])		// CHECK: [[VLD2:%.*]] = call { <4 x half>, <4 x half> } @llvm.aarch64.neon.ld2.v4f16.p0v4f16(<4 x half>* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x i16>, <4 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x half>, <4 x half> }*
// CHECK: store { <4 x i16>, <4 x i16> } [[VLD2]], { <4 x i16>, <4 x i16> }* [[TMP3]]		// CHECK: store { <4 x half>, <4 x half> } [[VLD2]], { <4 x half>, <4 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x2_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x2_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 16, i32 8, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 16, i32 8, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x4x2_t, %struct.float16x4x2_t* [[RETVAL]], align 8		// CHECK: [[TMP6:%.*]] = load %struct.float16x4x2_t, %struct.float16x4x2_t* [[RETVAL]], align 8
// CHECK: ret %struct.float16x4x2_t [[TMP6]]		// CHECK: ret %struct.float16x4x2_t [[TMP6]]
float16x4x2_t test_vld2_f16(float16_t const *a) {		float16x4x2_t test_vld2_f16(float16_t const *a) {
return vld2_f16(a);		return vld2_f16(a);
}		}
int64x2x3_t test_vld3q_s64(int64_t const *a) {
return vld3q_s64(a);		return vld3q_s64(a);
}		}

// CHECK-LABEL: @test_vld3q_f16(		// CHECK-LABEL: @test_vld3q_f16(
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x3_t, align 16		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x3_t, align 16
// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x3_t, align 16		// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x3_t, align 16
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <8 x i16>*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <8 x half>*
// CHECK: [[VLD3:%.*]] = call { <8 x i16>, <8 x i16>, <8 x i16> } @llvm.aarch64.neon.ld3.v8i16.p0v8i16(<8 x i16>* [[TMP2]])		// CHECK: [[VLD3:%.*]] = call { <8 x half>, <8 x half>, <8 x half> } @llvm.aarch64.neon.ld3.v8f16.p0v8f16(<8 x half>* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x i16>, <8 x i16>, <8 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x half>, <8 x half>, <8 x half> }*
// CHECK: store { <8 x i16>, <8 x i16>, <8 x i16> } [[VLD3]], { <8 x i16>, <8 x i16>, <8 x i16> }* [[TMP3]]		// CHECK: store { <8 x half>, <8 x half>, <8 x half> } [[VLD3]], { <8 x half>, <8 x half>, <8 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x3_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x3_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 48, i32 16, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 48, i32 16, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x8x3_t, %struct.float16x8x3_t* [[RETVAL]], align 16		// CHECK: [[TMP6:%.*]] = load %struct.float16x8x3_t, %struct.float16x8x3_t* [[RETVAL]], align 16
// CHECK: ret %struct.float16x8x3_t [[TMP6]]		// CHECK: ret %struct.float16x8x3_t [[TMP6]]
float16x8x3_t test_vld3q_f16(float16_t const *a) {		float16x8x3_t test_vld3q_f16(float16_t const *a) {
return vld3q_f16(a);		return vld3q_f16(a);
}		}
int64x1x3_t test_vld3_s64(int64_t const *a) {
return vld3_s64(a);		return vld3_s64(a);
}		}

// CHECK-LABEL: @test_vld3_f16(		// CHECK-LABEL: @test_vld3_f16(
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x3_t, align 8		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x3_t, align 8
// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x3_t, align 8		// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x3_t, align 8
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <4 x i16>*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <4 x half>*
// CHECK: [[VLD3:%.*]] = call { <4 x i16>, <4 x i16>, <4 x i16> } @llvm.aarch64.neon.ld3.v4i16.p0v4i16(<4 x i16>* [[TMP2]])		// CHECK: [[VLD3:%.*]] = call { <4 x half>, <4 x half>, <4 x half> } @llvm.aarch64.neon.ld3.v4f16.p0v4f16(<4 x half>* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x i16>, <4 x i16>, <4 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x half>, <4 x half>, <4 x half> }*
// CHECK: store { <4 x i16>, <4 x i16>, <4 x i16> } [[VLD3]], { <4 x i16>, <4 x i16>, <4 x i16> }* [[TMP3]]		// CHECK: store { <4 x half>, <4 x half>, <4 x half> } [[VLD3]], { <4 x half>, <4 x half>, <4 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x3_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x3_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 24, i32 8, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 24, i32 8, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x4x3_t, %struct.float16x4x3_t* [[RETVAL]], align 8		// CHECK: [[TMP6:%.*]] = load %struct.float16x4x3_t, %struct.float16x4x3_t* [[RETVAL]], align 8
// CHECK: ret %struct.float16x4x3_t [[TMP6]]		// CHECK: ret %struct.float16x4x3_t [[TMP6]]
float16x4x3_t test_vld3_f16(float16_t const *a) {		float16x4x3_t test_vld3_f16(float16_t const *a) {
return vld3_f16(a);		return vld3_f16(a);
}		}
int64x2x4_t test_vld4q_s64(int64_t const *a) {
return vld4q_s64(a);		return vld4q_s64(a);
}		}

// CHECK-LABEL: @test_vld4q_f16(		// CHECK-LABEL: @test_vld4q_f16(
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x4_t, align 16		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x4_t, align 16
// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x4_t, align 16		// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x4_t, align 16
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <8 x i16>*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <8 x half>*
// CHECK: [[VLD4:%.*]] = call { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> } @llvm.aarch64.neon.ld4.v8i16.p0v8i16(<8 x i16>* [[TMP2]])		// CHECK: [[VLD4:%.*]] = call { <8 x half>, <8 x half>, <8 x half>, <8 x half> } @llvm.aarch64.neon.ld4.v8f16.p0v8f16(<8 x half>* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x half>, <8 x half>, <8 x half>, <8 x half> }*
// CHECK: store { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> } [[VLD4]], { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> }* [[TMP3]]		// CHECK: store { <8 x half>, <8 x half>, <8 x half>, <8 x half> } [[VLD4]], { <8 x half>, <8 x half>, <8 x half>, <8 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x4_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x4_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 64, i32 16, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 64, i32 16, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x8x4_t, %struct.float16x8x4_t* [[RETVAL]], align 16		// CHECK: [[TMP6:%.*]] = load %struct.float16x8x4_t, %struct.float16x8x4_t* [[RETVAL]], align 16
// CHECK: ret %struct.float16x8x4_t [[TMP6]]		// CHECK: ret %struct.float16x8x4_t [[TMP6]]
float16x8x4_t test_vld4q_f16(float16_t const *a) {		float16x8x4_t test_vld4q_f16(float16_t const *a) {
return vld4q_f16(a);		return vld4q_f16(a);
}		}
int64x1x4_t test_vld4_s64(int64_t const *a) {
return vld4_s64(a);		return vld4_s64(a);
}		}

// CHECK-LABEL: @test_vld4_f16(		// CHECK-LABEL: @test_vld4_f16(
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x4_t, align 8		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x4_t, align 8
// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x4_t, align 8		// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x4_t, align 8
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <4 x i16>*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to <4 x half>*
// CHECK: [[VLD4:%.*]] = call { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> } @llvm.aarch64.neon.ld4.v4i16.p0v4i16(<4 x i16>* [[TMP2]])		// CHECK: [[VLD4:%.*]] = call { <4 x half>, <4 x half>, <4 x half>, <4 x half> } @llvm.aarch64.neon.ld4.v4f16.p0v4f16(<4 x half>* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x half>, <4 x half>, <4 x half>, <4 x half> }*
// CHECK: store { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> } [[VLD4]], { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> }* [[TMP3]]		// CHECK: store { <4 x half>, <4 x half>, <4 x half>, <4 x half> } [[VLD4]], { <4 x half>, <4 x half>, <4 x half>, <4 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x4_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x4_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 32, i32 8, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 32, i32 8, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x4x4_t, %struct.float16x4x4_t* [[RETVAL]], align 8		// CHECK: [[TMP6:%.*]] = load %struct.float16x4x4_t, %struct.float16x4x4_t* [[RETVAL]], align 8
// CHECK: ret %struct.float16x4x4_t [[TMP6]]		// CHECK: ret %struct.float16x4x4_t [[TMP6]]
float16x4x4_t test_vld4_f16(float16_t const *a) {		float16x4x4_t test_vld4_f16(float16_t const *a) {
return vld4_f16(a);		return vld4_f16(a);
}		}
// CHECK: ret void		// CHECK: ret void
void test_vst1q_s64(int64_t *a, int64x2_t b) {		void test_vst1q_s64(int64_t *a, int64x2_t b) {
vst1q_s64(a, b);		vst1q_s64(a, b);
}		}

// CHECK-LABEL: @test_vst1q_f16(		// CHECK-LABEL: @test_vst1q_f16(
// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>		// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP0]] to <8 x i16>*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP0]] to <8 x half>*
// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>		// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
// CHECK: store <8 x i16> [[TMP3]], <8 x i16>* [[TMP2]]		// CHECK: store <8 x half> [[TMP3]], <8 x half>* [[TMP2]]
// CHECK: ret void		// CHECK: ret void
void test_vst1q_f16(float16_t *a, float16x8_t b) {		void test_vst1q_f16(float16_t *a, float16x8_t b) {
vst1q_f16(a, b);		vst1q_f16(a, b);
}		}

// CHECK-LABEL: @test_vst1q_f32(		// CHECK-LABEL: @test_vst1q_f32(
// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>		// CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>
// CHECK: ret void		// CHECK: ret void
void test_vst1_s64(int64_t *a, int64x1_t b) {		void test_vst1_s64(int64_t *a, int64x1_t b) {
vst1_s64(a, b);		vst1_s64(a, b);
}		}

// CHECK-LABEL: @test_vst1_f16(		// CHECK-LABEL: @test_vst1_f16(
// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>		// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP0]] to <4 x i16>*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP0]] to <4 x half>*
// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>		// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
// CHECK: store <4 x i16> [[TMP3]], <4 x i16>* [[TMP2]]		// CHECK: store <4 x half> [[TMP3]], <4 x half>* [[TMP2]]
// CHECK: ret void		// CHECK: ret void
void test_vst1_f16(float16_t *a, float16x4_t b) {		void test_vst1_f16(float16_t *a, float16x4_t b) {
vst1_f16(a, b);		vst1_f16(a, b);
}		}

// CHECK-LABEL: @test_vst1_f32(		// CHECK-LABEL: @test_vst1_f32(
// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>		// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>
// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i64 0, i64 0		// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i64 0, i64 0
// CHECK: [[TMP3:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16		// CHECK: [[TMP3:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16
// CHECK: [[TMP4:%.*]] = bitcast <8 x half> [[TMP3]] to <16 x i8>		// CHECK: [[TMP4:%.*]] = bitcast <8 x half> [[TMP3]] to <16 x i8>
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16		// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>		// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>
// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>		// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>		// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>
// CHECK: call void @llvm.aarch64.neon.st2.v8i16.p0i8(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]], i8* [[TMP2]])		// CHECK: call void @llvm.aarch64.neon.st2.v8f16.p0i8(<8 x half> [[TMP7]], <8 x half> [[TMP8]], i8* [[TMP2]])
// CHECK: ret void		// CHECK: ret void
void test_vst2q_f16(float16_t *a, float16x8x2_t b) {		void test_vst2q_f16(float16_t *a, float16x8x2_t b) {
vst2q_f16(a, b);		vst2q_f16(a, b);
}		}

// CHECK-LABEL: @test_vst2q_f32(		// CHECK-LABEL: @test_vst2q_f32(
// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16		// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16
// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16		// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16
// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i64 0, i64 0		// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i64 0, i64 0
// CHECK: [[TMP3:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8		// CHECK: [[TMP3:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8
// CHECK: [[TMP4:%.*]] = bitcast <4 x half> [[TMP3]] to <8 x i8>		// CHECK: [[TMP4:%.*]] = bitcast <4 x half> [[TMP3]] to <8 x i8>
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8		// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>		// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>
// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>		// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>		// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>
// CHECK: call void @llvm.aarch64.neon.st2.v4i16.p0i8(<4 x i16> [[TMP7]], <4 x i16> [[TMP8]], i8* [[TMP2]])		// CHECK: call void @llvm.aarch64.neon.st2.v4f16.p0i8(<4 x half> [[TMP7]], <4 x half> [[TMP8]], i8* [[TMP2]])
// CHECK: ret void		// CHECK: ret void
void test_vst2_f16(float16_t *a, float16x4x2_t b) {		void test_vst2_f16(float16_t *a, float16x4x2_t b) {
vst2_f16(a, b);		vst2_f16(a, b);
}		}

// CHECK-LABEL: @test_vst2_f32(		// CHECK-LABEL: @test_vst2_f32(
// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8		// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8
// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8		// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16		// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>		// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16		// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16
// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>		// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>
// CHECK: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>		// CHECK: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>		// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>
// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>		// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x half>
// CHECK: call void @llvm.aarch64.neon.st3.v8i16.p0i8(<8 x i16> [[TMP9]], <8 x i16> [[TMP10]], <8 x i16> [[TMP11]], i8* [[TMP2]])		// CHECK: call void @llvm.aarch64.neon.st3.v8f16.p0i8(<8 x half> [[TMP9]], <8 x half> [[TMP10]], <8 x half> [[TMP11]], i8* [[TMP2]])
// CHECK: ret void		// CHECK: ret void
void test_vst3q_f16(float16_t *a, float16x8x3_t b) {		void test_vst3q_f16(float16_t *a, float16x8x3_t b) {
vst3q_f16(a, b);		vst3q_f16(a, b);
}		}

// CHECK-LABEL: @test_vst3q_f32(		// CHECK-LABEL: @test_vst3q_f32(
// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16		// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16
// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16		// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8		// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>		// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8		// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8
// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>		// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>
// CHECK: [[TMP9:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>		// CHECK: [[TMP9:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>		// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>
// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>		// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x half>
// CHECK: call void @llvm.aarch64.neon.st3.v4i16.p0i8(<4 x i16> [[TMP9]], <4 x i16> [[TMP10]], <4 x i16> [[TMP11]], i8* [[TMP2]])		// CHECK: call void @llvm.aarch64.neon.st3.v4f16.p0i8(<4 x half> [[TMP9]], <4 x half> [[TMP10]], <4 x half> [[TMP11]], i8* [[TMP2]])
// CHECK: ret void		// CHECK: ret void
void test_vst3_f16(float16_t *a, float16x4x3_t b) {		void test_vst3_f16(float16_t *a, float16x4x3_t b) {
vst3_f16(a, b);		vst3_f16(a, b);
}		}

// CHECK-LABEL: @test_vst3_f32(		// CHECK-LABEL: @test_vst3_f32(
// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8		// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8
// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8		// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16		// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16
// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>		// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>
// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL5]], i64 0, i64 3		// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL5]], i64 0, i64 3
// CHECK: [[TMP9:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX6]], align 16		// CHECK: [[TMP9:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX6]], align 16
// CHECK: [[TMP10:%.*]] = bitcast <8 x half> [[TMP9]] to <16 x i8>		// CHECK: [[TMP10:%.*]] = bitcast <8 x half> [[TMP9]] to <16 x i8>
// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>		// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>		// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>
// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>		// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x half>
// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x i16>		// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x half>
// CHECK: call void @llvm.aarch64.neon.st4.v8i16.p0i8(<8 x i16> [[TMP11]], <8 x i16> [[TMP12]], <8 x i16> [[TMP13]], <8 x i16> [[TMP14]], i8* [[TMP2]])		// CHECK: call void @llvm.aarch64.neon.st4.v8f16.p0i8(<8 x half> [[TMP11]], <8 x half> [[TMP12]], <8 x half> [[TMP13]], <8 x half> [[TMP14]], i8* [[TMP2]])
// CHECK: ret void		// CHECK: ret void
void test_vst4q_f16(float16_t *a, float16x8x4_t b) {		void test_vst4q_f16(float16_t *a, float16x8x4_t b) {
vst4q_f16(a, b);		vst4q_f16(a, b);
}		}

// CHECK-LABEL: @test_vst4q_f32(		// CHECK-LABEL: @test_vst4q_f32(
// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16		// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16
// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16		// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8		// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8
// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>		// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>
// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL5]], i64 0, i64 3		// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL5]], i64 0, i64 3
// CHECK: [[TMP9:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX6]], align 8		// CHECK: [[TMP9:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX6]], align 8
// CHECK: [[TMP10:%.*]] = bitcast <4 x half> [[TMP9]] to <8 x i8>		// CHECK: [[TMP10:%.*]] = bitcast <4 x half> [[TMP9]] to <8 x i8>
// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>		// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>		// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>
// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>		// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x half>
// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x i16>		// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x half>
// CHECK: call void @llvm.aarch64.neon.st4.v4i16.p0i8(<4 x i16> [[TMP11]], <4 x i16> [[TMP12]], <4 x i16> [[TMP13]], <4 x i16> [[TMP14]], i8* [[TMP2]])		// CHECK: call void @llvm.aarch64.neon.st4.v4f16.p0i8(<4 x half> [[TMP11]], <4 x half> [[TMP12]], <4 x half> [[TMP13]], <4 x half> [[TMP14]], i8* [[TMP2]])
// CHECK: ret void		// CHECK: ret void
void test_vst4_f16(float16_t *a, float16x4x4_t b) {		void test_vst4_f16(float16_t *a, float16x4x4_t b) {
vst4_f16(a, b);		vst4_f16(a, b);
}		}

// CHECK-LABEL: @test_vst4_f32(		// CHECK-LABEL: @test_vst4_f32(
// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8		// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8
// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8		// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8
[... 265 lines not shown ...]
int64x2x2_t test_vld1q_s64_x2(int64_t const *a) {		int64x2x2_t test_vld1q_s64_x2(int64_t const *a) {
return vld1q_s64_x2(a);		return vld1q_s64_x2(a);
}		}

// CHECK-LABEL: @test_vld1q_f16_x2(		// CHECK-LABEL: @test_vld1q_f16_x2(
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x2_t, align 16		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x2_t, align 16
// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x2_t, align 16		// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x2_t, align 16
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i16*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to half*
// CHECK: [[VLD1XN:%.*]] = call { <8 x i16>, <8 x i16> } @llvm.aarch64.neon.ld1x2.v8i16.p0i16(i16* [[TMP2]])		// CHECK: [[VLD1XN:%.*]] = call { <8 x half>, <8 x half> } @llvm.aarch64.neon.ld1x2.v8f16.p0f16(half* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x i16>, <8 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x half>, <8 x half> }*
// CHECK: store { <8 x i16>, <8 x i16> } [[VLD1XN]], { <8 x i16>, <8 x i16> }* [[TMP3]]		// CHECK: store { <8 x half>, <8 x half> } [[VLD1XN]], { <8 x half>, <8 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x2_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x2_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 32, i32 16, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 32, i32 16, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x8x2_t, %struct.float16x8x2_t* [[RETVAL]], align 16		// CHECK: [[TMP6:%.*]] = load %struct.float16x8x2_t, %struct.float16x8x2_t* [[RETVAL]], align 16
// CHECK: ret %struct.float16x8x2_t [[TMP6]]		// CHECK: ret %struct.float16x8x2_t [[TMP6]]
float16x8x2_t test_vld1q_f16_x2(float16_t const *a) {		float16x8x2_t test_vld1q_f16_x2(float16_t const *a) {
return vld1q_f16_x2(a);		return vld1q_f16_x2(a);
}		}
[... 226 lines not shown ...]
int64x1x2_t test_vld1_s64_x2(int64_t const *a) {		int64x1x2_t test_vld1_s64_x2(int64_t const *a) {
return vld1_s64_x2(a);		return vld1_s64_x2(a);
}		}

// CHECK-LABEL: @test_vld1_f16_x2(		// CHECK-LABEL: @test_vld1_f16_x2(
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x2_t, align 8		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x2_t, align 8
// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x2_t, align 8		// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x2_t, align 8
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i16*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to half*
// CHECK: [[VLD1XN:%.*]] = call { <4 x i16>, <4 x i16> } @llvm.aarch64.neon.ld1x2.v4i16.p0i16(i16* [[TMP2]])		// CHECK: [[VLD1XN:%.*]] = call { <4 x half>, <4 x half> } @llvm.aarch64.neon.ld1x2.v4f16.p0f16(half* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x i16>, <4 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x half>, <4 x half> }*
// CHECK: store { <4 x i16>, <4 x i16> } [[VLD1XN]], { <4 x i16>, <4 x i16> }* [[TMP3]]		// CHECK: store { <4 x half>, <4 x half> } [[VLD1XN]], { <4 x half>, <4 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x2_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x2_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 16, i32 8, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 16, i32 8, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x4x2_t, %struct.float16x4x2_t* [[RETVAL]], align 8		// CHECK: [[TMP6:%.*]] = load %struct.float16x4x2_t, %struct.float16x4x2_t* [[RETVAL]], align 8
// CHECK: ret %struct.float16x4x2_t [[TMP6]]		// CHECK: ret %struct.float16x4x2_t [[TMP6]]
float16x4x2_t test_vld1_f16_x2(float16_t const *a) {		float16x4x2_t test_vld1_f16_x2(float16_t const *a) {
return vld1_f16_x2(a);		return vld1_f16_x2(a);
}		}
[... 226 lines not shown ...]
int64x2x3_t test_vld1q_s64_x3(int64_t const *a) {		int64x2x3_t test_vld1q_s64_x3(int64_t const *a) {
return vld1q_s64_x3(a);		return vld1q_s64_x3(a);
}		}

// CHECK-LABEL: @test_vld1q_f16_x3(		// CHECK-LABEL: @test_vld1q_f16_x3(
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x3_t, align 16		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x3_t, align 16
// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x3_t, align 16		// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x3_t, align 16
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i16*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to half*
// CHECK: [[VLD1XN:%.*]] = call { <8 x i16>, <8 x i16>, <8 x i16> } @llvm.aarch64.neon.ld1x3.v8i16.p0i16(i16* [[TMP2]])		// CHECK: [[VLD1XN:%.*]] = call { <8 x half>, <8 x half>, <8 x half> } @llvm.aarch64.neon.ld1x3.v8f16.p0f16(half* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x i16>, <8 x i16>, <8 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x half>, <8 x half>, <8 x half> }*
// CHECK: store { <8 x i16>, <8 x i16>, <8 x i16> } [[VLD1XN]], { <8 x i16>, <8 x i16>, <8 x i16> }* [[TMP3]]		// CHECK: store { <8 x half>, <8 x half>, <8 x half> } [[VLD1XN]], { <8 x half>, <8 x half>, <8 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x3_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x3_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 48, i32 16, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 48, i32 16, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x8x3_t, %struct.float16x8x3_t* [[RETVAL]], align 16		// CHECK: [[TMP6:%.*]] = load %struct.float16x8x3_t, %struct.float16x8x3_t* [[RETVAL]], align 16
// CHECK: ret %struct.float16x8x3_t [[TMP6]]		// CHECK: ret %struct.float16x8x3_t [[TMP6]]
float16x8x3_t test_vld1q_f16_x3(float16_t const *a) {		float16x8x3_t test_vld1q_f16_x3(float16_t const *a) {
return vld1q_f16_x3(a);		return vld1q_f16_x3(a);
}		}
[... 226 lines not shown ...]
int64x1x3_t test_vld1_s64_x3(int64_t const *a) {		int64x1x3_t test_vld1_s64_x3(int64_t const *a) {
return vld1_s64_x3(a);		return vld1_s64_x3(a);
}		}

// CHECK-LABEL: @test_vld1_f16_x3(		// CHECK-LABEL: @test_vld1_f16_x3(
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x3_t, align 8		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x3_t, align 8
// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x3_t, align 8		// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x3_t, align 8
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i16*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to half*
// CHECK: [[VLD1XN:%.*]] = call { <4 x i16>, <4 x i16>, <4 x i16> } @llvm.aarch64.neon.ld1x3.v4i16.p0i16(i16* [[TMP2]])		// CHECK: [[VLD1XN:%.*]] = call { <4 x half>, <4 x half>, <4 x half> } @llvm.aarch64.neon.ld1x3.v4f16.p0f16(half* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x i16>, <4 x i16>, <4 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x half>, <4 x half>, <4 x half> }*
// CHECK: store { <4 x i16>, <4 x i16>, <4 x i16> } [[VLD1XN]], { <4 x i16>, <4 x i16>, <4 x i16> }* [[TMP3]]		// CHECK: store { <4 x half>, <4 x half>, <4 x half> } [[VLD1XN]], { <4 x half>, <4 x half>, <4 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x3_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x3_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 24, i32 8, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 24, i32 8, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x4x3_t, %struct.float16x4x3_t* [[RETVAL]], align 8		// CHECK: [[TMP6:%.*]] = load %struct.float16x4x3_t, %struct.float16x4x3_t* [[RETVAL]], align 8
// CHECK: ret %struct.float16x4x3_t [[TMP6]]		// CHECK: ret %struct.float16x4x3_t [[TMP6]]
float16x4x3_t test_vld1_f16_x3(float16_t const *a) {		float16x4x3_t test_vld1_f16_x3(float16_t const *a) {
return vld1_f16_x3(a);		return vld1_f16_x3(a);
}		}
[... 226 lines not shown ...]
int64x2x4_t test_vld1q_s64_x4(int64_t const *a) {		int64x2x4_t test_vld1q_s64_x4(int64_t const *a) {
return vld1q_s64_x4(a);		return vld1q_s64_x4(a);
}		}

// CHECK-LABEL: @test_vld1q_f16_x4(		// CHECK-LABEL: @test_vld1q_f16_x4(
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x4_t, align 16		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x4_t, align 16
// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x4_t, align 16		// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x4_t, align 16
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i16*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to half*
// CHECK: [[VLD1XN:%.*]] = call { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> } @llvm.aarch64.neon.ld1x4.v8i16.p0i16(i16* [[TMP2]])		// CHECK: [[VLD1XN:%.*]] = call { <8 x half>, <8 x half>, <8 x half>, <8 x half> } @llvm.aarch64.neon.ld1x4.v8f16.p0f16(half* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x half>, <8 x half>, <8 x half>, <8 x half> }*
// CHECK: store { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> } [[VLD1XN]], { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> }* [[TMP3]]		// CHECK: store { <8 x half>, <8 x half>, <8 x half>, <8 x half> } [[VLD1XN]], { <8 x half>, <8 x half>, <8 x half>, <8 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x4_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x4_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 64, i32 16, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 64, i32 16, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x8x4_t, %struct.float16x8x4_t* [[RETVAL]], align 16		// CHECK: [[TMP6:%.*]] = load %struct.float16x8x4_t, %struct.float16x8x4_t* [[RETVAL]], align 16
// CHECK: ret %struct.float16x8x4_t [[TMP6]]		// CHECK: ret %struct.float16x8x4_t [[TMP6]]
float16x8x4_t test_vld1q_f16_x4(float16_t const *a) {		float16x8x4_t test_vld1q_f16_x4(float16_t const *a) {
return vld1q_f16_x4(a);		return vld1q_f16_x4(a);
}		}
[... 226 lines not shown ...]
int64x1x4_t test_vld1_s64_x4(int64_t const *a) {		int64x1x4_t test_vld1_s64_x4(int64_t const *a) {
return vld1_s64_x4(a);		return vld1_s64_x4(a);
}		}

// CHECK-LABEL: @test_vld1_f16_x4(		// CHECK-LABEL: @test_vld1_f16_x4(
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x4_t, align 8		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x4_t, align 8
// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x4_t, align 8		// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x4_t, align 8
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i16*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to half*
// CHECK: [[VLD1XN:%.*]] = call { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> } @llvm.aarch64.neon.ld1x4.v4i16.p0i16(i16* [[TMP2]])		// CHECK: [[VLD1XN:%.*]] = call { <4 x half>, <4 x half>, <4 x half>, <4 x half> } @llvm.aarch64.neon.ld1x4.v4f16.p0f16(half* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x half>, <4 x half>, <4 x half>, <4 x half> }*
// CHECK: store { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> } [[VLD1XN]], { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> }* [[TMP3]]		// CHECK: store { <4 x half>, <4 x half>, <4 x half>, <4 x half> } [[VLD1XN]], { <4 x half>, <4 x half>, <4 x half>, <4 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x4_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x4_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 32, i32 8, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 32, i32 8, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x4x4_t, %struct.float16x4x4_t* [[RETVAL]], align 8		// CHECK: [[TMP6:%.*]] = load %struct.float16x4x4_t, %struct.float16x4x4_t* [[RETVAL]], align 8
// CHECK: ret %struct.float16x4x4_t [[TMP6]]		// CHECK: ret %struct.float16x4x4_t [[TMP6]]
float16x4x4_t test_vld1_f16_x4(float16_t const *a) {		float16x4x4_t test_vld1_f16_x4(float16_t const *a) {
return vld1_f16_x4(a);		return vld1_f16_x4(a);
}		}
[... 294 lines not shown ...]
// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i64 0, i64 0		// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i64 0, i64 0
// CHECK: [[TMP3:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16		// CHECK: [[TMP3:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16
// CHECK: [[TMP4:%.*]] = bitcast <8 x half> [[TMP3]] to <16 x i8>		// CHECK: [[TMP4:%.*]] = bitcast <8 x half> [[TMP3]] to <16 x i8>
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16		// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>		// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>
// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>		// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>		// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>
// CHECK: [[TMP9:%.*]] = bitcast i8* [[TMP2]] to i16*		// CHECK: [[TMP9:%.*]] = bitcast i8* [[TMP2]] to half*
// CHECK: call void @llvm.aarch64.neon.st1x2.v8i16.p0i16(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]], i16* [[TMP9]])		// CHECK: call void @llvm.aarch64.neon.st1x2.v8f16.p0f16(<8 x half> [[TMP7]], <8 x half> [[TMP8]], half* [[TMP9]])
// CHECK: ret void		// CHECK: ret void
void test_vst1q_f16_x2(float16_t *a, float16x8x2_t b) {		void test_vst1q_f16_x2(float16_t *a, float16x8x2_t b) {
vst1q_f16_x2(a, b);		vst1q_f16_x2(a, b);
}		}

// CHECK-LABEL: @test_vst1q_f32_x2(		// CHECK-LABEL: @test_vst1q_f32_x2(
// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16		// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16
// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16		// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16
[... 326 lines not shown ...]
// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i64 0, i64 0		// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i64 0, i64 0
// CHECK: [[TMP3:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8		// CHECK: [[TMP3:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8
// CHECK: [[TMP4:%.*]] = bitcast <4 x half> [[TMP3]] to <8 x i8>		// CHECK: [[TMP4:%.*]] = bitcast <4 x half> [[TMP3]] to <8 x i8>
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8		// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>		// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>
// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>		// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>		// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>
// CHECK: [[TMP9:%.*]] = bitcast i8* [[TMP2]] to i16*		// CHECK: [[TMP9:%.*]] = bitcast i8* [[TMP2]] to half*
// CHECK: call void @llvm.aarch64.neon.st1x2.v4i16.p0i16(<4 x i16> [[TMP7]], <4 x i16> [[TMP8]], i16* [[TMP9]])		// CHECK: call void @llvm.aarch64.neon.st1x2.v4f16.p0f16(<4 x half> [[TMP7]], <4 x half> [[TMP8]], half* [[TMP9]])
// CHECK: ret void		// CHECK: ret void
void test_vst1_f16_x2(float16_t *a, float16x4x2_t b) {		void test_vst1_f16_x2(float16_t *a, float16x4x2_t b) {
vst1_f16_x2(a, b);		vst1_f16_x2(a, b);
}		}

// CHECK-LABEL: @test_vst1_f32_x2(		// CHECK-LABEL: @test_vst1_f32_x2(
// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8		// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8
// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8		// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8
[... 366 lines not shown ...]
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16		// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>		// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16		// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16
// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>		// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>
// CHECK: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>		// CHECK: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>		// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>
// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>		// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x half>
// CHECK: [[TMP12:%.*]] = bitcast i8* [[TMP2]] to i16*		// CHECK: [[TMP12:%.*]] = bitcast i8* [[TMP2]] to half*
// CHECK: call void @llvm.aarch64.neon.st1x3.v8i16.p0i16(<8 x i16> [[TMP9]], <8 x i16> [[TMP10]], <8 x i16> [[TMP11]], i16* [[TMP12]])		// CHECK: call void @llvm.aarch64.neon.st1x3.v8f16.p0f16(<8 x half> [[TMP9]], <8 x half> [[TMP10]], <8 x half> [[TMP11]], half* [[TMP12]])
// CHECK: ret void		// CHECK: ret void
void test_vst1q_f16_x3(float16_t *a, float16x8x3_t b) {		void test_vst1q_f16_x3(float16_t *a, float16x8x3_t b) {
vst1q_f16_x3(a, b);		vst1q_f16_x3(a, b);
}		}

// CHECK-LABEL: @test_vst1q_f32_x3(		// CHECK-LABEL: @test_vst1q_f32_x3(
// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16		// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16
// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16		// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16
[... 389 lines not shown ...]
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8		// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>		// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8		// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8
// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>		// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>
// CHECK: [[TMP9:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>		// CHECK: [[TMP9:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>		// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>
// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>		// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x half>
// CHECK: [[TMP12:%.*]] = bitcast i8* [[TMP2]] to i16*		// CHECK: [[TMP12:%.*]] = bitcast i8* [[TMP2]] to half*
// CHECK: call void @llvm.aarch64.neon.st1x3.v4i16.p0i16(<4 x i16> [[TMP9]], <4 x i16> [[TMP10]], <4 x i16> [[TMP11]], i16* [[TMP12]])		// CHECK: call void @llvm.aarch64.neon.st1x3.v4f16.p0f16(<4 x half> [[TMP9]], <4 x half> [[TMP10]], <4 x half> [[TMP11]], half* [[TMP12]])
// CHECK: ret void		// CHECK: ret void
void test_vst1_f16_x3(float16_t *a, float16x4x3_t b) {		void test_vst1_f16_x3(float16_t *a, float16x4x3_t b) {
vst1_f16_x3(a, b);		vst1_f16_x3(a, b);
}		}

// CHECK-LABEL: @test_vst1_f32_x3(		// CHECK-LABEL: @test_vst1_f32_x3(
// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8		// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8
// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8		// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8
[... 429 lines not shown ...]
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16		// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16
// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>		// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>
// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL5]], i64 0, i64 3		// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL5]], i64 0, i64 3
// CHECK: [[TMP9:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX6]], align 16		// CHECK: [[TMP9:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX6]], align 16
// CHECK: [[TMP10:%.*]] = bitcast <8 x half> [[TMP9]] to <16 x i8>		// CHECK: [[TMP10:%.*]] = bitcast <8 x half> [[TMP9]] to <16 x i8>
// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>		// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>		// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>
// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>		// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x half>
// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x i16>		// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x half>
// CHECK: [[TMP15:%.*]] = bitcast i8* [[TMP2]] to i16*		// CHECK: [[TMP15:%.*]] = bitcast i8* [[TMP2]] to half*
// CHECK: call void @llvm.aarch64.neon.st1x4.v8i16.p0i16(<8 x i16> [[TMP11]], <8 x i16> [[TMP12]], <8 x i16> [[TMP13]], <8 x i16> [[TMP14]], i16* [[TMP15]])		// CHECK: call void @llvm.aarch64.neon.st1x4.v8f16.p0f16(<8 x half> [[TMP11]], <8 x half> [[TMP12]], <8 x half> [[TMP13]], <8 x half> [[TMP14]], half* [[TMP15]])
// CHECK: ret void		// CHECK: ret void
void test_vst1q_f16_x4(float16_t *a, float16x8x4_t b) {		void test_vst1q_f16_x4(float16_t *a, float16x8x4_t b) {
vst1q_f16_x4(a, b);		vst1q_f16_x4(a, b);
}		}

// CHECK-LABEL: @test_vst1q_f32_x4(		// CHECK-LABEL: @test_vst1q_f32_x4(
// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16		// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16
// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16		// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16
[... 452 lines not shown ...]
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8		// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8
// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>		// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>
// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL5]], i64 0, i64 3		// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL5]], i64 0, i64 3
// CHECK: [[TMP9:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX6]], align 8		// CHECK: [[TMP9:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX6]], align 8
// CHECK: [[TMP10:%.*]] = bitcast <4 x half> [[TMP9]] to <8 x i8>		// CHECK: [[TMP10:%.*]] = bitcast <4 x half> [[TMP9]] to <8 x i8>
// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>		// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>		// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>
// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>		// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x half>
// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x i16>		// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x half>
// CHECK: [[TMP15:%.*]] = bitcast i8* [[TMP2]] to i16*		// CHECK: [[TMP15:%.*]] = bitcast i8* [[TMP2]] to half*
// CHECK: call void @llvm.aarch64.neon.st1x4.v4i16.p0i16(<4 x i16> [[TMP11]], <4 x i16> [[TMP12]], <4 x i16> [[TMP13]], <4 x i16> [[TMP14]], i16* [[TMP15]])		// CHECK: call void @llvm.aarch64.neon.st1x4.v4f16.p0f16(<4 x half> [[TMP11]], <4 x half> [[TMP12]], <4 x half> [[TMP13]], <4 x half> [[TMP14]], half* [[TMP15]])
// CHECK: ret void		// CHECK: ret void
void test_vst1_f16_x4(float16_t *a, float16x4x4_t b) {		void test_vst1_f16_x4(float16_t *a, float16x4x4_t b) {
vst1_f16_x4(a, b);		vst1_f16_x4(a, b);
}		}

// CHECK-LABEL: @test_vst1_f32_x4(		// CHECK-LABEL: @test_vst1_f32_x4(
// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8		// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8
// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8		// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8

cfe/trunk/test/CodeGen/aarch64-neon-ldst-one.c

// CHECK: [[LANE:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP3]], <2 x i32> zeroinitializer		// CHECK: [[LANE:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP3]], <2 x i32> zeroinitializer
// CHECK: ret <2 x i64> [[LANE]]		// CHECK: ret <2 x i64> [[LANE]]
int64x2_t test_vld1q_dup_s64(int64_t *a) {		int64x2_t test_vld1q_dup_s64(int64_t *a) {
return vld1q_dup_s64(a);		return vld1q_dup_s64(a);
}		}

// CHECK-LABEL: define <8 x half> @test_vld1q_dup_f16(half* %a) #0 {		// CHECK-LABEL: define <8 x half> @test_vld1q_dup_f16(half* %a) #0 {
// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i16*		// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to half*
// CHECK: [[TMP2:%.*]] = load i16, i16* [[TMP1]]		// CHECK: [[TMP2:%.*]] = load half, half* [[TMP1]]
// CHECK: [[TMP3:%.*]] = insertelement <8 x i16> undef, i16 [[TMP2]], i32 0		// CHECK: [[TMP3:%.*]] = insertelement <8 x half> undef, half [[TMP2]], i32 0
// CHECK: [[LANE:%.*]] = shufflevector <8 x i16> [[TMP3]], <8 x i16> [[TMP3]], <8 x i32> zeroinitializer		// CHECK: [[LANE:%.*]] = shufflevector <8 x half> [[TMP3]], <8 x half> [[TMP3]], <8 x i32> zeroinitializer
// CHECK: [[TMP4:%.*]] = bitcast <8 x i16> [[LANE]] to <8 x half>		// CHECK: ret <8 x half> [[LANE]]
// CHECK: ret <8 x half> [[TMP4]]
float16x8_t test_vld1q_dup_f16(float16_t *a) {		float16x8_t test_vld1q_dup_f16(float16_t *a) {
return vld1q_dup_f16(a);		return vld1q_dup_f16(a);
}		}

// CHECK-LABEL: define <4 x float> @test_vld1q_dup_f32(float* %a) #0 {		// CHECK-LABEL: define <4 x float> @test_vld1q_dup_f32(float* %a) #0 {
// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to float*		// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to float*
// CHECK: [[TMP2:%.*]] = load float, float* [[TMP1]]		// CHECK: [[TMP2:%.*]] = load float, float* [[TMP1]]
// CHECK: [[LANE:%.*]] = shufflevector <1 x i64> [[TMP3]], <1 x i64> [[TMP3]], <1 x i32> zeroinitializer		// CHECK: [[LANE:%.*]] = shufflevector <1 x i64> [[TMP3]], <1 x i64> [[TMP3]], <1 x i32> zeroinitializer
// CHECK: ret <1 x i64> [[LANE]]		// CHECK: ret <1 x i64> [[LANE]]
int64x1_t test_vld1_dup_s64(int64_t *a) {		int64x1_t test_vld1_dup_s64(int64_t *a) {
return vld1_dup_s64(a);		return vld1_dup_s64(a);
}		}

// CHECK-LABEL: define <4 x half> @test_vld1_dup_f16(half* %a) #0 {		// CHECK-LABEL: define <4 x half> @test_vld1_dup_f16(half* %a) #0 {
// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i16*		// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to half*
// CHECK: [[TMP2:%.*]] = load i16, i16* [[TMP1]]		// CHECK: [[TMP2:%.*]] = load half, half* [[TMP1]]
// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[TMP2]], i32 0		// CHECK: [[TMP3:%.*]] = insertelement <4 x half> undef, half [[TMP2]], i32 0
// CHECK: [[LANE:%.*]] = shufflevector <4 x i16> [[TMP3]], <4 x i16> [[TMP3]], <4 x i32> zeroinitializer		// CHECK: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <4 x i32> zeroinitializer
// CHECK: [[TMP4:%.*]] = bitcast <4 x i16> [[LANE]] to <4 x half>		// CHECK: ret <4 x half> [[LANE]]
// CHECK: ret <4 x half> [[TMP4]]
float16x4_t test_vld1_dup_f16(float16_t *a) {		float16x4_t test_vld1_dup_f16(float16_t *a) {
return vld1_dup_f16(a);		return vld1_dup_f16(a);
}		}

// CHECK-LABEL: define <2 x float> @test_vld1_dup_f32(float* %a) #0 {		// CHECK-LABEL: define <2 x float> @test_vld1_dup_f32(float* %a) #0 {
// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to float*		// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to float*
// CHECK: [[TMP2:%.*]] = load float, float* [[TMP1]]		// CHECK: [[TMP2:%.*]] = load float, float* [[TMP1]]
int64x2x2_t test_vld2q_dup_s64(int64_t *a) {		int64x2x2_t test_vld2q_dup_s64(int64_t *a) {
return vld2q_dup_s64(a);		return vld2q_dup_s64(a);
}		}

// CHECK-LABEL: define %struct.float16x8x2_t @test_vld2q_dup_f16(half* %a) #0 {		// CHECK-LABEL: define %struct.float16x8x2_t @test_vld2q_dup_f16(half* %a) #0 {
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x2_t, align 16		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x2_t, align 16
// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x2_t, align 16		// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x2_t, align 16
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i16*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to half*
// CHECK: [[VLD2:%.*]] = call { <8 x i16>, <8 x i16> } @llvm.aarch64.neon.ld2r.v8i16.p0i16(i16* [[TMP2]])		// CHECK: [[VLD2:%.*]] = call { <8 x half>, <8 x half> } @llvm.aarch64.neon.ld2r.v8f16.p0f16(half* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x i16>, <8 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x half>, <8 x half> }*
// CHECK: store { <8 x i16>, <8 x i16> } [[VLD2]], { <8 x i16>, <8 x i16> }* [[TMP3]]		// CHECK: store { <8 x half>, <8 x half> } [[VLD2]], { <8 x half>, <8 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x2_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x2_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 32, i32 16, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 32, i32 16, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x8x2_t, %struct.float16x8x2_t* [[RETVAL]], align 16		// CHECK: [[TMP6:%.*]] = load %struct.float16x8x2_t, %struct.float16x8x2_t* [[RETVAL]], align 16
// CHECK: ret %struct.float16x8x2_t [[TMP6]]		// CHECK: ret %struct.float16x8x2_t [[TMP6]]
float16x8x2_t test_vld2q_dup_f16(float16_t *a) {		float16x8x2_t test_vld2q_dup_f16(float16_t *a) {
return vld2q_dup_f16(a);		return vld2q_dup_f16(a);
}		}
int64x1x2_t test_vld2_dup_s64(int64_t *a) {		int64x1x2_t test_vld2_dup_s64(int64_t *a) {
return vld2_dup_s64(a);		return vld2_dup_s64(a);
}		}

// CHECK-LABEL: define %struct.float16x4x2_t @test_vld2_dup_f16(half* %a) #0 {		// CHECK-LABEL: define %struct.float16x4x2_t @test_vld2_dup_f16(half* %a) #0 {
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x2_t, align 8		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x2_t, align 8
// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x2_t, align 8		// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x2_t, align 8
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i16*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to half*
// CHECK: [[VLD2:%.*]] = call { <4 x i16>, <4 x i16> } @llvm.aarch64.neon.ld2r.v4i16.p0i16(i16* [[TMP2]])		// CHECK: [[VLD2:%.*]] = call { <4 x half>, <4 x half> } @llvm.aarch64.neon.ld2r.v4f16.p0f16(half* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x i16>, <4 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x half>, <4 x half> }*
// CHECK: store { <4 x i16>, <4 x i16> } [[VLD2]], { <4 x i16>, <4 x i16> }* [[TMP3]]		// CHECK: store { <4 x half>, <4 x half> } [[VLD2]], { <4 x half>, <4 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x2_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x2_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 16, i32 8, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 16, i32 8, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x4x2_t, %struct.float16x4x2_t* [[RETVAL]], align 8		// CHECK: [[TMP6:%.*]] = load %struct.float16x4x2_t, %struct.float16x4x2_t* [[RETVAL]], align 8
// CHECK: ret %struct.float16x4x2_t [[TMP6]]		// CHECK: ret %struct.float16x4x2_t [[TMP6]]
float16x4x2_t test_vld2_dup_f16(float16_t *a) {		float16x4x2_t test_vld2_dup_f16(float16_t *a) {
return vld2_dup_f16(a);		return vld2_dup_f16(a);
}		}
int64x2x3_t test_vld3q_dup_s64(int64_t *a) {		int64x2x3_t test_vld3q_dup_s64(int64_t *a) {
// [{{x[0-9]+\|sp}}]		// [{{x[0-9]+\|sp}}]
}		}

// CHECK-LABEL: define %struct.float16x8x3_t @test_vld3q_dup_f16(half* %a) #0 {		// CHECK-LABEL: define %struct.float16x8x3_t @test_vld3q_dup_f16(half* %a) #0 {
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x3_t, align 16		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x3_t, align 16
// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x3_t, align 16		// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x3_t, align 16
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i16*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to half*
// CHECK: [[VLD3:%.*]] = call { <8 x i16>, <8 x i16>, <8 x i16> } @llvm.aarch64.neon.ld3r.v8i16.p0i16(i16* [[TMP2]])		// CHECK: [[VLD3:%.*]] = call { <8 x half>, <8 x half>, <8 x half> } @llvm.aarch64.neon.ld3r.v8f16.p0f16(half* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x i16>, <8 x i16>, <8 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x half>, <8 x half>, <8 x half> }*
// CHECK: store { <8 x i16>, <8 x i16>, <8 x i16> } [[VLD3]], { <8 x i16>, <8 x i16>, <8 x i16> }* [[TMP3]]		// CHECK: store { <8 x half>, <8 x half>, <8 x half> } [[VLD3]], { <8 x half>, <8 x half>, <8 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x3_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x3_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 48, i32 16, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 48, i32 16, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x8x3_t, %struct.float16x8x3_t* [[RETVAL]], align 16		// CHECK: [[TMP6:%.*]] = load %struct.float16x8x3_t, %struct.float16x8x3_t* [[RETVAL]], align 16
// CHECK: ret %struct.float16x8x3_t [[TMP6]]		// CHECK: ret %struct.float16x8x3_t [[TMP6]]
float16x8x3_t test_vld3q_dup_f16(float16_t *a) {		float16x8x3_t test_vld3q_dup_f16(float16_t *a) {
return vld3q_dup_f16(a);		return vld3q_dup_f16(a);
// [{{x[0-9]+\|sp}}]		// [{{x[0-9]+\|sp}}]
int64x1x3_t test_vld3_dup_s64(int64_t *a) {		int64x1x3_t test_vld3_dup_s64(int64_t *a) {
// [{{x[0-9]+\|sp}}]		// [{{x[0-9]+\|sp}}]
}		}

// CHECK-LABEL: define %struct.float16x4x3_t @test_vld3_dup_f16(half* %a) #0 {		// CHECK-LABEL: define %struct.float16x4x3_t @test_vld3_dup_f16(half* %a) #0 {
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x3_t, align 8		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x3_t, align 8
// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x3_t, align 8		// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x3_t, align 8
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i16*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to half*
// CHECK: [[VLD3:%.*]] = call { <4 x i16>, <4 x i16>, <4 x i16> } @llvm.aarch64.neon.ld3r.v4i16.p0i16(i16* [[TMP2]])		// CHECK: [[VLD3:%.*]] = call { <4 x half>, <4 x half>, <4 x half> } @llvm.aarch64.neon.ld3r.v4f16.p0f16(half* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x i16>, <4 x i16>, <4 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x half>, <4 x half>, <4 x half> }*
// CHECK: store { <4 x i16>, <4 x i16>, <4 x i16> } [[VLD3]], { <4 x i16>, <4 x i16>, <4 x i16> }* [[TMP3]]		// CHECK: store { <4 x half>, <4 x half>, <4 x half> } [[VLD3]], { <4 x half>, <4 x half>, <4 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x3_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x3_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 24, i32 8, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 24, i32 8, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x4x3_t, %struct.float16x4x3_t* [[RETVAL]], align 8		// CHECK: [[TMP6:%.*]] = load %struct.float16x4x3_t, %struct.float16x4x3_t* [[RETVAL]], align 8
// CHECK: ret %struct.float16x4x3_t [[TMP6]]		// CHECK: ret %struct.float16x4x3_t [[TMP6]]
float16x4x3_t test_vld3_dup_f16(float16_t *a) {		float16x4x3_t test_vld3_dup_f16(float16_t *a) {
return vld3_dup_f16(a);		return vld3_dup_f16(a);
// [{{x[0-9]+\|sp}}]		// [{{x[0-9]+\|sp}}]
int64x2x4_t test_vld4q_dup_s64(int64_t *a) {		int64x2x4_t test_vld4q_dup_s64(int64_t *a) {
return vld4q_dup_s64(a);		return vld4q_dup_s64(a);
}		}

// CHECK-LABEL: define %struct.float16x8x4_t @test_vld4q_dup_f16(half* %a) #0 {		// CHECK-LABEL: define %struct.float16x8x4_t @test_vld4q_dup_f16(half* %a) #0 {
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x4_t, align 16		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x4_t, align 16
// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x4_t, align 16		// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x4_t, align 16
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i16*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to half*
// CHECK: [[VLD4:%.*]] = call { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> } @llvm.aarch64.neon.ld4r.v8i16.p0i16(i16* [[TMP2]])		// CHECK: [[VLD4:%.*]] = call { <8 x half>, <8 x half>, <8 x half>, <8 x half> } @llvm.aarch64.neon.ld4r.v8f16.p0f16(half* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <8 x half>, <8 x half>, <8 x half>, <8 x half> }*
// CHECK: store { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> } [[VLD4]], { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> }* [[TMP3]]		// CHECK: store { <8 x half>, <8 x half>, <8 x half>, <8 x half> } [[VLD4]], { <8 x half>, <8 x half>, <8 x half>, <8 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x4_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x8x4_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 64, i32 16, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 64, i32 16, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x8x4_t, %struct.float16x8x4_t* [[RETVAL]], align 16		// CHECK: [[TMP6:%.*]] = load %struct.float16x8x4_t, %struct.float16x8x4_t* [[RETVAL]], align 16
// CHECK: ret %struct.float16x8x4_t [[TMP6]]		// CHECK: ret %struct.float16x8x4_t [[TMP6]]
float16x8x4_t test_vld4q_dup_f16(float16_t *a) {		float16x8x4_t test_vld4q_dup_f16(float16_t *a) {
return vld4q_dup_f16(a);		return vld4q_dup_f16(a);
}		}
int64x1x4_t test_vld4_dup_s64(int64_t *a) {		int64x1x4_t test_vld4_dup_s64(int64_t *a) {
return vld4_dup_s64(a);		return vld4_dup_s64(a);
}		}

// CHECK-LABEL: define %struct.float16x4x4_t @test_vld4_dup_f16(half* %a) #0 {		// CHECK-LABEL: define %struct.float16x4x4_t @test_vld4_dup_f16(half* %a) #0 {
// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x4_t, align 8		// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x4_t, align 8
// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x4_t, align 8		// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x4_t, align 8
// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*		// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*
// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to i16*		// CHECK: [[TMP2:%.*]] = bitcast i8* [[TMP1]] to half*
// CHECK: [[VLD4:%.*]] = call { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> } @llvm.aarch64.neon.ld4r.v4i16.p0i16(i16* [[TMP2]])		// CHECK: [[VLD4:%.*]] = call { <4 x half>, <4 x half>, <4 x half>, <4 x half> } @llvm.aarch64.neon.ld4r.v4f16.p0f16(half* [[TMP2]])
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> }*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to { <4 x half>, <4 x half>, <4 x half>, <4 x half> }*
// CHECK: store { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> } [[VLD4]], { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> }* [[TMP3]]		// CHECK: store { <4 x half>, <4 x half>, <4 x half>, <4 x half> } [[VLD4]], { <4 x half>, <4 x half>, <4 x half>, <4 x half> }* [[TMP3]]
// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x4_t* [[RETVAL]] to i8*		// CHECK: [[TMP4:%.*]] = bitcast %struct.float16x4x4_t* [[RETVAL]] to i8*
// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*		// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 32, i32 8, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP4]], i8* [[TMP5]], i64 32, i32 8, i1 false)
// CHECK: [[TMP6:%.*]] = load %struct.float16x4x4_t, %struct.float16x4x4_t* [[RETVAL]], align 8		// CHECK: [[TMP6:%.*]] = load %struct.float16x4x4_t, %struct.float16x4x4_t* [[RETVAL]], align 8
// CHECK: ret %struct.float16x4x4_t [[TMP6]]		// CHECK: ret %struct.float16x4x4_t [[TMP6]]
float16x4x4_t test_vld4_dup_f16(float16_t *a) {		float16x4x4_t test_vld4_dup_f16(float16_t *a) {
return vld4_dup_f16(a);		return vld4_dup_f16(a);
}		}
// CHECK: ret <2 x i64> [[VLD1_LANE]]		// CHECK: ret <2 x i64> [[VLD1_LANE]]
int64x2_t test_vld1q_lane_s64(int64_t *a, int64x2_t b) {		int64x2_t test_vld1q_lane_s64(int64_t *a, int64x2_t b) {
return vld1q_lane_s64(a, b, 1);		return vld1q_lane_s64(a, b, 1);
}		}

// CHECK-LABEL: define <8 x half> @test_vld1q_lane_f16(half* %a, <8 x half> %b) #0 {		// CHECK-LABEL: define <8 x half> @test_vld1q_lane_f16(half* %a, <8 x half> %b) #0 {
// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>		// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>
// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>		// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i16*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to half*
// CHECK: [[TMP4:%.*]] = load i16, i16* [[TMP3]]		// CHECK: [[TMP4:%.*]] = load half, half* [[TMP3]]
// CHECK: [[VLD1_LANE:%.*]] = insertelement <8 x i16> [[TMP2]], i16 [[TMP4]], i32 7		// CHECK: [[VLD1_LANE:%.*]] = insertelement <8 x half> [[TMP2]], half [[TMP4]], i32 7
// CHECK: [[TMP5:%.*]] = bitcast <8 x i16> [[VLD1_LANE]] to <8 x half>		// CHECK: ret <8 x half> [[VLD1_LANE]]
// CHECK: ret <8 x half> [[TMP5]]
float16x8_t test_vld1q_lane_f16(float16_t *a, float16x8_t b) {		float16x8_t test_vld1q_lane_f16(float16_t *a, float16x8_t b) {
return vld1q_lane_f16(a, b, 7);		return vld1q_lane_f16(a, b, 7);
}		}

// CHECK-LABEL: define <4 x float> @test_vld1q_lane_f32(float* %a, <4 x float> %b) #0 {		// CHECK-LABEL: define <4 x float> @test_vld1q_lane_f32(float* %a, <4 x float> %b) #0 {
// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>		// CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>
// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x float>		// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x float>
// CHECK: ret <1 x i64> [[VLD1_LANE]]		// CHECK: ret <1 x i64> [[VLD1_LANE]]
int64x1_t test_vld1_lane_s64(int64_t *a, int64x1_t b) {		int64x1_t test_vld1_lane_s64(int64_t *a, int64x1_t b) {
return vld1_lane_s64(a, b, 0);		return vld1_lane_s64(a, b, 0);
}		}

// CHECK-LABEL: define <4 x half> @test_vld1_lane_f16(half* %a, <4 x half> %b) #0 {		// CHECK-LABEL: define <4 x half> @test_vld1_lane_f16(half* %a, <4 x half> %b) #0 {
// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>		// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>
// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>		// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i16*		// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to half*
// CHECK: [[TMP4:%.*]] = load i16, i16* [[TMP3]]		// CHECK: [[TMP4:%.*]] = load half, half* [[TMP3]]
// CHECK: [[VLD1_LANE:%.*]] = insertelement <4 x i16> [[TMP2]], i16 [[TMP4]], i32 3		// CHECK: [[VLD1_LANE:%.*]] = insertelement <4 x half> [[TMP2]], half [[TMP4]], i32 3
// CHECK: [[TMP5:%.*]] = bitcast <4 x i16> [[VLD1_LANE]] to <4 x half>		// CHECK: ret <4 x half> [[VLD1_LANE]]
// CHECK: ret <4 x half> [[TMP5]]
float16x4_t test_vld1_lane_f16(float16_t *a, float16x4_t b) {		float16x4_t test_vld1_lane_f16(float16_t *a, float16x4_t b) {
return vld1_lane_f16(a, b, 3);		return vld1_lane_f16(a, b, 3);
}		}

// CHECK-LABEL: define <2 x float> @test_vld1_lane_f32(float* %a, <2 x float> %b) #0 {		// CHECK-LABEL: define <2 x float> @test_vld1_lane_f32(float* %a, <2 x float> %b) #0 {
// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>		// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>
// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x float>		// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x float>
// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i64 0, i64 0		// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i64 0, i64 0
// CHECK: [[TMP4:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16		// CHECK: [[TMP4:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16
// CHECK: [[TMP5:%.*]] = bitcast <8 x half> [[TMP4]] to <16 x i8>		// CHECK: [[TMP5:%.*]] = bitcast <8 x half> [[TMP4]] to <16 x i8>
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP6:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16		// CHECK: [[TMP6:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
// CHECK: [[TMP7:%.*]] = bitcast <8 x half> [[TMP6]] to <16 x i8>		// CHECK: [[TMP7:%.*]] = bitcast <8 x half> [[TMP6]] to <16 x i8>
// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x i16>		// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
// CHECK: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x i16>		// CHECK: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x half>
// CHECK: [[VLD2_LANE:%.*]] = call { <8 x i16>, <8 x i16> } @llvm.aarch64.neon.ld2lane.v8i16.p0i8(<8 x i16> [[TMP8]], <8 x i16> [[TMP9]], i64 7, i8* [[TMP3]])		// CHECK: [[VLD2_LANE:%.*]] = call { <8 x half>, <8 x half> } @llvm.aarch64.neon.ld2lane.v8f16.p0i8(<8 x half> [[TMP8]], <8 x half> [[TMP9]], i64 7, i8* [[TMP3]])
// CHECK: [[TMP10:%.*]] = bitcast i8* [[TMP2]] to { <8 x i16>, <8 x i16> }*		// CHECK: [[TMP10:%.*]] = bitcast i8* [[TMP2]] to { <8 x half>, <8 x half> }*
// CHECK: store { <8 x i16>, <8 x i16> } [[VLD2_LANE]], { <8 x i16>, <8 x i16> }* [[TMP10]]		// CHECK: store { <8 x half>, <8 x half> } [[VLD2_LANE]], { <8 x half>, <8 x half> }* [[TMP10]]
// CHECK: [[TMP11:%.*]] = bitcast %struct.float16x8x2_t* [[RETVAL]] to i8*		// CHECK: [[TMP11:%.*]] = bitcast %struct.float16x8x2_t* [[RETVAL]] to i8*
// CHECK: [[TMP12:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*		// CHECK: [[TMP12:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP11]], i8* [[TMP12]], i64 32, i32 16, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP11]], i8* [[TMP12]], i64 32, i32 16, i1 false)
// CHECK: [[TMP13:%.*]] = load %struct.float16x8x2_t, %struct.float16x8x2_t* [[RETVAL]], align 16		// CHECK: [[TMP13:%.*]] = load %struct.float16x8x2_t, %struct.float16x8x2_t* [[RETVAL]], align 16
// CHECK: ret %struct.float16x8x2_t [[TMP13]]		// CHECK: ret %struct.float16x8x2_t [[TMP13]]
float16x8x2_t test_vld2q_lane_f16(float16_t *a, float16x8x2_t b) {		float16x8x2_t test_vld2q_lane_f16(float16_t *a, float16x8x2_t b) {
return vld2q_lane_f16(a, b, 7);		return vld2q_lane_f16(a, b, 7);
}		}
// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i64 0, i64 0		// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i64 0, i64 0
// CHECK: [[TMP4:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8		// CHECK: [[TMP4:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8
// CHECK: [[TMP5:%.*]] = bitcast <4 x half> [[TMP4]] to <8 x i8>		// CHECK: [[TMP5:%.*]] = bitcast <4 x half> [[TMP4]] to <8 x i8>
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP6:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8		// CHECK: [[TMP6:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
// CHECK: [[TMP7:%.*]] = bitcast <4 x half> [[TMP6]] to <8 x i8>		// CHECK: [[TMP7:%.*]] = bitcast <4 x half> [[TMP6]] to <8 x i8>
// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x i16>		// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x half>
// CHECK: [[TMP9:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x i16>		// CHECK: [[TMP9:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x half>
// CHECK: [[VLD2_LANE:%.*]] = call { <4 x i16>, <4 x i16> } @llvm.aarch64.neon.ld2lane.v4i16.p0i8(<4 x i16> [[TMP8]], <4 x i16> [[TMP9]], i64 3, i8* [[TMP3]])		// CHECK: [[VLD2_LANE:%.*]] = call { <4 x half>, <4 x half> } @llvm.aarch64.neon.ld2lane.v4f16.p0i8(<4 x half> [[TMP8]], <4 x half> [[TMP9]], i64 3, i8* [[TMP3]])
// CHECK: [[TMP10:%.*]] = bitcast i8* [[TMP2]] to { <4 x i16>, <4 x i16> }*		// CHECK: [[TMP10:%.*]] = bitcast i8* [[TMP2]] to { <4 x half>, <4 x half> }*
// CHECK: store { <4 x i16>, <4 x i16> } [[VLD2_LANE]], { <4 x i16>, <4 x i16> }* [[TMP10]]		// CHECK: store { <4 x half>, <4 x half> } [[VLD2_LANE]], { <4 x half>, <4 x half> }* [[TMP10]]
// CHECK: [[TMP11:%.*]] = bitcast %struct.float16x4x2_t* [[RETVAL]] to i8*		// CHECK: [[TMP11:%.*]] = bitcast %struct.float16x4x2_t* [[RETVAL]] to i8*
// CHECK: [[TMP12:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*		// CHECK: [[TMP12:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP11]], i8* [[TMP12]], i64 16, i32 8, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP11]], i8* [[TMP12]], i64 16, i32 8, i1 false)
// CHECK: [[TMP13:%.*]] = load %struct.float16x4x2_t, %struct.float16x4x2_t* [[RETVAL]], align 8		// CHECK: [[TMP13:%.*]] = load %struct.float16x4x2_t, %struct.float16x4x2_t* [[RETVAL]], align 8
// CHECK: ret %struct.float16x4x2_t [[TMP13]]		// CHECK: ret %struct.float16x4x2_t [[TMP13]]
float16x4x2_t test_vld2_lane_f16(float16_t *a, float16x4x2_t b) {		float16x4x2_t test_vld2_lane_f16(float16_t *a, float16x4x2_t b) {
return vld2_lane_f16(a, b, 3);		return vld2_lane_f16(a, b, 3);
}		}
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP6:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16		// CHECK: [[TMP6:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
// CHECK: [[TMP7:%.*]] = bitcast <8 x half> [[TMP6]] to <16 x i8>		// CHECK: [[TMP7:%.*]] = bitcast <8 x half> [[TMP6]] to <16 x i8>
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP8:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16		// CHECK: [[TMP8:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16
// CHECK: [[TMP9:%.*]] = bitcast <8 x half> [[TMP8]] to <16 x i8>		// CHECK: [[TMP9:%.*]] = bitcast <8 x half> [[TMP8]] to <16 x i8>
// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x i16>		// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x i16>		// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x half>
// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP9]] to <8 x i16>		// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP9]] to <8 x half>
// CHECK: [[VLD3_LANE:%.*]] = call { <8 x i16>, <8 x i16>, <8 x i16> } @llvm.aarch64.neon.ld3lane.v8i16.p0i8(<8 x i16> [[TMP10]], <8 x i16> [[TMP11]], <8 x i16> [[TMP12]], i64 7, i8* [[TMP3]])		// CHECK: [[VLD3_LANE:%.*]] = call { <8 x half>, <8 x half>, <8 x half> } @llvm.aarch64.neon.ld3lane.v8f16.p0i8(<8 x half> [[TMP10]], <8 x half> [[TMP11]], <8 x half> [[TMP12]], i64 7, i8* [[TMP3]])
// CHECK: [[TMP13:%.*]] = bitcast i8* [[TMP2]] to { <8 x i16>, <8 x i16>, <8 x i16> }*		// CHECK: [[TMP13:%.*]] = bitcast i8* [[TMP2]] to { <8 x half>, <8 x half>, <8 x half> }*
// CHECK: store { <8 x i16>, <8 x i16>, <8 x i16> } [[VLD3_LANE]], { <8 x i16>, <8 x i16>, <8 x i16> }* [[TMP13]]		// CHECK: store { <8 x half>, <8 x half>, <8 x half> } [[VLD3_LANE]], { <8 x half>, <8 x half>, <8 x half> }* [[TMP13]]
// CHECK: [[TMP14:%.*]] = bitcast %struct.float16x8x3_t* [[RETVAL]] to i8*		// CHECK: [[TMP14:%.*]] = bitcast %struct.float16x8x3_t* [[RETVAL]] to i8*
// CHECK: [[TMP15:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*		// CHECK: [[TMP15:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP14]], i8* [[TMP15]], i64 48, i32 16, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP14]], i8* [[TMP15]], i64 48, i32 16, i1 false)
// CHECK: [[TMP16:%.*]] = load %struct.float16x8x3_t, %struct.float16x8x3_t* [[RETVAL]], align 16		// CHECK: [[TMP16:%.*]] = load %struct.float16x8x3_t, %struct.float16x8x3_t* [[RETVAL]], align 16
// CHECK: ret %struct.float16x8x3_t [[TMP16]]		// CHECK: ret %struct.float16x8x3_t [[TMP16]]
float16x8x3_t test_vld3q_lane_f16(float16_t *a, float16x8x3_t b) {		float16x8x3_t test_vld3q_lane_f16(float16_t *a, float16x8x3_t b) {
return vld3q_lane_f16(a, b, 7);		return vld3q_lane_f16(a, b, 7);
}		}
[503 lines not shown]
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP6:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8		// CHECK: [[TMP6:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
// CHECK: [[TMP7:%.*]] = bitcast <4 x half> [[TMP6]] to <8 x i8>		// CHECK: [[TMP7:%.*]] = bitcast <4 x half> [[TMP6]] to <8 x i8>
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP8:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8		// CHECK: [[TMP8:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8
// CHECK: [[TMP9:%.*]] = bitcast <4 x half> [[TMP8]] to <8 x i8>		// CHECK: [[TMP9:%.*]] = bitcast <4 x half> [[TMP8]] to <8 x i8>
// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x i16>		// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x half>
// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x i16>		// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x half>
// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP9]] to <4 x i16>		// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP9]] to <4 x half>
// CHECK: [[VLD3_LANE:%.*]] = call { <4 x i16>, <4 x i16>, <4 x i16> } @llvm.aarch64.neon.ld3lane.v4i16.p0i8(<4 x i16> [[TMP10]], <4 x i16> [[TMP11]], <4 x i16> [[TMP12]], i64 3, i8* [[TMP3]])		// CHECK: [[VLD3_LANE:%.*]] = call { <4 x half>, <4 x half>, <4 x half> } @llvm.aarch64.neon.ld3lane.v4f16.p0i8(<4 x half> [[TMP10]], <4 x half> [[TMP11]], <4 x half> [[TMP12]], i64 3, i8* [[TMP3]])
// CHECK: [[TMP13:%.*]] = bitcast i8* [[TMP2]] to { <4 x i16>, <4 x i16>, <4 x i16> }*		// CHECK: [[TMP13:%.*]] = bitcast i8* [[TMP2]] to { <4 x half>, <4 x half>, <4 x half> }*
// CHECK: store { <4 x i16>, <4 x i16>, <4 x i16> } [[VLD3_LANE]], { <4 x i16>, <4 x i16>, <4 x i16> }* [[TMP13]]		// CHECK: store { <4 x half>, <4 x half>, <4 x half> } [[VLD3_LANE]], { <4 x half>, <4 x half>, <4 x half> }* [[TMP13]]
// CHECK: [[TMP14:%.*]] = bitcast %struct.float16x4x3_t* [[RETVAL]] to i8*		// CHECK: [[TMP14:%.*]] = bitcast %struct.float16x4x3_t* [[RETVAL]] to i8*
// CHECK: [[TMP15:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*		// CHECK: [[TMP15:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP14]], i8* [[TMP15]], i64 24, i32 8, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP14]], i8* [[TMP15]], i64 24, i32 8, i1 false)
// CHECK: [[TMP16:%.*]] = load %struct.float16x4x3_t, %struct.float16x4x3_t* [[RETVAL]], align 8		// CHECK: [[TMP16:%.*]] = load %struct.float16x4x3_t, %struct.float16x4x3_t* [[RETVAL]], align 8
// CHECK: ret %struct.float16x4x3_t [[TMP16]]		// CHECK: ret %struct.float16x4x3_t [[TMP16]]
float16x4x3_t test_vld3_lane_f16(float16_t *a, float16x4x3_t b) {		float16x4x3_t test_vld3_lane_f16(float16_t *a, float16x4x3_t b) {
return vld3_lane_f16(a, b, 3);		return vld3_lane_f16(a, b, 3);
}		}
[543 lines not shown]
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP8:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16		// CHECK: [[TMP8:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16
// CHECK: [[TMP9:%.*]] = bitcast <8 x half> [[TMP8]] to <16 x i8>		// CHECK: [[TMP9:%.*]] = bitcast <8 x half> [[TMP8]] to <16 x i8>
// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL5]], i64 0, i64 3		// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL5]], i64 0, i64 3
// CHECK: [[TMP10:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX6]], align 16		// CHECK: [[TMP10:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX6]], align 16
// CHECK: [[TMP11:%.*]] = bitcast <8 x half> [[TMP10]] to <16 x i8>		// CHECK: [[TMP11:%.*]] = bitcast <8 x half> [[TMP10]] to <16 x i8>
// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x i16>		// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x i16>		// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x half>
// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP9]] to <8 x i16>		// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP9]] to <8 x half>
// CHECK: [[TMP15:%.*]] = bitcast <16 x i8> [[TMP11]] to <8 x i16>		// CHECK: [[TMP15:%.*]] = bitcast <16 x i8> [[TMP11]] to <8 x half>
// CHECK: [[VLD4_LANE:%.*]] = call { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> } @llvm.aarch64.neon.ld4lane.v8i16.p0i8(<8 x i16> [[TMP12]], <8 x i16> [[TMP13]], <8 x i16> [[TMP14]], <8 x i16> [[TMP15]], i64 7, i8* [[TMP3]])		// CHECK: [[VLD4_LANE:%.*]] = call { <8 x half>, <8 x half>, <8 x half>, <8 x half> } @llvm.aarch64.neon.ld4lane.v8f16.p0i8(<8 x half> [[TMP12]], <8 x half> [[TMP13]], <8 x half> [[TMP14]], <8 x half> [[TMP15]], i64 7, i8* [[TMP3]])
// CHECK: [[TMP16:%.*]] = bitcast i8* [[TMP2]] to { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> }*		// CHECK: [[TMP16:%.*]] = bitcast i8* [[TMP2]] to { <8 x half>, <8 x half>, <8 x half>, <8 x half> }*
// CHECK: store { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> } [[VLD4_LANE]], { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16> }* [[TMP16]]		// CHECK: store { <8 x half>, <8 x half>, <8 x half>, <8 x half> } [[VLD4_LANE]], { <8 x half>, <8 x half>, <8 x half>, <8 x half> }* [[TMP16]]
// CHECK: [[TMP17:%.*]] = bitcast %struct.float16x8x4_t* [[RETVAL]] to i8*		// CHECK: [[TMP17:%.*]] = bitcast %struct.float16x8x4_t* [[RETVAL]] to i8*
// CHECK: [[TMP18:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*		// CHECK: [[TMP18:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP17]], i8* [[TMP18]], i64 64, i32 16, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP17]], i8* [[TMP18]], i64 64, i32 16, i1 false)
// CHECK: [[TMP19:%.*]] = load %struct.float16x8x4_t, %struct.float16x8x4_t* [[RETVAL]], align 16		// CHECK: [[TMP19:%.*]] = load %struct.float16x8x4_t, %struct.float16x8x4_t* [[RETVAL]], align 16
// CHECK: ret %struct.float16x8x4_t [[TMP19]]		// CHECK: ret %struct.float16x8x4_t [[TMP19]]
float16x8x4_t test_vld4q_lane_f16(float16_t *a, float16x8x4_t b) {		float16x8x4_t test_vld4q_lane_f16(float16_t *a, float16x8x4_t b) {
return vld4q_lane_f16(a, b, 7);		return vld4q_lane_f16(a, b, 7);
}		}
[566 lines not shown]
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP8:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8		// CHECK: [[TMP8:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8
// CHECK: [[TMP9:%.*]] = bitcast <4 x half> [[TMP8]] to <8 x i8>		// CHECK: [[TMP9:%.*]] = bitcast <4 x half> [[TMP8]] to <8 x i8>
// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL5]], i64 0, i64 3		// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL5]], i64 0, i64 3
// CHECK: [[TMP10:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX6]], align 8		// CHECK: [[TMP10:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX6]], align 8
// CHECK: [[TMP11:%.*]] = bitcast <4 x half> [[TMP10]] to <8 x i8>		// CHECK: [[TMP11:%.*]] = bitcast <4 x half> [[TMP10]] to <8 x i8>
// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x i16>		// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x half>
// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x i16>		// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x half>
// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP9]] to <4 x i16>		// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP9]] to <4 x half>
// CHECK: [[TMP15:%.*]] = bitcast <8 x i8> [[TMP11]] to <4 x i16>		// CHECK: [[TMP15:%.*]] = bitcast <8 x i8> [[TMP11]] to <4 x half>
// CHECK: [[VLD4_LANE:%.*]] = call { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> } @llvm.aarch64.neon.ld4lane.v4i16.p0i8(<4 x i16> [[TMP12]], <4 x i16> [[TMP13]], <4 x i16> [[TMP14]], <4 x i16> [[TMP15]], i64 3, i8* [[TMP3]])		// CHECK: [[VLD4_LANE:%.*]] = call { <4 x half>, <4 x half>, <4 x half>, <4 x half> } @llvm.aarch64.neon.ld4lane.v4f16.p0i8(<4 x half> [[TMP12]], <4 x half> [[TMP13]], <4 x half> [[TMP14]], <4 x half> [[TMP15]], i64 3, i8* [[TMP3]])
// CHECK: [[TMP16:%.*]] = bitcast i8* [[TMP2]] to { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> }*		// CHECK: [[TMP16:%.*]] = bitcast i8* [[TMP2]] to { <4 x half>, <4 x half>, <4 x half>, <4 x half> }*
// CHECK: store { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> } [[VLD4_LANE]], { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> }* [[TMP16]]		// CHECK: store { <4 x half>, <4 x half>, <4 x half>, <4 x half> } [[VLD4_LANE]], { <4 x half>, <4 x half>, <4 x half>, <4 x half> }* [[TMP16]]
// CHECK: [[TMP17:%.*]] = bitcast %struct.float16x4x4_t* [[RETVAL]] to i8*		// CHECK: [[TMP17:%.*]] = bitcast %struct.float16x4x4_t* [[RETVAL]] to i8*
// CHECK: [[TMP18:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*		// CHECK: [[TMP18:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*
// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP17]], i8* [[TMP18]], i64 32, i32 8, i1 false)		// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP17]], i8* [[TMP18]], i64 32, i32 8, i1 false)
// CHECK: [[TMP19:%.*]] = load %struct.float16x4x4_t, %struct.float16x4x4_t* [[RETVAL]], align 8		// CHECK: [[TMP19:%.*]] = load %struct.float16x4x4_t, %struct.float16x4x4_t* [[RETVAL]], align 8
// CHECK: ret %struct.float16x4x4_t [[TMP19]]		// CHECK: ret %struct.float16x4x4_t [[TMP19]]
float16x4x4_t test_vld4_lane_f16(float16_t *a, float16x4x4_t b) {		float16x4x4_t test_vld4_lane_f16(float16_t *a, float16x4x4_t b) {
return vld4_lane_f16(a, b, 3);		return vld4_lane_f16(a, b, 3);
}		}
[295 lines not shown]
// CHECK: ret void		// CHECK: ret void
void test_vst1q_lane_s64(int64_t *a, int64x2_t b) {		void test_vst1q_lane_s64(int64_t *a, int64x2_t b) {
vst1q_lane_s64(a, b, 1);		vst1q_lane_s64(a, b, 1);
}		}

// CHECK-LABEL: define void @test_vst1q_lane_f16(half* %a, <8 x half> %b) #0 {		// CHECK-LABEL: define void @test_vst1q_lane_f16(half* %a, <8 x half> %b) #0 {
// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>		// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>
// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>		// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
// CHECK: [[TMP3:%.*]] = extractelement <8 x i16> [[TMP2]], i32 7		// CHECK: [[TMP3:%.*]] = extractelement <8 x half> [[TMP2]], i32 7
// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i16*		// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to half*
// CHECK: store i16 [[TMP3]], i16* [[TMP4]]		// CHECK: store half [[TMP3]], half* [[TMP4]]
// CHECK: ret void		// CHECK: ret void
void test_vst1q_lane_f16(float16_t *a, float16x8_t b) {		void test_vst1q_lane_f16(float16_t *a, float16x8_t b) {
vst1q_lane_f16(a, b, 7);		vst1q_lane_f16(a, b, 7);
}		}

// CHECK-LABEL: define void @test_vst1q_lane_f32(float* %a, <4 x float> %b) #0 {		// CHECK-LABEL: define void @test_vst1q_lane_f32(float* %a, <4 x float> %b) #0 {
// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>		// CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>
[136 lines not shown]
// CHECK: ret void		// CHECK: ret void
void test_vst1_lane_s64(int64_t *a, int64x1_t b) {		void test_vst1_lane_s64(int64_t *a, int64x1_t b) {
vst1_lane_s64(a, b, 0);		vst1_lane_s64(a, b, 0);
}		}

// CHECK-LABEL: define void @test_vst1_lane_f16(half* %a, <4 x half> %b) #0 {		// CHECK-LABEL: define void @test_vst1_lane_f16(half* %a, <4 x half> %b) #0 {
// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>		// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>
// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>		// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
// CHECK: [[TMP3:%.*]] = extractelement <4 x i16> [[TMP2]], i32 3		// CHECK: [[TMP3:%.*]] = extractelement <4 x half> [[TMP2]], i32 3
// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i16*		// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to half*
// CHECK: store i16 [[TMP3]], i16* [[TMP4]]		// CHECK: store half [[TMP3]], half* [[TMP4]]
// CHECK: ret void		// CHECK: ret void
void test_vst1_lane_f16(float16_t *a, float16x4_t b) {		void test_vst1_lane_f16(float16_t *a, float16x4_t b) {
vst1_lane_f16(a, b, 3);		vst1_lane_f16(a, b, 3);
}		}

// CHECK-LABEL: define void @test_vst1_lane_f32(float* %a, <2 x float> %b) #0 {		// CHECK-LABEL: define void @test_vst1_lane_f32(float* %a, <2 x float> %b) #0 {
// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*		// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>		// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>
[252 lines not shown]
// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i64 0, i64 0		// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i64 0, i64 0
// CHECK: [[TMP3:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16		// CHECK: [[TMP3:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16
// CHECK: [[TMP4:%.*]] = bitcast <8 x half> [[TMP3]] to <16 x i8>		// CHECK: [[TMP4:%.*]] = bitcast <8 x half> [[TMP3]] to <16 x i8>
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16		// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>		// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>
// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>		// CHECK: [[TMP7:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>		// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>
// CHECK: call void @llvm.aarch64.neon.st2lane.v8i16.p0i8(<8 x i16> [[TMP7]], <8 x i16> [[TMP8]], i64 7, i8* [[TMP2]])		// CHECK: call void @llvm.aarch64.neon.st2lane.v8f16.p0i8(<8 x half> [[TMP7]], <8 x half> [[TMP8]], i64 7, i8* [[TMP2]])
// CHECK: ret void		// CHECK: ret void
void test_vst2q_lane_f16(float16_t *a, float16x8x2_t b) {		void test_vst2q_lane_f16(float16_t *a, float16x8x2_t b) {
vst2q_lane_f16(a, b, 7);		vst2q_lane_f16(a, b, 7);
}		}

// CHECK-LABEL: define void @test_vst2q_lane_f32(float* %a, [2 x <4 x float>] %b.coerce) #0 {		// CHECK-LABEL: define void @test_vst2q_lane_f32(float* %a, [2 x <4 x float>] %b.coerce) #0 {
// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16		// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16
// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16		// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16
[316 lines not shown]
// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i64 0, i64 0		// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i64 0, i64 0
// CHECK: [[TMP3:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8		// CHECK: [[TMP3:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8
// CHECK: [[TMP4:%.*]] = bitcast <4 x half> [[TMP3]] to <8 x i8>		// CHECK: [[TMP4:%.*]] = bitcast <4 x half> [[TMP3]] to <8 x i8>
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8		// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>		// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>
// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>		// CHECK: [[TMP7:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>		// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>
// CHECK: call void @llvm.aarch64.neon.st2lane.v4i16.p0i8(<4 x i16> [[TMP7]], <4 x i16> [[TMP8]], i64 3, i8* [[TMP2]])		// CHECK: call void @llvm.aarch64.neon.st2lane.v4f16.p0i8(<4 x half> [[TMP7]], <4 x half> [[TMP8]], i64 3, i8* [[TMP2]])
// CHECK: ret void		// CHECK: ret void
void test_vst2_lane_f16(float16_t *a, float16x4x2_t b) {		void test_vst2_lane_f16(float16_t *a, float16x4x2_t b) {
vst2_lane_f16(a, b, 3);		vst2_lane_f16(a, b, 3);
}		}

// CHECK-LABEL: define void @test_vst2_lane_f32(float* %a, [2 x <2 x float>] %b.coerce) #0 {		// CHECK-LABEL: define void @test_vst2_lane_f32(float* %a, [2 x <2 x float>] %b.coerce) #0 {
// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8		// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8
// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8		// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8
[356 lines not shown]
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16		// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>		// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16		// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16
// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>		// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>
// CHECK: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>		// CHECK: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>		// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>
// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>		// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x half>
// CHECK: call void @llvm.aarch64.neon.st3lane.v8i16.p0i8(<8 x i16> [[TMP9]], <8 x i16> [[TMP10]], <8 x i16> [[TMP11]], i64 7, i8* [[TMP2]])		// CHECK: call void @llvm.aarch64.neon.st3lane.v8f16.p0i8(<8 x half> [[TMP9]], <8 x half> [[TMP10]], <8 x half> [[TMP11]], i64 7, i8* [[TMP2]])
// CHECK: ret void		// CHECK: ret void
void test_vst3q_lane_f16(float16_t *a, float16x8x3_t b) {		void test_vst3q_lane_f16(float16_t *a, float16x8x3_t b) {
vst3q_lane_f16(a, b, 7);		vst3q_lane_f16(a, b, 7);
}		}

// CHECK-LABEL: define void @test_vst3q_lane_f32(float* %a, [3 x <4 x float>] %b.coerce) #0 {		// CHECK-LABEL: define void @test_vst3q_lane_f32(float* %a, [3 x <4 x float>] %b.coerce) #0 {
// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16		// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16
// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16		// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16
[379 lines not shown]
// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL1]], i64 0, i64 1		// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL1]], i64 0, i64 1
// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8		// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>		// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8		// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8
// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>		// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>
// CHECK: [[TMP9:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>		// CHECK: [[TMP9:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>		// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>
// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>		// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x half>
// CHECK: call void @llvm.aarch64.neon.st3lane.v4i16.p0i8(<4 x i16> [[TMP9]], <4 x i16> [[TMP10]], <4 x i16> [[TMP11]], i64 3, i8* [[TMP2]])		// CHECK: call void @llvm.aarch64.neon.st3lane.v4f16.p0i8(<4 x half> [[TMP9]], <4 x half> [[TMP10]], <4 x half> [[TMP11]], i64 3, i8* [[TMP2]])
// CHECK: ret void		// CHECK: ret void
void test_vst3_lane_f16(float16_t *a, float16x4x3_t b) {		void test_vst3_lane_f16(float16_t *a, float16x4x3_t b) {
vst3_lane_f16(a, b, 3);		vst3_lane_f16(a, b, 3);
}		}

// CHECK-LABEL: define void @test_vst3_lane_f32(float* %a, [3 x <2 x float>] %b.coerce) #0 {		// CHECK-LABEL: define void @test_vst3_lane_f32(float* %a, [3 x <2 x float>] %b.coerce) #0 {
// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8		// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8
// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8		// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8
[419 lines not shown]
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16		// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16
// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>		// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>
// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL5]], i64 0, i64 3		// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL5]], i64 0, i64 3
// CHECK: [[TMP9:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX6]], align 16		// CHECK: [[TMP9:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX6]], align 16
// CHECK: [[TMP10:%.*]] = bitcast <8 x half> [[TMP9]] to <16 x i8>		// CHECK: [[TMP10:%.*]] = bitcast <8 x half> [[TMP9]] to <16 x i8>
// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x i16>		// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP4]] to <8 x half>
// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>		// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>
// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>		// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x half>
// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x i16>		// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x half>
// CHECK: call void @llvm.aarch64.neon.st4lane.v8i16.p0i8(<8 x i16> [[TMP11]], <8 x i16> [[TMP12]], <8 x i16> [[TMP13]], <8 x i16> [[TMP14]], i64 7, i8* [[TMP2]])		// CHECK: call void @llvm.aarch64.neon.st4lane.v8f16.p0i8(<8 x half> [[TMP11]], <8 x half> [[TMP12]], <8 x half> [[TMP13]], <8 x half> [[TMP14]], i64 7, i8* [[TMP2]])
// CHECK: ret void		// CHECK: ret void
void test_vst4q_lane_f16(float16_t *a, float16x8x4_t b) {		void test_vst4q_lane_f16(float16_t *a, float16x8x4_t b) {
vst4q_lane_f16(a, b, 7);		vst4q_lane_f16(a, b, 7);
}		}

// CHECK-LABEL: define void @test_vst4q_lane_f32(float* %a, [4 x <4 x float>] %b.coerce) #0 {		// CHECK-LABEL: define void @test_vst4q_lane_f32(float* %a, [4 x <4 x float>] %b.coerce) #0 {
// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16		// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16
// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16		// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16
[442 lines not shown]
// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL3]], i64 0, i64 2		// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL3]], i64 0, i64 2
// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8		// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8
// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>		// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>
// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0		// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0
// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL5]], i64 0, i64 3		// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL5]], i64 0, i64 3
// CHECK: [[TMP9:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX6]], align 8		// CHECK: [[TMP9:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX6]], align 8
// CHECK: [[TMP10:%.*]] = bitcast <4 x half> [[TMP9]] to <8 x i8>		// CHECK: [[TMP10:%.*]] = bitcast <4 x half> [[TMP9]] to <8 x i8>
// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x i16>		// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP4]] to <4 x half>
// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>		// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>
// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>		// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x half>
// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x i16>		// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x half>
// CHECK: call void @llvm.aarch64.neon.st4lane.v4i16.p0i8(<4 x i16> [[TMP11]], <4 x i16> [[TMP12]], <4 x i16> [[TMP13]], <4 x i16> [[TMP14]], i64 3, i8* [[TMP2]])		// CHECK: call void @llvm.aarch64.neon.st4lane.v4f16.p0i8(<4 x half> [[TMP11]], <4 x half> [[TMP12]], <4 x half> [[TMP13]], <4 x half> [[TMP14]], i64 3, i8* [[TMP2]])
// CHECK: ret void		// CHECK: ret void
void test_vst4_lane_f16(float16_t *a, float16x4x4_t b) {		void test_vst4_lane_f16(float16_t *a, float16x4x4_t b) {
vst4_lane_f16(a, b, 3);		vst4_lane_f16(a, b, 3);
}		}

// CHECK-LABEL: define void @test_vst4_lane_f32(float* %a, [4 x <2 x float>] %b.coerce) #0 {		// CHECK-LABEL: define void @test_vst4_lane_f32(float* %a, [4 x <2 x float>] %b.coerce) #0 {
// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8		// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8
// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8		// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8

cfe/trunk/test/CodeGen/aarch64-v8.2a-neon-intrinsics.c

				// RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +neon -target-feature +fullfp16 -target-feature +v8.2a\
				// RUN: -fallow-half-arguments-and-returns -S -disable-O0-optnone -emit-llvm -o - %s \
				// RUN: | opt -S -mem2reg \
				// RUN: | FileCheck %s

				// REQUIRES: aarch64-registered-target

				#include <arm_neon.h>

				// CHECK-LABEL: test_vabs_f16
				// CHECK: [[ABS:%.*]] = call <4 x half> @llvm.fabs.v4f16(<4 x half> %a)
				// CHECK: ret <4 x half> [[ABS]]
				float16x4_t test_vabs_f16(float16x4_t a) {
				return vabs_f16(a);
				}

				// CHECK-LABEL: test_vabsq_f16
				// CHECK: [[ABS:%.*]] = call <8 x half> @llvm.fabs.v8f16(<8 x half> %a)
				// CHECK: ret <8 x half> [[ABS]]
				float16x8_t test_vabsq_f16(float16x8_t a) {
				return vabsq_f16(a);
				}

				// CHECK-LABEL: test_vceqz_f16
				// CHECK: [[TMP1:%.*]] = fcmp oeq <4 x half> %a, zeroinitializer
				// CHECK: [[TMP2:%.*]] = sext <4 x i1> [[TMP1]] to <4 x i16>
				// CHECK: ret <4 x i16> [[TMP2]]
				uint16x4_t test_vceqz_f16(float16x4_t a) {
				return vceqz_f16(a);
				}

				// CHECK-LABEL: test_vceqzq_f16
				// CHECK: [[TMP1:%.*]] = fcmp oeq <8 x half> %a, zeroinitializer
				// CHECK: [[TMP2:%.*]] = sext <8 x i1> [[TMP1:%.*]] to <8 x i16>
				// CHECK: ret <8 x i16> [[TMP2]]
				uint16x8_t test_vceqzq_f16(float16x8_t a) {
				return vceqzq_f16(a);
				}

				// CHECK-LABEL: test_vcgez_f16
				// CHECK: [[TMP1:%.*]] = fcmp oge <4 x half> %a, zeroinitializer
				// CHECK: [[TMP2:%.*]] = sext <4 x i1> [[TMP1]] to <4 x i16>
				// CHECK: ret <4 x i16> [[TMP2]]
				uint16x4_t test_vcgez_f16(float16x4_t a) {
				return vcgez_f16(a);
				}

				// CHECK-LABEL: test_vcgezq_f16
				// CHECK: [[TMP1:%.*]] = fcmp oge <8 x half> %a, zeroinitializer
				// CHECK: [[TMP2:%.*]] = sext <8 x i1> [[TMP1:%.*]] to <8 x i16>
				// CHECK: ret <8 x i16> [[TMP2]]
				uint16x8_t test_vcgezq_f16(float16x8_t a) {
				return vcgezq_f16(a);
				}

				// CHECK-LABEL: test_vcgtz_f16
				// CHECK: [[TMP1:%.*]] = fcmp ogt <4 x half> %a, zeroinitializer
				// CHECK: [[TMP2:%.*]] = sext <4 x i1> [[TMP1]] to <4 x i16>
				// CHECK: ret <4 x i16> [[TMP2]]
				uint16x4_t test_vcgtz_f16(float16x4_t a) {
				return vcgtz_f16(a);
				}

				// CHECK-LABEL: test_vcgtzq_f16
				// CHECK: [[TMP1:%.*]] = fcmp ogt <8 x half> %a, zeroinitializer
				// CHECK: [[TMP2:%.*]] = sext <8 x i1> [[TMP1:%.*]] to <8 x i16>
				// CHECK: ret <8 x i16> [[TMP2]]
				uint16x8_t test_vcgtzq_f16(float16x8_t a) {
				return vcgtzq_f16(a);
				}

				// CHECK-LABEL: test_vclez_f16
				// CHECK: [[TMP1:%.*]] = fcmp ole <4 x half> %a, zeroinitializer
				// CHECK: [[TMP2:%.*]] = sext <4 x i1> [[TMP1]] to <4 x i16>
				// CHECK: ret <4 x i16> [[TMP2]]
				uint16x4_t test_vclez_f16(float16x4_t a) {
				return vclez_f16(a);
				}

				// CHECK-LABEL: test_vclezq_f16
				// CHECK: [[TMP1:%.*]] = fcmp ole <8 x half> %a, zeroinitializer
				// CHECK: [[TMP2:%.*]] = sext <8 x i1> [[TMP1:%.*]] to <8 x i16>
				// CHECK: ret <8 x i16> [[TMP2]]
				uint16x8_t test_vclezq_f16(float16x8_t a) {
				return vclezq_f16(a);
				}

				// CHECK-LABEL: test_vcltz_f16
				// CHECK: [[TMP1:%.*]] = fcmp olt <4 x half> %a, zeroinitializer
				// CHECK: [[TMP2:%.*]] = sext <4 x i1> [[TMP1]] to <4 x i16>
				// CHECK: ret <4 x i16> [[TMP2]]
				uint16x4_t test_vcltz_f16(float16x4_t a) {
				return vcltz_f16(a);
				}

				// CHECK-LABEL: test_vcltzq_f16
				// CHECK: [[TMP1:%.*]] = fcmp olt <8 x half> %a, zeroinitializer
				// CHECK: [[TMP2:%.*]] = sext <8 x i1> [[TMP1:%.*]] to <8 x i16>
				// CHECK: ret <8 x i16> [[TMP2]]
				uint16x8_t test_vcltzq_f16(float16x8_t a) {
				return vcltzq_f16(a);
				}
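
The compare-against-zero tests above all check the same two-step shape: an `fcmp` that produces one `i1` per lane, followed by a `sext` to `i16`. As a rough scalar model of one lane, sign-extending a true `i1` gives an all-ones value and a false one gives zero. The helper names below (`lane_mask`, `ceqz_lane`, and so on) are made up for illustration and are not part of the patch, and `float` stands in for `half`.

```c
#include <stdint.h>

/* Hypothetical scalar model of a single lane; not part of the patch.
 * sext i1 -> i16 maps true to -1 (0xFFFF) and false to 0. */
static uint16_t lane_mask(int cond) {
    return cond ? (uint16_t)0xFFFF : (uint16_t)0;
}

/* Models one lane of vceqz_f16 (fcmp oeq against zero).
 * float stands in for half. A NaN input compares unequal and yields 0. */
uint16_t ceqz_lane(float x) { return lane_mask(x == 0.0f); }

/* Models one lane of vcgez_f16 (fcmp oge against zero). */
uint16_t cgez_lane(float x) { return lane_mask(x >= 0.0f); }

/* Models one lane of vcltz_f16 (fcmp olt against zero). */
uint16_t cltz_lane(float x) { return lane_mask(x < 0.0f); }
```

Both +0.0 and -0.0 count as equal to zero in this model, which matches ordinary IEEE comparison semantics.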

				// CHECK-LABEL: test_vcvt_f16_s16
				// CHECK: [[VCVT:%.*]] = sitofp <4 x i16> %a to <4 x half>
				// CHECK: ret <4 x half> [[VCVT]]
				float16x4_t test_vcvt_f16_s16 (int16x4_t a) {
				return vcvt_f16_s16(a);
				}

				// CHECK-LABEL: test_vcvtq_f16_s16
				// CHECK: [[VCVT:%.*]] = sitofp <8 x i16> %a to <8 x half>
				// CHECK: ret <8 x half> [[VCVT]]
				float16x8_t test_vcvtq_f16_s16 (int16x8_t a) {
				return vcvtq_f16_s16(a);
				}

				// CHECK-LABEL: test_vcvt_f16_u16
				// CHECK: [[VCVT:%.*]] = uitofp <4 x i16> %a to <4 x half>
				// CHECK: ret <4 x half> [[VCVT]]
				float16x4_t test_vcvt_f16_u16 (uint16x4_t a) {
				return vcvt_f16_u16(a);
				}

				// CHECK-LABEL: test_vcvtq_f16_u16
				// CHECK: [[VCVT:%.*]] = uitofp <8 x i16> %a to <8 x half>
				// CHECK: ret <8 x half> [[VCVT]]
				float16x8_t test_vcvtq_f16_u16 (uint16x8_t a) {
				return vcvtq_f16_u16(a);
				}

				// CHECK-LABEL: test_vcvt_s16_f16
				// CHECK: [[VCVT:%.*]] = fptosi <4 x half> %a to <4 x i16>
				// CHECK: ret <4 x i16> [[VCVT]]
				int16x4_t test_vcvt_s16_f16 (float16x4_t a) {
				return vcvt_s16_f16(a);
				}

				// CHECK-LABEL: test_vcvtq_s16_f16
				// CHECK: [[VCVT:%.*]] = fptosi <8 x half> %a to <8 x i16>
				// CHECK: ret <8 x i16> [[VCVT]]
				int16x8_t test_vcvtq_s16_f16 (float16x8_t a) {
				return vcvtq_s16_f16(a);
				}

				// CHECK-LABEL: test_vcvt_u16_f16
				// CHECK: [[VCVT:%.*]] = fptoui <4 x half> %a to <4 x i16>
				// CHECK: ret <4 x i16> [[VCVT]]
				uint16x4_t test_vcvt_u16_f16 (float16x4_t a) {
				return vcvt_u16_f16(a);
				}

				// CHECK-LABEL: test_vcvtq_u16_f16
				// CHECK: [[VCVT:%.*]] = fptoui <8 x half> %a to <8 x i16>
				// CHECK: ret <8 x i16> [[VCVT]]
				uint16x8_t test_vcvtq_u16_f16 (float16x8_t a) {
				return vcvtq_u16_f16(a);
				}

				// CHECK-LABEL: test_vcvta_s16_f16
				// CHECK: [[VCVT:%.*]] = call <4 x i16> @llvm.aarch64.neon.fcvtas.v4i16.v4f16(<4 x half> %a)
				// CHECK: ret <4 x i16> [[VCVT]]
				int16x4_t test_vcvta_s16_f16 (float16x4_t a) {
				return vcvta_s16_f16(a);
				}

				// CHECK-LABEL: test_vcvtaq_s16_f16
				// CHECK: [[VCVT:%.*]] = call <8 x i16> @llvm.aarch64.neon.fcvtas.v8i16.v8f16(<8 x half> %a)
				// CHECK: ret <8 x i16> [[VCVT]]
				int16x8_t test_vcvtaq_s16_f16 (float16x8_t a) {
				return vcvtaq_s16_f16(a);
				}

				// CHECK-LABEL: test_vcvtm_s16_f16
				// CHECK: [[VCVT:%.*]] = call <4 x i16> @llvm.aarch64.neon.fcvtms.v4i16.v4f16(<4 x half> %a)
				// CHECK: ret <4 x i16> [[VCVT]]
				int16x4_t test_vcvtm_s16_f16 (float16x4_t a) {
				return vcvtm_s16_f16(a);
				}

				// CHECK-LABEL: test_vcvtmq_s16_f16
				// CHECK: [[VCVT:%.*]] = call <8 x i16> @llvm.aarch64.neon.fcvtms.v8i16.v8f16(<8 x half> %a)
				// CHECK: ret <8 x i16> [[VCVT]]
				int16x8_t test_vcvtmq_s16_f16 (float16x8_t a) {
				return vcvtmq_s16_f16(a);
				}

				// CHECK-LABEL: test_vcvtm_u16_f16
				// CHECK: [[VCVT:%.*]] = call <4 x i16> @llvm.aarch64.neon.fcvtmu.v4i16.v4f16(<4 x half> %a)
				// CHECK: ret <4 x i16> [[VCVT]]
				uint16x4_t test_vcvtm_u16_f16 (float16x4_t a) {
				return vcvtm_u16_f16(a);
				}

				// CHECK-LABEL: test_vcvtmq_u16_f16
				// CHECK: [[VCVT:%.*]] = call <8 x i16> @llvm.aarch64.neon.fcvtmu.v8i16.v8f16(<8 x half> %a)
				// CHECK: ret <8 x i16> [[VCVT]]
				uint16x8_t test_vcvtmq_u16_f16 (float16x8_t a) {
				return vcvtmq_u16_f16(a);
				}

				// CHECK-LABEL: test_vcvtn_s16_f16
				// CHECK: [[VCVT:%.*]] = call <4 x i16> @llvm.aarch64.neon.fcvtns.v4i16.v4f16(<4 x half> %a)
				// CHECK: ret <4 x i16> [[VCVT]]
				int16x4_t test_vcvtn_s16_f16 (float16x4_t a) {
				return vcvtn_s16_f16(a);
				}

				// CHECK-LABEL: test_vcvtnq_s16_f16
				// CHECK: [[VCVT:%.*]] = call <8 x i16> @llvm.aarch64.neon.fcvtns.v8i16.v8f16(<8 x half> %a)
				// CHECK: ret <8 x i16> [[VCVT]]
				int16x8_t test_vcvtnq_s16_f16 (float16x8_t a) {
				return vcvtnq_s16_f16(a);
				}

				// CHECK-LABEL: test_vcvtn_u16_f16
				// CHECK: [[VCVT:%.*]] = call <4 x i16> @llvm.aarch64.neon.fcvtnu.v4i16.v4f16(<4 x half> %a)
				// CHECK: ret <4 x i16> [[VCVT]]
				uint16x4_t test_vcvtn_u16_f16 (float16x4_t a) {
				return vcvtn_u16_f16(a);
				}

				// CHECK-LABEL: test_vcvtnq_u16_f16
				// CHECK: [[VCVT:%.*]] = call <8 x i16> @llvm.aarch64.neon.fcvtnu.v8i16.v8f16(<8 x half> %a)
				// CHECK: ret <8 x i16> [[VCVT]]
				uint16x8_t test_vcvtnq_u16_f16 (float16x8_t a) {
				return vcvtnq_u16_f16(a);
				}

				// CHECK-LABEL: test_vcvtp_s16_f16
				// CHECK: [[VCVT:%.*]] = call <4 x i16> @llvm.aarch64.neon.fcvtps.v4i16.v4f16(<4 x half> %a)
				// CHECK: ret <4 x i16> [[VCVT]]
				int16x4_t test_vcvtp_s16_f16 (float16x4_t a) {
				return vcvtp_s16_f16(a);
				}

				// CHECK-LABEL: test_vcvtpq_s16_f16
				// CHECK: [[VCVT:%.*]] = call <8 x i16> @llvm.aarch64.neon.fcvtps.v8i16.v8f16(<8 x half> %a)
				// CHECK: ret <8 x i16> [[VCVT]]
				int16x8_t test_vcvtpq_s16_f16 (float16x8_t a) {
				return vcvtpq_s16_f16(a);
				}

				// CHECK-LABEL: test_vcvtp_u16_f16
				// CHECK: [[VCVT:%.*]] = call <4 x i16> @llvm.aarch64.neon.fcvtpu.v4i16.v4f16(<4 x half> %a)
				// CHECK: ret <4 x i16> [[VCVT]]
				uint16x4_t test_vcvtp_u16_f16 (float16x4_t a) {
				return vcvtp_u16_f16(a);
				}

				// CHECK-LABEL: test_vcvtpq_u16_f16
				// CHECK: [[VCVT:%.*]] = call <8 x i16> @llvm.aarch64.neon.fcvtpu.v8i16.v8f16(<8 x half> %a)
				// CHECK: ret <8 x i16> [[VCVT]]
				uint16x8_t test_vcvtpq_u16_f16 (float16x8_t a) {
				return vcvtpq_u16_f16(a);
				}

				// FIXME: Fix the zero constant when fp16 non-storage-only type becomes available.
				// CHECK-LABEL: test_vneg_f16
				// CHECK: [[NEG:%.*]] = fsub <4 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %a
				// CHECK: ret <4 x half> [[NEG]]
				float16x4_t test_vneg_f16(float16x4_t a) {
				return vneg_f16(a);
				}

				// CHECK-LABEL: test_vnegq_f16
				// CHECK: [[NEG:%.*]] = fsub <8 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %a
				// CHECK: ret <8 x half> [[NEG]]
				float16x8_t test_vnegq_f16(float16x8_t a) {
				return vnegq_f16(a);
				}

				// CHECK-LABEL: test_vrecpe_f16
				// CHECK: [[RCP:%.*]] = call <4 x half> @llvm.aarch64.neon.frecpe.v4f16(<4 x half> %a)
				// CHECK: ret <4 x half> [[RCP]]
				float16x4_t test_vrecpe_f16(float16x4_t a) {
				return vrecpe_f16(a);
				}

				// CHECK-LABEL: test_vrecpeq_f16
				// CHECK: [[RCP:%.*]] = call <8 x half> @llvm.aarch64.neon.frecpe.v8f16(<8 x half> %a)
				// CHECK: ret <8 x half> [[RCP]]
				float16x8_t test_vrecpeq_f16(float16x8_t a) {
				return vrecpeq_f16(a);
				}

				// CHECK-LABEL: test_vrnd_f16
				// CHECK: [[RND:%.*]] = call <4 x half> @llvm.trunc.v4f16(<4 x half> %a)
				// CHECK: ret <4 x half> [[RND]]
				float16x4_t test_vrnd_f16(float16x4_t a) {
				return vrnd_f16(a);
				}

				// CHECK-LABEL: test_vrndq_f16
				// CHECK: [[RND:%.*]] = call <8 x half> @llvm.trunc.v8f16(<8 x half> %a)
				// CHECK: ret <8 x half> [[RND]]
				float16x8_t test_vrndq_f16(float16x8_t a) {
				return vrndq_f16(a);
				}

				// CHECK-LABEL: test_vrnda_f16
				// CHECK: [[RND:%.*]] = call <4 x half> @llvm.round.v4f16(<4 x half> %a)
				// CHECK: ret <4 x half> [[RND]]
				float16x4_t test_vrnda_f16(float16x4_t a) {
				return vrnda_f16(a);
				}

				// CHECK-LABEL: test_vrndaq_f16
				// CHECK: [[RND:%.*]] = call <8 x half> @llvm.round.v8f16(<8 x half> %a)
				// CHECK: ret <8 x half> [[RND]]
				float16x8_t test_vrndaq_f16(float16x8_t a) {
				return vrndaq_f16(a);
				}

				// CHECK-LABEL: test_vrndi_f16
				// CHECK: [[RND:%.*]] = call <4 x half> @llvm.nearbyint.v4f16(<4 x half> %a)
				// CHECK: ret <4 x half> [[RND]]
				float16x4_t test_vrndi_f16(float16x4_t a) {
				return vrndi_f16(a);
				}

				// CHECK-LABEL: test_vrndiq_f16
				// CHECK: [[RND:%.*]] = call <8 x half> @llvm.nearbyint.v8f16(<8 x half> %a)
				// CHECK: ret <8 x half> [[RND]]
				float16x8_t test_vrndiq_f16(float16x8_t a) {
				return vrndiq_f16(a);
				}

				// CHECK-LABEL: test_vrndm_f16
				// CHECK: [[RND:%.*]] = call <4 x half> @llvm.floor.v4f16(<4 x half> %a)
				// CHECK: ret <4 x half> [[RND]]
				float16x4_t test_vrndm_f16(float16x4_t a) {
				return vrndm_f16(a);
				}

				// CHECK-LABEL: test_vrndmq_f16
				// CHECK: [[RND:%.*]] = call <8 x half> @llvm.floor.v8f16(<8 x half> %a)
				// CHECK: ret <8 x half> [[RND]]
				float16x8_t test_vrndmq_f16(float16x8_t a) {
				return vrndmq_f16(a);
				}

				// CHECK-LABEL: test_vrndn_f16
				// CHECK: [[RND:%.*]] = call <4 x half> @llvm.aarch64.neon.frintn.v4f16(<4 x half> %a)
				// CHECK: ret <4 x half> [[RND]]
				float16x4_t test_vrndn_f16(float16x4_t a) {
				return vrndn_f16(a);
				}

				// CHECK-LABEL: test_vrndnq_f16
				// CHECK: [[RND:%.*]] = call <8 x half> @llvm.aarch64.neon.frintn.v8f16(<8 x half> %a)
				// CHECK: ret <8 x half> [[RND]]
				float16x8_t test_vrndnq_f16(float16x8_t a) {
				return vrndnq_f16(a);
				}

				// CHECK-LABEL: test_vrndp_f16
				// CHECK: [[RND:%.*]] = call <4 x half> @llvm.ceil.v4f16(<4 x half> %a)
				// CHECK: ret <4 x half> [[RND]]
				float16x4_t test_vrndp_f16(float16x4_t a) {
				return vrndp_f16(a);
				}

				// CHECK-LABEL: test_vrndpq_f16
				// CHECK: [[RND:%.*]] = call <8 x half> @llvm.ceil.v8f16(<8 x half> %a)
				// CHECK: ret <8 x half> [[RND]]
				float16x8_t test_vrndpq_f16(float16x8_t a) {
				return vrndpq_f16(a);
				}

				// CHECK-LABEL: test_vrndx_f16
				// CHECK: [[RND:%.*]] = call <4 x half> @llvm.rint.v4f16(<4 x half> %a)
				// CHECK: ret <4 x half> [[RND]]
				float16x4_t test_vrndx_f16(float16x4_t a) {
				return vrndx_f16(a);
				}

				// CHECK-LABEL: test_vrndxq_f16
				// CHECK: [[RND:%.*]] = call <8 x half> @llvm.rint.v8f16(<8 x half> %a)
				// CHECK: ret <8 x half> [[RND]]
				float16x8_t test_vrndxq_f16(float16x8_t a) {
				return vrndxq_f16(a);
				}

				// CHECK-LABEL: test_vrsqrte_f16
				// CHECK: [[RND:%.*]] = call <4 x half> @llvm.aarch64.neon.frsqrte.v4f16(<4 x half> %a)
				// CHECK: ret <4 x half> [[RND]]
				float16x4_t test_vrsqrte_f16(float16x4_t a) {
				return vrsqrte_f16(a);
				}

				// CHECK-LABEL: test_vrsqrteq_f16
				// CHECK: [[RND:%.*]] = call <8 x half> @llvm.aarch64.neon.frsqrte.v8f16(<8 x half> %a)
				// CHECK: ret <8 x half> [[RND]]
				float16x8_t test_vrsqrteq_f16(float16x8_t a) {
				return vrsqrteq_f16(a);
				}

				// CHECK-LABEL: test_vsqrt_f16
				// CHECK: [[SQR:%.*]] = call <4 x half> @llvm.sqrt.v4f16(<4 x half> %a)
				// CHECK: ret <4 x half> [[SQR]]
				float16x4_t test_vsqrt_f16(float16x4_t a) {
				return vsqrt_f16(a);
				}

				// CHECK-LABEL: test_vsqrtq_f16
				// CHECK: [[SQR:%.*]] = call <8 x half> @llvm.sqrt.v8f16(<8 x half> %a)
				// CHECK: ret <8 x half> [[SQR]]
				float16x8_t test_vsqrtq_f16(float16x8_t a) {
				return vsqrtq_f16(a);
				}

				// CHECK-LABEL: test_vadd_f16
				// CHECK: [[ADD:%.*]] = fadd <4 x half> %a, %b
				// CHECK: ret <4 x half> [[ADD]]
				float16x4_t test_vadd_f16(float16x4_t a, float16x4_t b) {
				return vadd_f16(a, b);
				}

				// CHECK-LABEL: test_vaddq_f16
				// CHECK: [[ADD:%.*]] = fadd <8 x half> %a, %b
				// CHECK: ret <8 x half> [[ADD]]
				float16x8_t test_vaddq_f16(float16x8_t a, float16x8_t b) {
				return vaddq_f16(a, b);
				}

				// CHECK-LABEL: test_vabd_f16
				// CHECK: [[ABD:%.*]] = call <4 x half> @llvm.aarch64.neon.fabd.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x half> [[ABD]]
				float16x4_t test_vabd_f16(float16x4_t a, float16x4_t b) {
				return vabd_f16(a, b);
				}

				// CHECK-LABEL: test_vabdq_f16
				// CHECK: [[ABD:%.*]] = call <8 x half> @llvm.aarch64.neon.fabd.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x half> [[ABD]]
				float16x8_t test_vabdq_f16(float16x8_t a, float16x8_t b) {
				return vabdq_f16(a, b);
				}

				// CHECK-LABEL: test_vcage_f16
				// CHECK: [[ABS:%.*]] = call <4 x i16> @llvm.aarch64.neon.facge.v4i16.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x i16> [[ABS]]
				uint16x4_t test_vcage_f16(float16x4_t a, float16x4_t b) {
				return vcage_f16(a, b);
				}

				// CHECK-LABEL: test_vcageq_f16
				// CHECK: [[ABS:%.*]] = call <8 x i16> @llvm.aarch64.neon.facge.v8i16.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x i16> [[ABS]]
				uint16x8_t test_vcageq_f16(float16x8_t a, float16x8_t b) {
				return vcageq_f16(a, b);
				}

				// CHECK-LABEL: test_vcagt_f16
				// CHECK: [[ABS:%.*]] = call <4 x i16> @llvm.aarch64.neon.facgt.v4i16.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x i16> [[ABS]]
				uint16x4_t test_vcagt_f16(float16x4_t a, float16x4_t b) {
				return vcagt_f16(a, b);
				}

				// CHECK-LABEL: test_vcagtq_f16
				// CHECK: [[ABS:%.*]] = call <8 x i16> @llvm.aarch64.neon.facgt.v8i16.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x i16> [[ABS]]
				uint16x8_t test_vcagtq_f16(float16x8_t a, float16x8_t b) {
				return vcagtq_f16(a, b);
				}

				// CHECK-LABEL: test_vcale_f16
				// CHECK: [[ABS:%.*]] = call <4 x i16> @llvm.aarch64.neon.facge.v4i16.v4f16(<4 x half> %b, <4 x half> %a)
				// CHECK: ret <4 x i16> [[ABS]]
				uint16x4_t test_vcale_f16(float16x4_t a, float16x4_t b) {
				return vcale_f16(a, b);
				}

				// CHECK-LABEL: test_vcaleq_f16
				// CHECK: [[ABS:%.*]] = call <8 x i16> @llvm.aarch64.neon.facge.v8i16.v8f16(<8 x half> %b, <8 x half> %a)
				// CHECK: ret <8 x i16> [[ABS]]
				uint16x8_t test_vcaleq_f16(float16x8_t a, float16x8_t b) {
				return vcaleq_f16(a, b);
				}

				// CHECK-LABEL: test_vcalt_f16
				// CHECK: [[ABS:%.*]] = call <4 x i16> @llvm.aarch64.neon.facgt.v4i16.v4f16(<4 x half> %b, <4 x half> %a)
				// CHECK: ret <4 x i16> [[ABS]]
				uint16x4_t test_vcalt_f16(float16x4_t a, float16x4_t b) {
				return vcalt_f16(a, b);
				}

				// CHECK-LABEL: test_vcaltq_f16
				// CHECK: [[ABS:%.*]] = call <8 x i16> @llvm.aarch64.neon.facgt.v8i16.v8f16(<8 x half> %b, <8 x half> %a)
				// CHECK: ret <8 x i16> [[ABS]]
				uint16x8_t test_vcaltq_f16(float16x8_t a, float16x8_t b) {
				return vcaltq_f16(a, b);
				}

				// CHECK-LABEL: test_vceq_f16
				// CHECK: [[TMP1:%.*]] = fcmp oeq <4 x half> %a, %b
				// CHECK: [[TMP2:%.*]] = sext <4 x i1> [[TMP1]] to <4 x i16>
				// CHECK: ret <4 x i16> [[TMP2]]
				uint16x4_t test_vceq_f16(float16x4_t a, float16x4_t b) {
				return vceq_f16(a, b);
				}

				// CHECK-LABEL: test_vceqq_f16
				// CHECK: [[TMP1:%.*]] = fcmp oeq <8 x half> %a, %b
				// CHECK: [[TMP2:%.*]] = sext <8 x i1> [[TMP1:%.*]] to <8 x i16>
				// CHECK: ret <8 x i16> [[TMP2]]
				uint16x8_t test_vceqq_f16(float16x8_t a, float16x8_t b) {
				return vceqq_f16(a, b);
				}

				// CHECK-LABEL: test_vcge_f16
				// CHECK: [[TMP1:%.*]] = fcmp oge <4 x half> %a, %b
				// CHECK: [[TMP2:%.*]] = sext <4 x i1> [[TMP1]] to <4 x i16>
				// CHECK: ret <4 x i16> [[TMP2]]
				uint16x4_t test_vcge_f16(float16x4_t a, float16x4_t b) {
				return vcge_f16(a, b);
				}

				// CHECK-LABEL: test_vcgeq_f16
				// CHECK: [[TMP1:%.*]] = fcmp oge <8 x half> %a, %b
				// CHECK: [[TMP2:%.*]] = sext <8 x i1> [[TMP1:%.*]] to <8 x i16>
				// CHECK: ret <8 x i16> [[TMP2]]
				uint16x8_t test_vcgeq_f16(float16x8_t a, float16x8_t b) {
				return vcgeq_f16(a, b);
				}

				// CHECK-LABEL: test_vcgt_f16
				// CHECK: [[TMP1:%.*]] = fcmp ogt <4 x half> %a, %b
				// CHECK: [[TMP2:%.*]] = sext <4 x i1> [[TMP1]] to <4 x i16>
				// CHECK: ret <4 x i16> [[TMP2]]
				uint16x4_t test_vcgt_f16(float16x4_t a, float16x4_t b) {
				return vcgt_f16(a, b);
				}

				// CHECK-LABEL: test_vcgtq_f16
				// CHECK: [[TMP1:%.*]] = fcmp ogt <8 x half> %a, %b
				// CHECK: [[TMP2:%.*]] = sext <8 x i1> [[TMP1:%.*]] to <8 x i16>
				// CHECK: ret <8 x i16> [[TMP2]]
				uint16x8_t test_vcgtq_f16(float16x8_t a, float16x8_t b) {
				return vcgtq_f16(a, b);
				}

				// CHECK-LABEL: test_vcle_f16
				// CHECK: [[TMP1:%.*]] = fcmp ole <4 x half> %a, %b
				// CHECK: [[TMP2:%.*]] = sext <4 x i1> [[TMP1]] to <4 x i16>
				// CHECK: ret <4 x i16> [[TMP2]]
				uint16x4_t test_vcle_f16(float16x4_t a, float16x4_t b) {
				return vcle_f16(a, b);
				}

				// CHECK-LABEL: test_vcleq_f16
				// CHECK: [[TMP1:%.*]] = fcmp ole <8 x half> %a, %b
				// CHECK: [[TMP2:%.*]] = sext <8 x i1> [[TMP1:%.*]] to <8 x i16>
				// CHECK: ret <8 x i16> [[TMP2]]
				uint16x8_t test_vcleq_f16(float16x8_t a, float16x8_t b) {
				return vcleq_f16(a, b);
				}

				// CHECK-LABEL: test_vclt_f16
				// CHECK: [[TMP1:%.*]] = fcmp olt <4 x half> %a, %b
				// CHECK: [[TMP2:%.*]] = sext <4 x i1> [[TMP1]] to <4 x i16>
				// CHECK: ret <4 x i16> [[TMP2]]
				uint16x4_t test_vclt_f16(float16x4_t a, float16x4_t b) {
				return vclt_f16(a, b);
				}

				// CHECK-LABEL: test_vcltq_f16
				// CHECK: [[TMP1:%.*]] = fcmp olt <8 x half> %a, %b
				// CHECK: [[TMP2:%.*]] = sext <8 x i1> [[TMP1:%.*]] to <8 x i16>
				// CHECK: ret <8 x i16> [[TMP2]]
				uint16x8_t test_vcltq_f16(float16x8_t a, float16x8_t b) {
				return vcltq_f16(a, b);
				}

				// CHECK-LABEL: test_vcvt_n_f16_s16
				// CHECK: [[CVT:%.*]] = call <4 x half> @llvm.aarch64.neon.vcvtfxs2fp.v4f16.v4i16(<4 x i16> %vcvt_n, i32 2)
				// CHECK: ret <4 x half> [[CVT]]
				float16x4_t test_vcvt_n_f16_s16(int16x4_t a) {
				return vcvt_n_f16_s16(a, 2);
				}

				// CHECK-LABEL: test_vcvtq_n_f16_s16
				// CHECK: [[CVT:%.*]] = call <8 x half> @llvm.aarch64.neon.vcvtfxs2fp.v8f16.v8i16(<8 x i16> %vcvt_n, i32 2)
				// CHECK: ret <8 x half> [[CVT]]
				float16x8_t test_vcvtq_n_f16_s16(int16x8_t a) {
				return vcvtq_n_f16_s16(a, 2);
				}

				// CHECK-LABEL: test_vcvt_n_f16_u16
				// CHECK: [[CVT:%.*]] = call <4 x half> @llvm.aarch64.neon.vcvtfxu2fp.v4f16.v4i16(<4 x i16> %vcvt_n, i32 2)
				// CHECK: ret <4 x half> [[CVT]]
				float16x4_t test_vcvt_n_f16_u16(uint16x4_t a) {
				return vcvt_n_f16_u16(a, 2);
				}

				// CHECK-LABEL: test_vcvtq_n_f16_u16
				// CHECK: [[CVT:%.*]] = call <8 x half> @llvm.aarch64.neon.vcvtfxu2fp.v8f16.v8i16(<8 x i16> %vcvt_n, i32 2)
				// CHECK: ret <8 x half> [[CVT]]
				float16x8_t test_vcvtq_n_f16_u16(uint16x8_t a) {
				return vcvtq_n_f16_u16(a, 2);
				}

				// CHECK-LABEL: test_vcvt_n_s16_f16
				// CHECK: [[CVT:%.*]] = call <4 x i16> @llvm.aarch64.neon.vcvtfp2fxs.v4i16.v4f16(<4 x half> %vcvt_n, i32 2)
				// CHECK: ret <4 x i16> [[CVT]]
				int16x4_t test_vcvt_n_s16_f16(float16x4_t a) {
				return vcvt_n_s16_f16(a, 2);
				}

				// CHECK-LABEL: test_vcvtq_n_s16_f16
				// CHECK: [[CVT:%.*]] = call <8 x i16> @llvm.aarch64.neon.vcvtfp2fxs.v8i16.v8f16(<8 x half> %vcvt_n, i32 2)
				// CHECK: ret <8 x i16> [[CVT]]
				int16x8_t test_vcvtq_n_s16_f16(float16x8_t a) {
				return vcvtq_n_s16_f16(a, 2);
				}

				// CHECK-LABEL: test_vcvt_n_u16_f16
				// CHECK: [[CVT:%.*]] = call <4 x i16> @llvm.aarch64.neon.vcvtfp2fxu.v4i16.v4f16(<4 x half> %vcvt_n, i32 2)
				// CHECK: ret <4 x i16> [[CVT]]
				uint16x4_t test_vcvt_n_u16_f16(float16x4_t a) {
				return vcvt_n_u16_f16(a, 2);
				}

				// CHECK-LABEL: test_vcvtq_n_u16_f16
				// CHECK: [[CVT:%.*]] = call <8 x i16> @llvm.aarch64.neon.vcvtfp2fxu.v8i16.v8f16(<8 x half> %vcvt_n, i32 2)
				// CHECK: ret <8 x i16> [[CVT]]
				uint16x8_t test_vcvtq_n_u16_f16(float16x8_t a) {
				return vcvtq_n_u16_f16(a, 2);
				}

				// CHECK-LABEL: test_vdiv_f16
				// CHECK: [[DIV:%.*]] = fdiv <4 x half> %a, %b
				// CHECK: ret <4 x half> [[DIV]]
				float16x4_t test_vdiv_f16(float16x4_t a, float16x4_t b) {
				return vdiv_f16(a, b);
				}

				// CHECK-LABEL: test_vdivq_f16
				// CHECK: [[DIV:%.*]] = fdiv <8 x half> %a, %b
				// CHECK: ret <8 x half> [[DIV]]
				float16x8_t test_vdivq_f16(float16x8_t a, float16x8_t b) {
				return vdivq_f16(a, b);
				}

				// CHECK-LABEL: test_vmax_f16
				// CHECK: [[MAX:%.*]] = call <4 x half> @llvm.aarch64.neon.fmax.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x half> [[MAX]]
				float16x4_t test_vmax_f16(float16x4_t a, float16x4_t b) {
				return vmax_f16(a, b);
				}

				// CHECK-LABEL: test_vmaxq_f16
				// CHECK: [[MAX:%.*]] = call <8 x half> @llvm.aarch64.neon.fmax.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x half> [[MAX]]
				float16x8_t test_vmaxq_f16(float16x8_t a, float16x8_t b) {
				return vmaxq_f16(a, b);
				}

				// CHECK-LABEL: test_vmaxnm_f16
				// CHECK: [[MAX:%.*]] = call <4 x half> @llvm.aarch64.neon.fmaxnm.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x half> [[MAX]]
				float16x4_t test_vmaxnm_f16(float16x4_t a, float16x4_t b) {
				return vmaxnm_f16(a, b);
				}

				// CHECK-LABEL: test_vmaxnmq_f16
				// CHECK: [[MAX:%.*]] = call <8 x half> @llvm.aarch64.neon.fmaxnm.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x half> [[MAX]]
				float16x8_t test_vmaxnmq_f16(float16x8_t a, float16x8_t b) {
				return vmaxnmq_f16(a, b);
				}

				// CHECK-LABEL: test_vmin_f16
				// CHECK: [[MIN:%.*]] = call <4 x half> @llvm.aarch64.neon.fmin.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x half> [[MIN]]
				float16x4_t test_vmin_f16(float16x4_t a, float16x4_t b) {
				return vmin_f16(a, b);
				}

				// CHECK-LABEL: test_vminq_f16
				// CHECK: [[MIN:%.*]] = call <8 x half> @llvm.aarch64.neon.fmin.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x half> [[MIN]]
				float16x8_t test_vminq_f16(float16x8_t a, float16x8_t b) {
				return vminq_f16(a, b);
				}

				// CHECK-LABEL: test_vminnm_f16
				// CHECK: [[MIN:%.*]] = call <4 x half> @llvm.aarch64.neon.fminnm.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x half> [[MIN]]
				float16x4_t test_vminnm_f16(float16x4_t a, float16x4_t b) {
				return vminnm_f16(a, b);
				}

				// CHECK-LABEL: test_vminnmq_f16
				// CHECK: [[MIN:%.*]] = call <8 x half> @llvm.aarch64.neon.fminnm.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x half> [[MIN]]
				float16x8_t test_vminnmq_f16(float16x8_t a, float16x8_t b) {
				return vminnmq_f16(a, b);
				}

				// CHECK-LABEL: test_vmul_f16
				// CHECK: [[MUL:%.*]] = fmul <4 x half> %a, %b
				// CHECK: ret <4 x half> [[MUL]]
				float16x4_t test_vmul_f16(float16x4_t a, float16x4_t b) {
				return vmul_f16(a, b);
				}

				// CHECK-LABEL: test_vmulq_f16
				// CHECK: [[MUL:%.*]] = fmul <8 x half> %a, %b
				// CHECK: ret <8 x half> [[MUL]]
				float16x8_t test_vmulq_f16(float16x8_t a, float16x8_t b) {
				return vmulq_f16(a, b);
				}

				// CHECK-LABEL: test_vmulx_f16
				// CHECK: [[MUL:%.*]] = call <4 x half> @llvm.aarch64.neon.fmulx.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x half> [[MUL]]
				float16x4_t test_vmulx_f16(float16x4_t a, float16x4_t b) {
				return vmulx_f16(a, b);
				}

				// CHECK-LABEL: test_vmulxq_f16
				// CHECK: [[MUL:%.*]] = call <8 x half> @llvm.aarch64.neon.fmulx.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x half> [[MUL]]
				float16x8_t test_vmulxq_f16(float16x8_t a, float16x8_t b) {
				return vmulxq_f16(a, b);
				}

				// CHECK-LABEL: test_vpadd_f16
				// CHECK: [[ADD:%.*]] = call <4 x half> @llvm.aarch64.neon.addp.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x half> [[ADD]]
				float16x4_t test_vpadd_f16(float16x4_t a, float16x4_t b) {
				return vpadd_f16(a, b);
				}

				// CHECK-LABEL: test_vpaddq_f16
				// CHECK: [[ADD:%.*]] = call <8 x half> @llvm.aarch64.neon.addp.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x half> [[ADD]]
				float16x8_t test_vpaddq_f16(float16x8_t a, float16x8_t b) {
				return vpaddq_f16(a, b);
				}

				// CHECK-LABEL: test_vpmax_f16
				// CHECK: [[MAX:%.*]] = call <4 x half> @llvm.aarch64.neon.fmaxp.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x half> [[MAX]]
				float16x4_t test_vpmax_f16(float16x4_t a, float16x4_t b) {
				return vpmax_f16(a, b);
				}

				// CHECK-LABEL: test_vpmaxq_f16
				// CHECK: [[MAX:%.*]] = call <8 x half> @llvm.aarch64.neon.fmaxp.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x half> [[MAX]]
				float16x8_t test_vpmaxq_f16(float16x8_t a, float16x8_t b) {
				return vpmaxq_f16(a, b);
				}

				// CHECK-LABEL: test_vpmaxnm_f16
				// CHECK: [[MAX:%.*]] = call <4 x half> @llvm.aarch64.neon.fmaxnmp.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x half> [[MAX]]
				float16x4_t test_vpmaxnm_f16(float16x4_t a, float16x4_t b) {
				return vpmaxnm_f16(a, b);
				}

				// CHECK-LABEL: test_vpmaxnmq_f16
				// CHECK: [[MAX:%.*]] = call <8 x half> @llvm.aarch64.neon.fmaxnmp.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x half> [[MAX]]
				float16x8_t test_vpmaxnmq_f16(float16x8_t a, float16x8_t b) {
				return vpmaxnmq_f16(a, b);
				}

				// CHECK-LABEL: test_vpmin_f16
				// CHECK: [[MIN:%.*]] = call <4 x half> @llvm.aarch64.neon.fminp.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x half> [[MIN]]
				float16x4_t test_vpmin_f16(float16x4_t a, float16x4_t b) {
				return vpmin_f16(a, b);
				}

				// CHECK-LABEL: test_vpminq_f16
				// CHECK: [[MIN:%.*]] = call <8 x half> @llvm.aarch64.neon.fminp.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x half> [[MIN]]
				float16x8_t test_vpminq_f16(float16x8_t a, float16x8_t b) {
				return vpminq_f16(a, b);
				}

				// CHECK-LABEL: test_vpminnm_f16
				// CHECK: [[MIN:%.*]] = call <4 x half> @llvm.aarch64.neon.fminnmp.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x half> [[MIN]]
				float16x4_t test_vpminnm_f16(float16x4_t a, float16x4_t b) {
				return vpminnm_f16(a, b);
				}

				// CHECK-LABEL: test_vpminnmq_f16
				// CHECK: [[MIN:%.*]] = call <8 x half> @llvm.aarch64.neon.fminnmp.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x half> [[MIN]]
				float16x8_t test_vpminnmq_f16(float16x8_t a, float16x8_t b) {
				return vpminnmq_f16(a, b);
				}

				// CHECK-LABEL: test_vrecps_f16
				// CHECK: [[MIN:%.*]] = call <4 x half> @llvm.aarch64.neon.frecps.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x half> [[MIN]]
				float16x4_t test_vrecps_f16(float16x4_t a, float16x4_t b) {
				return vrecps_f16(a, b);
				}

				// CHECK-LABEL: test_vrecpsq_f16
				// CHECK: [[MIN:%.*]] = call <8 x half> @llvm.aarch64.neon.frecps.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x half> [[MIN]]
				float16x8_t test_vrecpsq_f16(float16x8_t a, float16x8_t b) {
				return vrecpsq_f16(a, b);
				}

				// CHECK-LABEL: test_vrsqrts_f16
				// CHECK: [[MIN:%.*]] = call <4 x half> @llvm.aarch64.neon.frsqrts.v4f16(<4 x half> %a, <4 x half> %b)
				// CHECK: ret <4 x half> [[MIN]]
				float16x4_t test_vrsqrts_f16(float16x4_t a, float16x4_t b) {
				return vrsqrts_f16(a, b);
				}

				// CHECK-LABEL: test_vrsqrtsq_f16
				// CHECK: [[MIN:%.*]] = call <8 x half> @llvm.aarch64.neon.frsqrts.v8f16(<8 x half> %a, <8 x half> %b)
				// CHECK: ret <8 x half> [[MIN]]
				float16x8_t test_vrsqrtsq_f16(float16x8_t a, float16x8_t b) {
				return vrsqrtsq_f16(a, b);
				}

				// CHECK-LABEL: test_vsub_f16
				// CHECK: [[ADD:%.*]] = fsub <4 x half> %a, %b
				// CHECK: ret <4 x half> [[ADD]]
				float16x4_t test_vsub_f16(float16x4_t a, float16x4_t b) {
				return vsub_f16(a, b);
				}

				// CHECK-LABEL: test_vsubq_f16
				// CHECK: [[ADD:%.*]] = fsub <8 x half> %a, %b
				// CHECK: ret <8 x half> [[ADD]]
				float16x8_t test_vsubq_f16(float16x8_t a, float16x8_t b) {
				return vsubq_f16(a, b);
				}

				// CHECK-LABEL: test_vfma_f16
				// CHECK: [[ADD:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> %b, <4 x half> %c, <4 x half> %a)
				// CHECK: ret <4 x half> [[ADD]]
				float16x4_t test_vfma_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
				return vfma_f16(a, b, c);
				}

				// CHECK-LABEL: test_vfmaq_f16
				// CHECK: [[ADD:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> %b, <8 x half> %c, <8 x half> %a)
				// CHECK: ret <8 x half> [[ADD]]
				float16x8_t test_vfmaq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
				return vfmaq_f16(a, b, c);
				}

				// CHECK-LABEL: test_vfms_f16
				// CHECK: [[SUB:%.*]] = fsub <4 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b
				// CHECK: [[ADD:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[SUB]], <4 x half> %c, <4 x half> %a)
				// CHECK: ret <4 x half> [[ADD]]
				float16x4_t test_vfms_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
				return vfms_f16(a, b, c);
				}

				// CHECK-LABEL: test_vfmsq_f16
				// CHECK: [[SUB:%.*]] = fsub <8 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b
				// CHECK: [[ADD:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[SUB]], <8 x half> %c, <8 x half> %a)
				// CHECK: ret <8 x half> [[ADD]]
				float16x8_t test_vfmsq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
				return vfmsq_f16(a, b, c);
				}

				// CHECK-LABEL: test_vfma_lane_f16
				// CHECK: [[TMP0:%.*]] = bitcast <4 x half> %a to <8 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>
				// CHECK: [[TMP2:%.*]] = bitcast <4 x half> %c to <8 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <4 x half>
				// CHECK: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
				// CHECK: [[TMP4:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
				// CHECK: [[TMP5:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x half>
				// CHECK: [[FMLA:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[TMP4]], <4 x half> [[LANE]], <4 x half> [[TMP5]])
				// CHECK: ret <4 x half> [[FMLA]]
				float16x4_t test_vfma_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
				return vfma_lane_f16(a, b, c, 3);
				}

				// CHECK-LABEL: test_vfmaq_lane_f16
				// CHECK: [[TMP0:%.*]] = bitcast <8 x half> %a to <16 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>
				// CHECK: [[TMP2:%.*]] = bitcast <4 x half> %c to <8 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <4 x half>
				// CHECK: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
				// CHECK: [[TMP4:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
				// CHECK: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
				// CHECK: [[FMLA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[TMP4]], <8 x half> [[LANE]], <8 x half> [[TMP5]])
				// CHECK: ret <8 x half> [[FMLA]]
				float16x8_t test_vfmaq_lane_f16(float16x8_t a, float16x8_t b, float16x4_t c) {
				return vfmaq_lane_f16(a, b, c, 3);
				}

				// CHECK-LABEL: test_vfma_laneq_f16
				// CHECK: [[TMP0:%.*]] = bitcast <4 x half> %a to <8 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>
				// CHECK: [[TMP2:%.*]] = bitcast <8 x half> %c to <16 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x half>
				// CHECK: [[TMP4:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
				// CHECK: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to <8 x half>
				// CHECK: [[LANE:%.*]] = shufflevector <8 x half> [[TMP5]], <8 x half> [[TMP5]], <4 x i32> <i32 7, i32 7, i32 7, i32 7>
				// CHECK: [[FMLA:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[LANE]], <4 x half> [[TMP4]], <4 x half> [[TMP3]])
				// CHECK: ret <4 x half> [[FMLA]]
				float16x4_t test_vfma_laneq_f16(float16x4_t a, float16x4_t b, float16x8_t c) {
				return vfma_laneq_f16(a, b, c, 7);
				}

				// CHECK-LABEL: test_vfmaq_laneq_f16
				// CHECK: [[TMP0:%.*]] = bitcast <8 x half> %a to <16 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>
				// CHECK: [[TMP2:%.*]] = bitcast <8 x half> %c to <16 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
				// CHECK: [[TMP4:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
				// CHECK: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to <8 x half>
				// CHECK: [[LANE:%.*]] = shufflevector <8 x half> [[TMP5]], <8 x half> [[TMP5]], <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
				// CHECK: [[FMLA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[LANE]], <8 x half> [[TMP4]], <8 x half> [[TMP3]])
				// CHECK: ret <8 x half> [[FMLA]]
				float16x8_t test_vfmaq_laneq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
				return vfmaq_laneq_f16(a, b, c, 7);
				}

				// CHECK-LABEL: test_vfma_n_f16
				// CHECK: [[TMP0:%.*]] = insertelement <4 x half> undef, half %c, i32 0
				// CHECK: [[TMP1:%.*]] = insertelement <4 x half> [[TMP0]], half %c, i32 1
				// CHECK: [[TMP2:%.*]] = insertelement <4 x half> [[TMP1]], half %c, i32 2
				// CHECK: [[TMP3:%.*]] = insertelement <4 x half> [[TMP2]], half %c, i32 3
				// CHECK: [[FMA:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> %b, <4 x half> [[TMP3]], <4 x half> %a)
				// CHECK: ret <4 x half> [[FMA]]
				float16x4_t test_vfma_n_f16(float16x4_t a, float16x4_t b, float16_t c) {
				return vfma_n_f16(a, b, c);
				}

				// CHECK-LABEL: test_vfmaq_n_f16
				// CHECK: [[TMP0:%.*]] = insertelement <8 x half> undef, half %c, i32 0
				// CHECK: [[TMP1:%.*]] = insertelement <8 x half> [[TMP0]], half %c, i32 1
				// CHECK: [[TMP2:%.*]] = insertelement <8 x half> [[TMP1]], half %c, i32 2
				// CHECK: [[TMP3:%.*]] = insertelement <8 x half> [[TMP2]], half %c, i32 3
				// CHECK: [[TMP4:%.*]] = insertelement <8 x half> [[TMP3]], half %c, i32 4
				// CHECK: [[TMP5:%.*]] = insertelement <8 x half> [[TMP4]], half %c, i32 5
				// CHECK: [[TMP6:%.*]] = insertelement <8 x half> [[TMP5]], half %c, i32 6
				// CHECK: [[TMP7:%.*]] = insertelement <8 x half> [[TMP6]], half %c, i32 7
				// CHECK: [[FMA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> %b, <8 x half> [[TMP7]], <8 x half> %a)
				// CHECK: ret <8 x half> [[FMA]]
				float16x8_t test_vfmaq_n_f16(float16x8_t a, float16x8_t b, float16_t c) {
				return vfmaq_n_f16(a, b, c);
				}

				// CHECK-LABEL: test_vfmah_lane_f16
				// CHECK: [[TMP0:%.*]] = bitcast <4 x half> %c to <8 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x half>
				// CHECK: [[EXTR:%.*]] = extractelement <4 x half> [[TMP1]], i32 3
				// CHECK: [[FMA:%.*]] = call half @llvm.fma.f16(half %b, half [[EXTR]], half %a)
				// CHECK: ret half [[FMA]]
				float16_t test_vfmah_lane_f16(float16_t a, float16_t b, float16x4_t c) {
				return vfmah_lane_f16(a, b, c, 3);
				}

				// CHECK-LABEL: test_vfmah_laneq_f16
				// CHECK: [[TMP0:%.*]] = bitcast <8 x half> %c to <16 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
				// CHECK: [[EXTR:%.*]] = extractelement <8 x half> [[TMP1]], i32 7
				// CHECK: [[FMA:%.*]] = call half @llvm.fma.f16(half %b, half [[EXTR]], half %a)
				// CHECK: ret half [[FMA]]
				float16_t test_vfmah_laneq_f16(float16_t a, float16_t b, float16x8_t c) {
				return vfmah_laneq_f16(a, b, c, 7);
				}

				// CHECK-LABEL: test_vfms_lane_f16
				// CHECK: [[SUB:%.*]] = fsub <4 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b
				// CHECK: [[TMP0:%.*]] = bitcast <4 x half> %a to <8 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <4 x half> [[SUB]] to <8 x i8>
				// CHECK: [[TMP2:%.*]] = bitcast <4 x half> %c to <8 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <4 x half>
				// CHECK: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
				// CHECK: [[TMP4:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
				// CHECK: [[TMP5:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x half>
				// CHECK: [[FMA:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[TMP4]], <4 x half> [[LANE]], <4 x half> [[TMP5]])
				// CHECK: ret <4 x half> [[FMA]]
				float16x4_t test_vfms_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
				return vfms_lane_f16(a, b, c, 3);
				}

				// CHECK-LABEL: test_vfmsq_lane_f16
				// CHECK: [[SUB:%.*]] = fsub <8 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b
				// CHECK: [[TMP0:%.*]] = bitcast <8 x half> %a to <16 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <8 x half> [[SUB]] to <16 x i8>
				// CHECK: [[TMP2:%.*]] = bitcast <4 x half> %c to <8 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <4 x half>
				// CHECK: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
				// CHECK: [[TMP4:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
				// CHECK: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
				// CHECK: [[FMLA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[TMP4]], <8 x half> [[LANE]], <8 x half> [[TMP5]])
				// CHECK: ret <8 x half> [[FMLA]]
				float16x8_t test_vfmsq_lane_f16(float16x8_t a, float16x8_t b, float16x4_t c) {
				return vfmsq_lane_f16(a, b, c, 3);
				}

				// CHECK-LABEL: test_vfms_laneq_f16
				// CHECK: [[SUB:%.*]] = fsub <4 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b
				// CHECK: [[TMP0:%.*]] = bitcast <4 x half> %a to <8 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <4 x half> [[SUB]] to <8 x i8>
				// CHECK: [[TMP2:%.*]] = bitcast <8 x half> %c to <16 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x half>
				// CHECK: [[TMP4:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
				// CHECK: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to <8 x half>
				// CHECK: [[LANE:%.*]] = shufflevector <8 x half> [[TMP5]], <8 x half> [[TMP5]], <4 x i32> <i32 7, i32 7, i32 7, i32 7>
				// CHECK: [[FMLA:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[LANE]], <4 x half> [[TMP4]], <4 x half> [[TMP3]])
				// CHECK: ret <4 x half> [[FMLA]]
				float16x4_t test_vfms_laneq_f16(float16x4_t a, float16x4_t b, float16x8_t c) {
				return vfms_laneq_f16(a, b, c, 7);
				}

				// CHECK-LABEL: test_vfmsq_laneq_f16
				// CHECK: [[SUB:%.*]] = fsub <8 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b
				// CHECK: [[TMP0:%.*]] = bitcast <8 x half> %a to <16 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <8 x half> [[SUB]] to <16 x i8>
				// CHECK: [[TMP2:%.*]] = bitcast <8 x half> %c to <16 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
				// CHECK: [[TMP4:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
				// CHECK: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to <8 x half>
				// CHECK: [[LANE:%.*]] = shufflevector <8 x half> [[TMP5]], <8 x half> [[TMP5]], <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
				// CHECK: [[FMLA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[LANE]], <8 x half> [[TMP4]], <8 x half> [[TMP3]])
				// CHECK: ret <8 x half> [[FMLA]]
				float16x8_t test_vfmsq_laneq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
				return vfmsq_laneq_f16(a, b, c, 7);
				}

				// CHECK-LABEL: test_vfms_n_f16
				// CHECK: [[SUB:%.*]] = fsub <4 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b
				// CHECK: [[TMP0:%.*]] = insertelement <4 x half> undef, half %c, i32 0
				// CHECK: [[TMP1:%.*]] = insertelement <4 x half> [[TMP0]], half %c, i32 1
				// CHECK: [[TMP2:%.*]] = insertelement <4 x half> [[TMP1]], half %c, i32 2
				// CHECK: [[TMP3:%.*]] = insertelement <4 x half> [[TMP2]], half %c, i32 3
				// CHECK: [[FMA:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[SUB]], <4 x half> [[TMP3]], <4 x half> %a)
				// CHECK: ret <4 x half> [[FMA]]
				float16x4_t test_vfms_n_f16(float16x4_t a, float16x4_t b, float16_t c) {
				return vfms_n_f16(a, b, c);
				}

				// CHECK-LABEL: test_vfmsq_n_f16
				// CHECK: [[SUB:%.*]] = fsub <8 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b
				// CHECK: [[TMP0:%.*]] = insertelement <8 x half> undef, half %c, i32 0
				// CHECK: [[TMP1:%.*]] = insertelement <8 x half> [[TMP0]], half %c, i32 1
				// CHECK: [[TMP2:%.*]] = insertelement <8 x half> [[TMP1]], half %c, i32 2
				// CHECK: [[TMP3:%.*]] = insertelement <8 x half> [[TMP2]], half %c, i32 3
				// CHECK: [[TMP4:%.*]] = insertelement <8 x half> [[TMP3]], half %c, i32 4
				// CHECK: [[TMP5:%.*]] = insertelement <8 x half> [[TMP4]], half %c, i32 5
				// CHECK: [[TMP6:%.*]] = insertelement <8 x half> [[TMP5]], half %c, i32 6
				// CHECK: [[TMP7:%.*]] = insertelement <8 x half> [[TMP6]], half %c, i32 7
				// CHECK: [[FMA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[SUB]], <8 x half> [[TMP7]], <8 x half> %a)
				// CHECK: ret <8 x half> [[FMA]]
				float16x8_t test_vfmsq_n_f16(float16x8_t a, float16x8_t b, float16_t c) {
				return vfmsq_n_f16(a, b, c);
				}

				// CHECK-LABEL: test_vfmsh_lane_f16
				// CHECK: [[TMP0:%.*]] = fpext half %b to float
				// CHECK: [[TMP1:%.*]] = fsub float -0.000000e+00, [[TMP0]]
				// CHECK: [[SUB:%.*]] = fptrunc float [[TMP1]] to half
				// CHECK: [[TMP2:%.*]] = bitcast <4 x half> %c to <8 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <4 x half>
				// CHECK: [[EXTR:%.*]] = extractelement <4 x half> [[TMP3]], i32 3
				// CHECK: [[FMA:%.*]] = call half @llvm.fma.f16(half [[SUB]], half [[EXTR]], half %a)
				// CHECK: ret half [[FMA]]
				float16_t test_vfmsh_lane_f16(float16_t a, float16_t b, float16x4_t c) {
				return vfmsh_lane_f16(a, b, c, 3);
				}

				// CHECK-LABEL: test_vfmsh_laneq_f16
				// CHECK: [[TMP0:%.*]] = fpext half %b to float
				// CHECK: [[TMP1:%.*]] = fsub float -0.000000e+00, [[TMP0]]
				// CHECK: [[SUB:%.*]] = fptrunc float [[TMP1]] to half
				// CHECK: [[TMP2:%.*]] = bitcast <8 x half> %c to <16 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <8 x half>
				// CHECK: [[EXTR:%.*]] = extractelement <8 x half> [[TMP3]], i32 7
				// CHECK: [[FMA:%.*]] = call half @llvm.fma.f16(half [[SUB]], half [[EXTR]], half %a)
				// CHECK: ret half [[FMA]]
				float16_t test_vfmsh_laneq_f16(float16_t a, float16_t b, float16x8_t c) {
				return vfmsh_laneq_f16(a, b, c, 7);
				}

				// CHECK-LABEL: test_vmul_lane_f16
				// CHECK: [[TMP0:%.*]] = shufflevector <4 x half> %b, <4 x half> %b, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
				// CHECK: [[MUL:%.*]] = fmul <4 x half> %a, [[TMP0]]
				// CHECK: ret <4 x half> [[MUL]]
				float16x4_t test_vmul_lane_f16(float16x4_t a, float16x4_t b) {
				return vmul_lane_f16(a, b, 3);
				}

				// CHECK-LABEL: test_vmulq_lane_f16
				// CHECK: [[TMP0:%.*]] = shufflevector <4 x half> %b, <4 x half> %b, <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
				// CHECK: [[MUL:%.*]] = fmul <8 x half> %a, [[TMP0]]
				// CHECK: ret <8 x half> [[MUL]]
				float16x8_t test_vmulq_lane_f16(float16x8_t a, float16x4_t b) {
				return vmulq_lane_f16(a, b, 7);
				}

				// CHECK-LABEL: test_vmul_laneq_f16
				// CHECK: [[TMP0:%.*]] = shufflevector <8 x half> %b, <8 x half> %b, <4 x i32> <i32 7, i32 7, i32 7, i32 7>
				// CHECK: [[MUL:%.*]] = fmul <4 x half> %a, [[TMP0]]
				// CHECK: ret <4 x half> [[MUL]]
				float16x4_t test_vmul_laneq_f16(float16x4_t a, float16x8_t b) {
				return vmul_laneq_f16(a, b, 7);
				}

				// CHECK-LABEL: test_vmulq_laneq_f16
				// CHECK: [[TMP0:%.*]] = shufflevector <8 x half> %b, <8 x half> %b, <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
				// CHECK: [[MUL:%.*]] = fmul <8 x half> %a, [[TMP0]]
				// CHECK: ret <8 x half> [[MUL]]
				float16x8_t test_vmulq_laneq_f16(float16x8_t a, float16x8_t b) {
				return vmulq_laneq_f16(a, b, 7);
				}

				// CHECK-LABEL: test_vmul_n_f16
				// CHECK: [[TMP0:%.*]] = insertelement <4 x half> undef, half %b, i32 0
				// CHECK: [[TMP1:%.*]] = insertelement <4 x half> [[TMP0]], half %b, i32 1
				// CHECK: [[TMP2:%.*]] = insertelement <4 x half> [[TMP1]], half %b, i32 2
				// CHECK: [[TMP3:%.*]] = insertelement <4 x half> [[TMP2]], half %b, i32 3
				// CHECK: [[MUL:%.*]] = fmul <4 x half> %a, [[TMP3]]
				// CHECK: ret <4 x half> [[MUL]]
				float16x4_t test_vmul_n_f16(float16x4_t a, float16_t b) {
				return vmul_n_f16(a, b);
				}

				// CHECK-LABEL: test_vmulq_n_f16
				// CHECK: [[TMP0:%.*]] = insertelement <8 x half> undef, half %b, i32 0
				// CHECK: [[TMP1:%.*]] = insertelement <8 x half> [[TMP0]], half %b, i32 1
				// CHECK: [[TMP2:%.*]] = insertelement <8 x half> [[TMP1]], half %b, i32 2
				// CHECK: [[TMP3:%.*]] = insertelement <8 x half> [[TMP2]], half %b, i32 3
				// CHECK: [[TMP4:%.*]] = insertelement <8 x half> [[TMP3]], half %b, i32 4
				// CHECK: [[TMP5:%.*]] = insertelement <8 x half> [[TMP4]], half %b, i32 5
				// CHECK: [[TMP6:%.*]] = insertelement <8 x half> [[TMP5]], half %b, i32 6
				// CHECK: [[TMP7:%.*]] = insertelement <8 x half> [[TMP6]], half %b, i32 7
				// CHECK: [[MUL:%.*]] = fmul <8 x half> %a, [[TMP7]]
				// CHECK: ret <8 x half> [[MUL]]
				float16x8_t test_vmulq_n_f16(float16x8_t a, float16_t b) {
				return vmulq_n_f16(a, b);
				}

				// FIXME: Fix it when fp16 non-storage-only type becomes available.
				// CHECK-LABEL: test_vmulh_lane_f16
				// CHECK: [[CONV0:%.*]] = fpext half %a to float
				// CHECK: [[CONV1:%.*]] = fpext half %{{.*}} to float
				// CHECK: [[MUL:%.*]] = fmul float [[CONV0:%.*]], [[CONV0:%.*]]
				// CHECK: [[CONV3:%.*]] = fptrunc float %mul to half
				// CHECK: ret half [[CONV3:%.*]]
				float16_t test_vmulh_lane_f16(float16_t a, float16x4_t b) {
				return vmulh_lane_f16(a, b, 3);
				}

				// CHECK-LABEL: test_vmulh_laneq_f16
				// CHECK: [[CONV0:%.*]] = fpext half %a to float
				// CHECK: [[CONV1:%.*]] = fpext half %{{.*}} to float
				// CHECK: [[MUL:%.*]] = fmul float [[CONV0:%.*]], [[CONV0:%.*]]
				// CHECK: [[CONV3:%.*]] = fptrunc float %mul to half
				// CHECK: ret half [[CONV3:%.*]]
				float16_t test_vmulh_laneq_f16(float16_t a, float16x8_t b) {
				return vmulh_laneq_f16(a, b, 7);
				}

				// CHECK-LABEL: test_vmulx_lane_f16
				// CHECK: [[TMP0:%.*]] = shufflevector <4 x half> %b, <4 x half> %b, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
				// CHECK: [[MUL:%.*]] = call <4 x half> @llvm.aarch64.neon.fmulx.v4f16(<4 x half> %a, <4 x half> [[TMP0]])
				// CHECK: ret <4 x half> [[MUL]]
				float16x4_t test_vmulx_lane_f16(float16x4_t a, float16x4_t b) {
				return vmulx_lane_f16(a, b, 3);
				}

				// CHECK-LABEL: test_vmulxq_lane_f16
				// CHECK: [[TMP0:%.*]] = shufflevector <4 x half> %b, <4 x half> %b, <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
				// CHECK: [[MUL:%.*]] = call <8 x half> @llvm.aarch64.neon.fmulx.v8f16(<8 x half> %a, <8 x half> [[TMP0]])
				// CHECK: ret <8 x half> [[MUL]]
				float16x8_t test_vmulxq_lane_f16(float16x8_t a, float16x4_t b) {
				return vmulxq_lane_f16(a, b, 7);
				}

				// CHECK-LABEL: test_vmulx_laneq_f16
				// CHECK: [[TMP0:%.*]] = shufflevector <8 x half> %b, <8 x half> %b, <4 x i32> <i32 7, i32 7, i32 7, i32 7>
				// CHECK: [[MUL:%.*]] = call <4 x half> @llvm.aarch64.neon.fmulx.v4f16(<4 x half> %a, <4 x half> [[TMP0]])
				// CHECK: ret <4 x half> [[MUL]]
				float16x4_t test_vmulx_laneq_f16(float16x4_t a, float16x8_t b) {
				return vmulx_laneq_f16(a, b, 7);
				}

				// CHECK-LABEL: test_vmulxq_laneq_f16
				// CHECK: [[TMP0:%.*]] = shufflevector <8 x half> %b, <8 x half> %b, <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
				// CHECK: [[MUL:%.*]] = call <8 x half> @llvm.aarch64.neon.fmulx.v8f16(<8 x half> %a, <8 x half> [[TMP0]])
				// CHECK: ret <8 x half> [[MUL]]
				float16x8_t test_vmulxq_laneq_f16(float16x8_t a, float16x8_t b) {
				return vmulxq_laneq_f16(a, b, 7);
				}

				// CHECK-LABEL: test_vmulx_n_f16
				// CHECK: [[TMP0:%.*]] = insertelement <4 x half> undef, half %b, i32 0
				// CHECK: [[TMP1:%.*]] = insertelement <4 x half> [[TMP0]], half %b, i32 1
				// CHECK: [[TMP2:%.*]] = insertelement <4 x half> [[TMP1]], half %b, i32 2
				// CHECK: [[TMP3:%.*]] = insertelement <4 x half> [[TMP2]], half %b, i32 3
				// CHECK: [[MUL:%.*]] = call <4 x half> @llvm.aarch64.neon.fmulx.v4f16(<4 x half> %a, <4 x half> [[TMP3]])
				// CHECK: ret <4 x half> [[MUL]]
				float16x4_t test_vmulx_n_f16(float16x4_t a, float16_t b) {
				return vmulx_n_f16(a, b);
				}

				// CHECK-LABEL: test_vmulxq_n_f16
				// CHECK: [[TMP0:%.*]] = insertelement <8 x half> undef, half %b, i32 0
				// CHECK: [[TMP1:%.*]] = insertelement <8 x half> [[TMP0]], half %b, i32 1
				// CHECK: [[TMP2:%.*]] = insertelement <8 x half> [[TMP1]], half %b, i32 2
				// CHECK: [[TMP3:%.*]] = insertelement <8 x half> [[TMP2]], half %b, i32 3
				// CHECK: [[TMP4:%.*]] = insertelement <8 x half> [[TMP3]], half %b, i32 4
				// CHECK: [[TMP5:%.*]] = insertelement <8 x half> [[TMP4]], half %b, i32 5
				// CHECK: [[TMP6:%.*]] = insertelement <8 x half> [[TMP5]], half %b, i32 6
				// CHECK: [[TMP7:%.*]] = insertelement <8 x half> [[TMP6]], half %b, i32 7
				// CHECK: [[MUL:%.*]] = call <8 x half> @llvm.aarch64.neon.fmulx.v8f16(<8 x half> %a, <8 x half> [[TMP7]])
				// CHECK: ret <8 x half> [[MUL]]
				float16x8_t test_vmulxq_n_f16(float16x8_t a, float16_t b) {
				return vmulxq_n_f16(a, b);
				}

				/* TODO: Not implemented yet (needs scalar intrinsic from arm_fp16.h)
				// CCHECK-LABEL: test_vmulxh_lane_f16
				// CCHECK: [[CONV0:%.*]] = fpext half %a to float
				// CCHECK: [[CONV1:%.*]] = fpext half %{{.*}} to float
				// CCHECK: [[MUL:%.*]] = fmul float [[CONV0:%.*]], [[CONV0:%.*]]
				// CCHECK: [[CONV3:%.*]] = fptrunc float %mul to half
				// CCHECK: ret half [[CONV3:%.*]]
				float16_t test_vmulxh_lane_f16(float16_t a, float16x4_t b) {
				return vmulxh_lane_f16(a, b, 3);
				}

				// CCHECK-LABEL: test_vmulxh_laneq_f16
				// CCHECK: [[CONV0:%.*]] = fpext half %a to float
				// CCHECK: [[CONV1:%.*]] = fpext half %{{.*}} to float
				// CCHECK: [[MUL:%.*]] = fmul float [[CONV0:%.*]], [[CONV0:%.*]]
				// CCHECK: [[CONV3:%.*]] = fptrunc float %mul to half
				// CCHECK: ret half [[CONV3:%.*]]
				float16_t test_vmulxh_laneq_f16(float16_t a, float16x8_t b) {
				return vmulxh_laneq_f16(a, b, 7);
				}
				*/

				// CHECK-LABEL: test_vmaxv_f16
				// CHECK: [[TMP0:%.*]] = bitcast <4 x half> %a to <8 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x half>
				// CHECK: [[MAX:%.*]] = call half @llvm.aarch64.neon.fmaxv.f16.v4f16(<4 x half> [[TMP1]])
				// CHECK: ret half [[MAX]]
				float16_t test_vmaxv_f16(float16x4_t a) {
				return vmaxv_f16(a);
				}

				// CHECK-LABEL: test_vmaxvq_f16
				// CHECK: [[TMP0:%.*]] = bitcast <8 x half> %a to <16 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
				// CHECK: [[MAX:%.*]] = call half @llvm.aarch64.neon.fmaxv.f16.v8f16(<8 x half> [[TMP1]])
				// CHECK: ret half [[MAX]]
				float16_t test_vmaxvq_f16(float16x8_t a) {
				return vmaxvq_f16(a);
				}

				// CHECK-LABEL: test_vminv_f16
				// CHECK: [[TMP0:%.*]] = bitcast <4 x half> %a to <8 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x half>
				// CHECK: [[MAX:%.*]] = call half @llvm.aarch64.neon.fminv.f16.v4f16(<4 x half> [[TMP1]])
				// CHECK: ret half [[MAX]]
				float16_t test_vminv_f16(float16x4_t a) {
				return vminv_f16(a);
				}

				// CHECK-LABEL: test_vminvq_f16
				// CHECK: [[TMP0:%.*]] = bitcast <8 x half> %a to <16 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
				// CHECK: [[MAX:%.*]] = call half @llvm.aarch64.neon.fminv.f16.v8f16(<8 x half> [[TMP1]])
				// CHECK: ret half [[MAX]]
				float16_t test_vminvq_f16(float16x8_t a) {
				return vminvq_f16(a);
				}

				// CHECK-LABEL: test_vmaxnmv_f16
				// CHECK: [[TMP0:%.*]] = bitcast <4 x half> %a to <8 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x half>
				// CHECK: [[MAX:%.*]] = call half @llvm.aarch64.neon.fmaxnmv.f16.v4f16(<4 x half> [[TMP1]])
				// CHECK: ret half [[MAX]]
				float16_t test_vmaxnmv_f16(float16x4_t a) {
				return vmaxnmv_f16(a);
				}

				// CHECK-LABEL: test_vmaxnmvq_f16
				// CHECK: [[TMP0:%.*]] = bitcast <8 x half> %a to <16 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
				// CHECK: [[MAX:%.*]] = call half @llvm.aarch64.neon.fmaxnmv.f16.v8f16(<8 x half> [[TMP1]])
				// CHECK: ret half [[MAX]]
				float16_t test_vmaxnmvq_f16(float16x8_t a) {
				return vmaxnmvq_f16(a);
				}

				// CHECK-LABEL: test_vminnmv_f16
				// CHECK: [[TMP0:%.*]] = bitcast <4 x half> %a to <8 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x half>
				// CHECK: [[MAX:%.*]] = call half @llvm.aarch64.neon.fminnmv.f16.v4f16(<4 x half> [[TMP1]])
				// CHECK: ret half [[MAX]]
				float16_t test_vminnmv_f16(float16x4_t a) {
				return vminnmv_f16(a);
				}

				// CHECK-LABEL: test_vminnmvq_f16
				// CHECK: [[TMP0:%.*]] = bitcast <8 x half> %a to <16 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
				// CHECK: [[MAX:%.*]] = call half @llvm.aarch64.neon.fminnmv.f16.v8f16(<8 x half> [[TMP1]])
				// CHECK: ret half [[MAX]]
				float16_t test_vminnmvq_f16(float16x8_t a) {
				return vminnmvq_f16(a);
				}

				// CHECK-LABEL: test_vbsl_f16
				// CHECK: [[TMP0:%.*]] = bitcast <4 x half> %b to <8 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %c to <8 x i8>
				// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
				// CHECK: [[TMP4:%.*]] = and <4 x i16> %a, [[TMP2]]
				// CHECK: [[TMP5:%.*]] = xor <4 x i16> %a, <i16 -1, i16 -1, i16 -1, i16 -1>
				// CHECK: [[TMP6:%.*]] = and <4 x i16> [[TMP5]], [[TMP3]]
				// CHECK: [[TMP7:%.*]] = or <4 x i16> [[TMP4]], [[TMP6]]
				// CHECK: [[TMP8:%.*]] = bitcast <4 x i16> [[TMP7]] to <4 x half>
				// CHECK: ret <4 x half> [[TMP8]]
				float16x4_t test_vbsl_f16(uint16x4_t a, float16x4_t b, float16x4_t c) {
				return vbsl_f16(a, b, c);
				}

				// CHECK-LABEL: test_vbslq_f16
				// CHECK: [[TMP0:%.*]] = bitcast <8 x half> %b to <16 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %c to <16 x i8>
				// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
				// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
				// CHECK: [[TMP4:%.*]] = and <8 x i16> %a, [[TMP2]]
				// CHECK: [[TMP5:%.*]] = xor <8 x i16> %a, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
				// CHECK: [[TMP6:%.*]] = and <8 x i16> [[TMP5]], [[TMP3]]
				// CHECK: [[TMP7:%.*]] = or <8 x i16> [[TMP4]], [[TMP6]]
				// CHECK: [[TMP8:%.*]] = bitcast <8 x i16> [[TMP7]] to <8 x half>
				// CHECK: ret <8 x half> [[TMP8]]
				float16x8_t test_vbslq_f16(uint16x8_t a, float16x8_t b, float16x8_t c) {
				return vbslq_f16(a, b, c);
				}

				// CHECK-LABEL: test_vzip_f16
				// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x2_t, align 8
				// CHECK: [[__RET_I:%.*]] = alloca %struct.float16x4x2_t, align 8
				// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__RET_I]] to i8*
				// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <4 x half>*
				// CHECK: [[VZIP0_I:%.*]] = shufflevector <4 x half> %a, <4 x half> %b, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
				// CHECK: store <4 x half> [[VZIP0_I]], <4 x half>* [[TMP1]]
				// CHECK: [[TMP2:%.*]] = getelementptr inbounds <4 x half>, <4 x half>* [[TMP1]], i32 1
				// CHECK: [[VZIP1_I:%.*]] = shufflevector <4 x half> %a, <4 x half> %b, <4 x i32> <i32 2, i32 6, i32 3, i32 7>
				// CHECK: store <4 x half> [[VZIP1_I]], <4 x half>* [[TMP2]]
				// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x2_t* [[RETVAL]] to i8*
				// CHECK: [[TMP6:%.*]] = bitcast %struct.float16x4x2_t* [[__RET_I]] to i8*
				// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP5]], i8* [[TMP6]], i64 16, i32 8, i1 false)
				float16x4x2_t test_vzip_f16(float16x4_t a, float16x4_t b) {
				return vzip_f16(a, b);
				}

				// CHECK-LABEL: test_vzipq_f16
				// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x2_t, align 16
				// CHECK: [[__RET_I:%.*]] = alloca %struct.float16x8x2_t, align 16
				// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x2_t* [[__RET_I]] to i8*
				// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <8 x half>*
				// CHECK: [[VZIP0_I:%.*]] = shufflevector <8 x half> %a, <8 x half> %b, <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11>
				// CHECK: store <8 x half> [[VZIP0_I]], <8 x half>* [[TMP1]]
				// CHECK: [[TMP2:%.*]] = getelementptr inbounds <8 x half>, <8 x half>* [[TMP1]], i32 1
				// CHECK: [[VZIP1_I:%.*]] = shufflevector <8 x half> %a, <8 x half> %b, <8 x i32> <i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
				// CHECK: store <8 x half> [[VZIP1_I]], <8 x half>* [[TMP2]]
				// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x2_t* [[RETVAL]] to i8*
				// CHECK: [[TMP6:%.*]] = bitcast %struct.float16x8x2_t* [[__RET_I]] to i8*
				// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP5]], i8* [[TMP6]], i64 32, i32 16, i1 false)
				float16x8x2_t test_vzipq_f16(float16x8_t a, float16x8_t b) {
				return vzipq_f16(a, b);
				}

				// CHECK-LABEL: test_vuzp_f16
				// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x2_t, align 8
				// CHECK: [[__RET_I:%.*]] = alloca %struct.float16x4x2_t, align 8
				// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__RET_I]] to i8*
				// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <4 x half>*
				// CHECK: [[VZIP0_I:%.*]] = shufflevector <4 x half> %a, <4 x half> %b, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
				// CHECK: store <4 x half> [[VZIP0_I]], <4 x half>* [[TMP1]]
				// CHECK: [[TMP2:%.*]] = getelementptr inbounds <4 x half>, <4 x half>* [[TMP1]], i32 1
				// CHECK: [[VZIP1_I:%.*]] = shufflevector <4 x half> %a, <4 x half> %b, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
				// CHECK: store <4 x half> [[VZIP1_I]], <4 x half>* [[TMP2]]
				// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x2_t* [[RETVAL]] to i8*
				// CHECK: [[TMP6:%.*]] = bitcast %struct.float16x4x2_t* [[__RET_I]] to i8*
				// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP5]], i8* [[TMP6]], i64 16, i32 8, i1 false)
				float16x4x2_t test_vuzp_f16(float16x4_t a, float16x4_t b) {
				return vuzp_f16(a, b);
				}

				// CHECK-LABEL: test_vuzpq_f16
				// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x2_t, align 16
				// CHECK: [[__RET_I:%.*]] = alloca %struct.float16x8x2_t, align 16
				// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x2_t* [[__RET_I]] to i8*
				// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <8 x half>*
				// CHECK: [[VZIP0_I:%.*]] = shufflevector <8 x half> %a, <8 x half> %b, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
				// CHECK: store <8 x half> [[VZIP0_I]], <8 x half>* [[TMP1]]
				// CHECK: [[TMP2:%.*]] = getelementptr inbounds <8 x half>, <8 x half>* [[TMP1]], i32 1
				// CHECK: [[VZIP1_I:%.*]] = shufflevector <8 x half> %a, <8 x half> %b, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
				// CHECK: store <8 x half> [[VZIP1_I]], <8 x half>* [[TMP2]]
				// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x2_t* [[RETVAL]] to i8*
				// CHECK: [[TMP6:%.*]] = bitcast %struct.float16x8x2_t* [[__RET_I]] to i8*
				// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP5]], i8* [[TMP6]], i64 32, i32 16, i1 false)
				float16x8x2_t test_vuzpq_f16(float16x8_t a, float16x8_t b) {
				return vuzpq_f16(a, b);
				}

				// CHECK-LABEL: test_vtrn_f16
				// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x4x2_t, align 8
				// CHECK: [[__RET_I:%.*]] = alloca %struct.float16x4x2_t, align 8
				// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__RET_I]] to i8*
				// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <4 x half>*
				// CHECK: [[VZIP0_I:%.*]] = shufflevector <4 x half> %a, <4 x half> %b, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
				// CHECK: store <4 x half> [[VZIP0_I]], <4 x half>* [[TMP1]]
				// CHECK: [[TMP2:%.*]] = getelementptr inbounds <4 x half>, <4 x half>* [[TMP1]], i32 1
				// CHECK: [[VZIP1_I:%.*]] = shufflevector <4 x half> %a, <4 x half> %b, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
				// CHECK: store <4 x half> [[VZIP1_I]], <4 x half>* [[TMP2]]
				// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x4x2_t* [[RETVAL]] to i8*
				// CHECK: [[TMP6:%.*]] = bitcast %struct.float16x4x2_t* [[__RET_I]] to i8*
				// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP5]], i8* [[TMP6]], i64 16, i32 8, i1 false)
				float16x4x2_t test_vtrn_f16(float16x4_t a, float16x4_t b) {
				return vtrn_f16(a, b);
				}

				// CHECK-LABEL: test_vtrnq_f16
				// CHECK: [[RETVAL:%.*]] = alloca %struct.float16x8x2_t, align 16
				// CHECK: [[__RET_I:%.*]] = alloca %struct.float16x8x2_t, align 16
				// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x2_t* [[__RET_I]] to i8*
				// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to <8 x half>*
				// CHECK: [[VZIP0_I:%.*]] = shufflevector <8 x half> %a, <8 x half> %b, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
				// CHECK: store <8 x half> [[VZIP0_I]], <8 x half>* [[TMP1]]
				// CHECK: [[TMP2:%.*]] = getelementptr inbounds <8 x half>, <8 x half>* [[TMP1]], i32 1
				// CHECK: [[VZIP1_I:%.*]] = shufflevector <8 x half> %a, <8 x half> %b, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>
				// CHECK: store <8 x half> [[VZIP1_I]], <8 x half>* [[TMP2]]
				// CHECK: [[TMP5:%.*]] = bitcast %struct.float16x8x2_t* [[RETVAL]] to i8*
				// CHECK: [[TMP6:%.*]] = bitcast %struct.float16x8x2_t* [[__RET_I]] to i8*
				// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[TMP5]], i8* [[TMP6]], i64 32, i32 16, i1 false)
				float16x8x2_t test_vtrnq_f16(float16x8_t a, float16x8_t b) {
				return vtrnq_f16(a, b);
				}

				// CHECK-LABEL: test_vmov_n_f16
				// CHECK: [[TMP0:%.*]] = insertelement <4 x half> undef, half %a, i32 0
				// CHECK: [[TMP1:%.*]] = insertelement <4 x half> [[TMP0]], half %a, i32 1
				// CHECK: [[TMP2:%.*]] = insertelement <4 x half> [[TMP1]], half %a, i32 2
				// CHECK: [[TMP3:%.*]] = insertelement <4 x half> [[TMP2]], half %a, i32 3
				// CHECK: ret <4 x half> [[TMP3]]
				float16x4_t test_vmov_n_f16(float16_t a) {
				return vmov_n_f16(a);
				}

				// CHECK-LABEL: test_vmovq_n_f16
				// CHECK: [[TMP0:%.*]] = insertelement <8 x half> undef, half %a, i32 0
				// CHECK: [[TMP1:%.*]] = insertelement <8 x half> [[TMP0]], half %a, i32 1
				// CHECK: [[TMP2:%.*]] = insertelement <8 x half> [[TMP1]], half %a, i32 2
				// CHECK: [[TMP3:%.*]] = insertelement <8 x half> [[TMP2]], half %a, i32 3
				// CHECK: [[TMP4:%.*]] = insertelement <8 x half> [[TMP3]], half %a, i32 4
				// CHECK: [[TMP5:%.*]] = insertelement <8 x half> [[TMP4]], half %a, i32 5
				// CHECK: [[TMP6:%.*]] = insertelement <8 x half> [[TMP5]], half %a, i32 6
				// CHECK: [[TMP7:%.*]] = insertelement <8 x half> [[TMP6]], half %a, i32 7
				// CHECK: ret <8 x half> [[TMP7]]
				float16x8_t test_vmovq_n_f16(float16_t a) {
				return vmovq_n_f16(a);
				}

				// CHECK-LABEL: test_vdup_n_f16
				// CHECK: [[TMP0:%.*]] = insertelement <4 x half> undef, half %a, i32 0
				// CHECK: [[TMP1:%.*]] = insertelement <4 x half> [[TMP0]], half %a, i32 1
				// CHECK: [[TMP2:%.*]] = insertelement <4 x half> [[TMP1]], half %a, i32 2
				// CHECK: [[TMP3:%.*]] = insertelement <4 x half> [[TMP2]], half %a, i32 3
				// CHECK: ret <4 x half> [[TMP3]]
				float16x4_t test_vdup_n_f16(float16_t a) {
				return vdup_n_f16(a);
				}

				// CHECK-LABEL: test_vdupq_n_f16
				// CHECK: [[TMP0:%.*]] = insertelement <8 x half> undef, half %a, i32 0
				// CHECK: [[TMP1:%.*]] = insertelement <8 x half> [[TMP0]], half %a, i32 1
				// CHECK: [[TMP2:%.*]] = insertelement <8 x half> [[TMP1]], half %a, i32 2
				// CHECK: [[TMP3:%.*]] = insertelement <8 x half> [[TMP2]], half %a, i32 3
				// CHECK: [[TMP4:%.*]] = insertelement <8 x half> [[TMP3]], half %a, i32 4
				// CHECK: [[TMP5:%.*]] = insertelement <8 x half> [[TMP4]], half %a, i32 5
				// CHECK: [[TMP6:%.*]] = insertelement <8 x half> [[TMP5]], half %a, i32 6
				// CHECK: [[TMP7:%.*]] = insertelement <8 x half> [[TMP6]], half %a, i32 7
				// CHECK: ret <8 x half> [[TMP7]]
				float16x8_t test_vdupq_n_f16(float16_t a) {
				return vdupq_n_f16(a);
				}

				// CHECK-LABEL: test_vdup_lane_f16
				// CHECK: [[SHFL:%.*]] = shufflevector <4 x half> %a, <4 x half> %a, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
				// CHECK: ret <4 x half> [[SHFL]]
				float16x4_t test_vdup_lane_f16(float16x4_t a) {
				return vdup_lane_f16(a, 3);
				}

				// CHECK-LABEL: test_vdupq_lane_f16
				// CHECK: [[SHFL:%.*]] = shufflevector <4 x half> %a, <4 x half> %a, <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
				// CHECK: ret <8 x half> [[SHFL]]
				float16x8_t test_vdupq_lane_f16(float16x4_t a) {
				return vdupq_lane_f16(a, 7);
				}

				// CHECK-LABEL: @test_vext_f16(
				// CHECK: [[TMP0:%.*]] = bitcast <4 x half> %a to <8 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>
				// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x half>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
				// CHECK: [[VEXT:%.*]] = shufflevector <4 x half> [[TMP2]], <4 x half> [[TMP3]], <4 x i32> <i32 2, i32 3, i32 4, i32 5>
				// CHECK: ret <4 x half> [[VEXT]]
				float16x4_t test_vext_f16(float16x4_t a, float16x4_t b) {
				return vext_f16(a, b, 2);
				}

				// CHECK-LABEL: @test_vextq_f16(
				// CHECK: [[TMP0:%.*]] = bitcast <8 x half> %a to <16 x i8>
				// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>
				// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
				// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
				// CHECK: [[VEXT:%.*]] = shufflevector <8 x half> [[TMP2]], <8 x half> [[TMP3]], <8 x i32> <i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12>
				// CHECK: ret <8 x half> [[VEXT]]
				float16x8_t test_vextq_f16(float16x8_t a, float16x8_t b) {
				return vextq_f16(a, b, 5);
				}

				// CHECK-LABEL: @test_vrev64_f16(
				// CHECK: [[SHFL:%.*]] = shufflevector <4 x half> %a, <4 x half> %a, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
				// CHECK: ret <4 x half> [[SHFL]]
				float16x4_t test_vrev64_f16(float16x4_t a) {
				return vrev64_f16(a);
				}

				// CHECK-LABEL: @test_vrev64q_f16(
				// CHECK: [[SHFL:%.*]] = shufflevector <8 x half> %a, <8 x half> %a, <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>
				// CHECK: ret <8 x half> [[SHFL]]
				float16x8_t test_vrev64q_f16(float16x8_t a) {
				return vrev64q_f16(a);
				}

				// CHECK-LABEL: @test_vzip1_f16(
				// CHECK: [[SHFL:%.*]] = shufflevector <4 x half> %a, <4 x half> %b, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
				// CHECK: ret <4 x half> [[SHFL]]
				float16x4_t test_vzip1_f16(float16x4_t a, float16x4_t b) {
				return vzip1_f16(a, b);
				}

				// CHECK-LABEL: @test_vzip1q_f16(
				// CHECK: [[SHFL:%.*]] = shufflevector <8 x half> %a, <8 x half> %b, <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11>
				// CHECK: ret <8 x half> [[SHFL]]
				float16x8_t test_vzip1q_f16(float16x8_t a, float16x8_t b) {
				return vzip1q_f16(a, b);
				}

				// CHECK-LABEL: @test_vzip2_f16(
				// CHECK: [[SHFL:%.*]] = shufflevector <4 x half> %a, <4 x half> %b, <4 x i32> <i32 2, i32 6, i32 3, i32 7>
				// CHECK: ret <4 x half> [[SHFL]]
				float16x4_t test_vzip2_f16(float16x4_t a, float16x4_t b) {
				return vzip2_f16(a, b);
				}

				// CHECK-LABEL: @test_vzip2q_f16(
				// CHECK: [[SHFL:%.*]] = shufflevector <8 x half> %a, <8 x half> %b, <8 x i32> <i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>
				// CHECK: ret <8 x half> [[SHFL]]
				float16x8_t test_vzip2q_f16(float16x8_t a, float16x8_t b) {
				return vzip2q_f16(a, b);
				}

				// CHECK-LABEL: @test_vuzp1_f16(
				// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <4 x half> %a, <4 x half> %b, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
				// CHECK: ret <4 x half> [[SHUFFLE_I]]
				float16x4_t test_vuzp1_f16(float16x4_t a, float16x4_t b) {
				return vuzp1_f16(a, b);
				}

				// CHECK-LABEL: @test_vuzp1q_f16(
				// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <8 x half> %a, <8 x half> %b, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
				// CHECK: ret <8 x half> [[SHUFFLE_I]]
				float16x8_t test_vuzp1q_f16(float16x8_t a, float16x8_t b) {
				return vuzp1q_f16(a, b);
				}

				// CHECK-LABEL: @test_vuzp2_f16(
				// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <4 x half> %a, <4 x half> %b, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
				// CHECK: ret <4 x half> [[SHUFFLE_I]]
				float16x4_t test_vuzp2_f16(float16x4_t a, float16x4_t b) {
				return vuzp2_f16(a, b);
				}

				// CHECK-LABEL: @test_vuzp2q_f16(
				// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <8 x half> %a, <8 x half> %b, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
				// CHECK: ret <8 x half> [[SHUFFLE_I]]
				float16x8_t test_vuzp2q_f16(float16x8_t a, float16x8_t b) {
				return vuzp2q_f16(a, b);
				}

				// CHECK-LABEL: @test_vtrn1_f16(
				// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <4 x half> %a, <4 x half> %b, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
				// CHECK: ret <4 x half> [[SHUFFLE_I]]
				float16x4_t test_vtrn1_f16(float16x4_t a, float16x4_t b) {
				return vtrn1_f16(a, b);
				}

				// CHECK-LABEL: @test_vtrn1q_f16(
				// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <8 x half> %a, <8 x half> %b, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14>
				// CHECK: ret <8 x half> [[SHUFFLE_I]]
				float16x8_t test_vtrn1q_f16(float16x8_t a, float16x8_t b) {
				return vtrn1q_f16(a, b);
				}

				// CHECK-LABEL: @test_vtrn2_f16(
				// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <4 x half> %a, <4 x half> %b, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
				// CHECK: ret <4 x half> [[SHUFFLE_I]]
				float16x4_t test_vtrn2_f16(float16x4_t a, float16x4_t b) {
				return vtrn2_f16(a, b);
				}

				// CHECK-LABEL: @test_vtrn2q_f16(
				// CHECK: [[SHUFFLE_I:%.*]] = shufflevector <8 x half> %a, <8 x half> %b, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>
				// CHECK: ret <8 x half> [[SHUFFLE_I]]
				float16x8_t test_vtrn2q_f16(float16x8_t a, float16x8_t b) {
				return vtrn2q_f16(a, b);
				}

cfe/trunk/test/CodeGen/arm_neon_intrinsics.c

	Show First 20 Lines • Show All 3,890 Lines • ▼ Show 20 Lines
	// CHECK: [[VLD1:%.*]] = call <2 x i64> @llvm.arm.neon.vld1.v2i64.p0i8(i8* [[TMP0]], i32 4)			// CHECK: [[VLD1:%.*]] = call <2 x i64> @llvm.arm.neon.vld1.v2i64.p0i8(i8* [[TMP0]], i32 4)
	// CHECK: ret <2 x i64> [[VLD1]]			// CHECK: ret <2 x i64> [[VLD1]]
	int64x2_t test_vld1q_s64(int64_t const * a) {			int64x2_t test_vld1q_s64(int64_t const * a) {
	return vld1q_s64(a);			return vld1q_s64(a);
	}			}

	// CHECK-LABEL: @test_vld1q_f16(			// CHECK-LABEL: @test_vld1q_f16(
	// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
	// CHECK: [[VLD1:%.*]] = call <8 x i16> @llvm.arm.neon.vld1.v8i16.p0i8(i8* [[TMP0]], i32 2)			// CHECK: [[VLD1:%.*]] = call <8 x half> @llvm.arm.neon.vld1.v8f16.p0i8(i8* [[TMP0]], i32 2)
	// CHECK: [[TMP1:%.*]] = bitcast <8 x i16> [[VLD1]] to <8 x half>			// CHECK: ret <8 x half> [[VLD1]]
	// CHECK: ret <8 x half> [[TMP1]]
	float16x8_t test_vld1q_f16(float16_t const * a) {			float16x8_t test_vld1q_f16(float16_t const * a) {
	return vld1q_f16(a);			return vld1q_f16(a);
	}			}

	// CHECK-LABEL: @test_vld1q_f32(			// CHECK-LABEL: @test_vld1q_f32(
	// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
	// CHECK: [[VLD1:%.*]] = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* [[TMP0]], i32 4)			// CHECK: [[VLD1:%.*]] = call <4 x float> @llvm.arm.neon.vld1.v4f32.p0i8(i8* [[TMP0]], i32 4)
	// CHECK: ret <4 x float> [[VLD1]]			// CHECK: ret <4 x float> [[VLD1]]
	▲ Show 20 Lines • Show All 75 Lines • ▼ Show 20 Lines
	// CHECK: [[VLD1:%.*]] = call <1 x i64> @llvm.arm.neon.vld1.v1i64.p0i8(i8* [[TMP0]], i32 4)			// CHECK: [[VLD1:%.*]] = call <1 x i64> @llvm.arm.neon.vld1.v1i64.p0i8(i8* [[TMP0]], i32 4)
	// CHECK: ret <1 x i64> [[VLD1]]			// CHECK: ret <1 x i64> [[VLD1]]
	int64x1_t test_vld1_s64(int64_t const * a) {			int64x1_t test_vld1_s64(int64_t const * a) {
	return vld1_s64(a);			return vld1_s64(a);
	}			}

	// CHECK-LABEL: @test_vld1_f16(			// CHECK-LABEL: @test_vld1_f16(
	// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
	// CHECK: [[VLD1:%.*]] = call <4 x i16> @llvm.arm.neon.vld1.v4i16.p0i8(i8* [[TMP0]], i32 2)			// CHECK: [[VLD1:%.*]] = call <4 x half> @llvm.arm.neon.vld1.v4f16.p0i8(i8* [[TMP0]], i32 2)
	// CHECK: [[TMP1:%.*]] = bitcast <4 x i16> [[VLD1]] to <4 x half>			// CHECK: ret <4 x half> [[VLD1]]
	// CHECK: ret <4 x half> [[TMP1]]
	float16x4_t test_vld1_f16(float16_t const * a) {			float16x4_t test_vld1_f16(float16_t const * a) {
	return vld1_f16(a);			return vld1_f16(a);
	}			}

	// CHECK-LABEL: @test_vld1_f32(			// CHECK-LABEL: @test_vld1_f32(
	// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
	// CHECK: [[VLD1:%.*]] = call <2 x float> @llvm.arm.neon.vld1.v2f32.p0i8(i8* [[TMP0]], i32 4)			// CHECK: [[VLD1:%.*]] = call <2 x float> @llvm.arm.neon.vld1.v2f32.p0i8(i8* [[TMP0]], i32 4)
	// CHECK: ret <2 x float> [[VLD1]]			// CHECK: ret <2 x float> [[VLD1]]
	▲ Show 20 Lines • Show All 97 Lines • ▼ Show 20 Lines
	// CHECK: [[LANE:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP3]], <2 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP3]], <2 x i32> zeroinitializer
	// CHECK: ret <2 x i64> [[LANE]]			// CHECK: ret <2 x i64> [[LANE]]
	int64x2_t test_vld1q_dup_s64(int64_t const * a) {			int64x2_t test_vld1q_dup_s64(int64_t const * a) {
	return vld1q_dup_s64(a);			return vld1q_dup_s64(a);
	}			}

	// CHECK-LABEL: @test_vld1q_dup_f16(			// CHECK-LABEL: @test_vld1q_dup_f16(
	// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to half*
	// CHECK: [[TMP2:%.*]] = load i16, i16* [[TMP1]], align 2			// CHECK: [[TMP2:%.*]] = load half, half* [[TMP1]], align 2
	// CHECK: [[TMP3:%.*]] = insertelement <8 x i16> undef, i16 [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = insertelement <8 x half> undef, half [[TMP2]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <8 x i16> [[TMP3]], <8 x i16> [[TMP3]], <8 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <8 x half> [[TMP3]], <8 x half> [[TMP3]], <8 x i32> zeroinitializer
	// CHECK: [[TMP4:%.*]] = bitcast <8 x i16> [[LANE]] to <8 x half>			// CHECK: ret <8 x half> [[LANE]]
	// CHECK: ret <8 x half> [[TMP4]]
	float16x8_t test_vld1q_dup_f16(float16_t const * a) {			float16x8_t test_vld1q_dup_f16(float16_t const * a) {
	return vld1q_dup_f16(a);			return vld1q_dup_f16(a);
	}			}

	// CHECK-LABEL: @test_vld1q_dup_f32(			// CHECK-LABEL: @test_vld1q_dup_f32(
	// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to float*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to float*
	// CHECK: [[TMP2:%.*]] = load float, float* [[TMP1]], align 4			// CHECK: [[TMP2:%.*]] = load float, float* [[TMP1]], align 4
	▲ Show 20 Lines • Show All 105 Lines • ▼ Show 20 Lines
	// CHECK: [[LANE:%.*]] = shufflevector <1 x i64> [[TMP3]], <1 x i64> [[TMP3]], <1 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <1 x i64> [[TMP3]], <1 x i64> [[TMP3]], <1 x i32> zeroinitializer
	// CHECK: ret <1 x i64> [[LANE]]			// CHECK: ret <1 x i64> [[LANE]]
	int64x1_t test_vld1_dup_s64(int64_t const * a) {			int64x1_t test_vld1_dup_s64(int64_t const * a) {
	return vld1_dup_s64(a);			return vld1_dup_s64(a);
	}			}

	// CHECK-LABEL: @test_vld1_dup_f16(			// CHECK-LABEL: @test_vld1_dup_f16(
	// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to half*
	// CHECK: [[TMP2:%.*]] = load i16, i16* [[TMP1]], align 2			// CHECK: [[TMP2:%.*]] = load half, half* [[TMP1]], align 2
	// CHECK: [[TMP3:%.*]] = insertelement <4 x i16> undef, i16 [[TMP2]], i32 0			// CHECK: [[TMP3:%.*]] = insertelement <4 x half> undef, half [[TMP2]], i32 0
	// CHECK: [[LANE:%.*]] = shufflevector <4 x i16> [[TMP3]], <4 x i16> [[TMP3]], <4 x i32> zeroinitializer			// CHECK: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <4 x i32> zeroinitializer
	// CHECK: [[TMP4:%.*]] = bitcast <4 x i16> [[LANE]] to <4 x half>			// CHECK: ret <4 x half> [[LANE]]
	// CHECK: ret <4 x half> [[TMP4]]
	float16x4_t test_vld1_dup_f16(float16_t const * a) {			float16x4_t test_vld1_dup_f16(float16_t const * a) {
	return vld1_dup_f16(a);			return vld1_dup_f16(a);
	}			}

	// CHECK-LABEL: @test_vld1_dup_f32(			// CHECK-LABEL: @test_vld1_dup_f32(
	// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to float*			// CHECK: [[TMP1:%.*]] = bitcast i8* [[TMP0]] to float*
	// CHECK: [[TMP2:%.*]] = load float, float* [[TMP1]], align 4			// CHECK: [[TMP2:%.*]] = load float, float* [[TMP1]], align 4
	▲ Show 20 Lines • Show All 110 Lines • ▼ Show 20 Lines
	// CHECK: ret <2 x i64> [[VLD1Q_LANE]]			// CHECK: ret <2 x i64> [[VLD1Q_LANE]]
	int64x2_t test_vld1q_lane_s64(int64_t const * a, int64x2_t b) {			int64x2_t test_vld1q_lane_s64(int64_t const * a, int64x2_t b) {
	return vld1q_lane_s64(a, b, 1);			return vld1q_lane_s64(a, b, 1);
	}			}

	// CHECK-LABEL: @test_vld1q_lane_f16(			// CHECK-LABEL: @test_vld1q_lane_f16(
	// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>			// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to half*
	// CHECK: [[TMP4:%.*]] = load i16, i16* [[TMP3]], align 2			// CHECK: [[TMP4:%.*]] = load half, half* [[TMP3]], align 2
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <8 x i16> [[TMP2]], i16 [[TMP4]], i32 7			// CHECK: [[VLD1_LANE:%.*]] = insertelement <8 x half> [[TMP2]], half [[TMP4]], i32 7
	// CHECK: [[TMP5:%.*]] = bitcast <8 x i16> [[VLD1_LANE]] to <8 x half>			// CHECK: ret <8 x half> [[VLD1_LANE]]
	// CHECK: ret <8 x half> [[TMP5]]
	float16x8_t test_vld1q_lane_f16(float16_t const * a, float16x8_t b) {			float16x8_t test_vld1q_lane_f16(float16_t const * a, float16x8_t b) {
	return vld1q_lane_f16(a, b, 7);			return vld1q_lane_f16(a, b, 7);
	}			}

	// CHECK-LABEL: @test_vld1q_lane_f32(			// CHECK-LABEL: @test_vld1q_lane_f32(
	// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x float>			// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x float>
	▲ Show 20 Lines • Show All 111 Lines • ▼ Show 20 Lines
	// CHECK: ret <1 x i64> [[VLD1_LANE]]			// CHECK: ret <1 x i64> [[VLD1_LANE]]
	int64x1_t test_vld1_lane_s64(int64_t const * a, int64x1_t b) {			int64x1_t test_vld1_lane_s64(int64_t const * a, int64x1_t b) {
	return vld1_lane_s64(a, b, 0);			return vld1_lane_s64(a, b, 0);
	}			}

	// CHECK-LABEL: @test_vld1_lane_f16(			// CHECK-LABEL: @test_vld1_lane_f16(
	// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
	// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP3:%.*]] = bitcast i8* [[TMP0]] to half*
	// CHECK: [[TMP4:%.*]] = load i16, i16* [[TMP3]], align 2			// CHECK: [[TMP4:%.*]] = load half, half* [[TMP3]], align 2
	// CHECK: [[VLD1_LANE:%.*]] = insertelement <4 x i16> [[TMP2]], i16 [[TMP4]], i32 3			// CHECK: [[VLD1_LANE:%.*]] = insertelement <4 x half> [[TMP2]], half [[TMP4]], i32 3
	// CHECK: [[TMP5:%.*]] = bitcast <4 x i16> [[VLD1_LANE]] to <4 x half>			// CHECK: ret <4 x half> [[VLD1_LANE]]
	// CHECK: ret <4 x half> [[TMP5]]
	float16x4_t test_vld1_lane_f16(float16_t const * a, float16x4_t b) {			float16x4_t test_vld1_lane_f16(float16_t const * a, float16x4_t b) {
	return vld1_lane_f16(a, b, 3);			return vld1_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: @test_vld1_lane_f32(			// CHECK-LABEL: @test_vld1_lane_f32(
	// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x float>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x float>
	▲ Show 20 Lines • Show All 76 Lines • ▼ Show 20 Lines
	int32x4x2_t test_vld2q_s32(int32_t const * a) {			int32x4x2_t test_vld2q_s32(int32_t const * a) {
	return vld2q_s32(a);			return vld2q_s32(a);
	}			}

	// CHECK-LABEL: @test_vld2q_f16(			// CHECK-LABEL: @test_vld2q_f16(
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x2_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x2_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
	// CHECK: [[VLD2Q_V:%.*]] = call { <8 x i16>, <8 x i16>			// CHECK: [[VLD2Q_V:%.*]] = call { <8 x half>, <8 x half>
	float16x8x2_t test_vld2q_f16(float16_t const * a) {			float16x8x2_t test_vld2q_f16(float16_t const * a) {
	return vld2q_f16(a);			return vld2q_f16(a);
	}			}

	// CHECK-LABEL: @test_vld2q_f32(			// CHECK-LABEL: @test_vld2q_f32(
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x2_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x2_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x2_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*
	▲ Show 20 Lines • Show All 88 Lines • ▼ Show 20 Lines
	int64x1x2_t test_vld2_s64(int64_t const * a) {			int64x1x2_t test_vld2_s64(int64_t const * a) {
	return vld2_s64(a);			return vld2_s64(a);
	}			}

	// CHECK-LABEL: @test_vld2_f16(			// CHECK-LABEL: @test_vld2_f16(
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x2_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
	// CHECK: [[VLD2_V:%.*]] = call { <4 x i16>, <4 x i16>			// CHECK: [[VLD2_V:%.*]] = call { <4 x half>, <4 x half>
	float16x4x2_t test_vld2_f16(float16_t const * a) {			float16x4x2_t test_vld2_f16(float16_t const * a) {
	return vld2_f16(a);			return vld2_f16(a);
	}			}

	// CHECK-LABEL: @test_vld2_f32(			// CHECK-LABEL: @test_vld2_f32(
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x2_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x2_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*
	▲ Show 20 Lines • Show All 88 Lines • ▼ Show 20 Lines
	int64x1x2_t test_vld2_dup_s64(int64_t const * a) {			int64x1x2_t test_vld2_dup_s64(int64_t const * a) {
	return vld2_dup_s64(a);			return vld2_dup_s64(a);
	}			}

	// CHECK-LABEL: @test_vld2_dup_f16(			// CHECK-LABEL: @test_vld2_dup_f16(
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x2_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x2_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
	// CHECK: [[VLD_DUP:%.*]] = call { <4 x i16>, <4 x i16>			// CHECK: [[VLD_DUP:%.*]] = call { <4 x half>, <4 x half>
	float16x4x2_t test_vld2_dup_f16(float16_t const * a) {			float16x4x2_t test_vld2_dup_f16(float16_t const * a) {
	return vld2_dup_f16(a);			return vld2_dup_f16(a);
	}			}

	// CHECK-LABEL: @test_vld2_dup_f32(			// CHECK-LABEL: @test_vld2_dup_f32(
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x2_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x2_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x2_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*
	▲ Show 20 Lines • Show All 142 Lines • ▼ Show 20 Lines
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i32 0, i32 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i32 0, i32 0
	// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16			// CHECK: [[TMP5:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <8 x half> [[TMP5]] to <16 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i32 0, i32 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i32 0, i32 1
	// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>			// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>
	// CHECK: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>			// CHECK: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x half>
	// CHECK: [[VLD2Q_LANE_V:%.*]] = call { <8 x i16>, <8 x i16>			// CHECK: [[VLD2Q_LANE_V:%.*]] = call { <8 x half>, <8 x half>
	float16x8x2_t test_vld2q_lane_f16(float16_t const * a, float16x8x2_t b) {			float16x8x2_t test_vld2q_lane_f16(float16_t const * a, float16x8x2_t b) {
	return vld2q_lane_f16(a, b, 7);			return vld2q_lane_f16(a, b, 7);
	}			}

	// CHECK-LABEL: @test_vld2q_lane_f32(			// CHECK-LABEL: @test_vld2q_lane_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x2_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x2_t, align 16
	▲ Show 20 Lines • Show All 214 Lines • ▼ Show 20 Lines
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i32 0, i32 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i32 0, i32 0
	// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8			// CHECK: [[TMP5:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>			// CHECK: [[TMP6:%.*]] = bitcast <4 x half> [[TMP5]] to <8 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i32 0, i32 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i32 0, i32 1
	// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>			// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>
	// CHECK: [[TMP9:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>			// CHECK: [[TMP9:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x half>
	// CHECK: [[VLD2_LANE_V:%.*]] = call { <4 x i16>, <4 x i16>			// CHECK: [[VLD2_LANE_V:%.*]] = call { <4 x half>, <4 x half>
	float16x4x2_t test_vld2_lane_f16(float16_t const * a, float16x4x2_t b) {			float16x4x2_t test_vld2_lane_f16(float16_t const * a, float16x4x2_t b) {
	return vld2_lane_f16(a, b, 3);			return vld2_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: @test_vld2_lane_f32(			// CHECK-LABEL: @test_vld2_lane_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x2_t, align 8
	[... 120 lines not shown ...]
	int32x4x3_t test_vld3q_s32(int32_t const * a) {			int32x4x3_t test_vld3q_s32(int32_t const * a) {
	return vld3q_s32(a);			return vld3q_s32(a);
	}			}

	// CHECK-LABEL: @test_vld3q_f16(			// CHECK-LABEL: @test_vld3q_f16(
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x3_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x3_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
	// CHECK: [[VLD3Q_V:%.*]] = call { <8 x i16>, <8 x i16>, <8 x i16>			// CHECK: [[VLD3Q_V:%.*]] = call { <8 x half>, <8 x half>, <8 x half>
	float16x8x3_t test_vld3q_f16(float16_t const * a) {			float16x8x3_t test_vld3q_f16(float16_t const * a) {
	return vld3q_f16(a);			return vld3q_f16(a);
	}			}

	// CHECK-LABEL: @test_vld3q_f32(			// CHECK-LABEL: @test_vld3q_f32(
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x3_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x3_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x3_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*
	[... 88 lines not shown ...]
	int64x1x3_t test_vld3_s64(int64_t const * a) {			int64x1x3_t test_vld3_s64(int64_t const * a) {
	return vld3_s64(a);			return vld3_s64(a);
	}			}

	// CHECK-LABEL: @test_vld3_f16(			// CHECK-LABEL: @test_vld3_f16(
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x3_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
	// CHECK: [[VLD3_V:%.*]] = call { <4 x i16>, <4 x i16>, <4 x i16>			// CHECK: [[VLD3_V:%.*]] = call { <4 x half>, <4 x half>, <4 x half>
	float16x4x3_t test_vld3_f16(float16_t const * a) {			float16x4x3_t test_vld3_f16(float16_t const * a) {
	return vld3_f16(a);			return vld3_f16(a);
	}			}

	// CHECK-LABEL: @test_vld3_f32(			// CHECK-LABEL: @test_vld3_f32(
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x3_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x3_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*
	[... 88 lines not shown ...]
	int64x1x3_t test_vld3_dup_s64(int64_t const * a) {			int64x1x3_t test_vld3_dup_s64(int64_t const * a) {
	return vld3_dup_s64(a);			return vld3_dup_s64(a);
	}			}

	// CHECK-LABEL: @test_vld3_dup_f16(			// CHECK-LABEL: @test_vld3_dup_f16(
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x3_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x3_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
	// CHECK: [[VLD_DUP:%.*]] = call { <4 x i16>, <4 x i16>, <4 x i16>			// CHECK: [[VLD_DUP:%.*]] = call { <4 x half>, <4 x half>, <4 x half>
	float16x4x3_t test_vld3_dup_f16(float16_t const * a) {			float16x4x3_t test_vld3_dup_f16(float16_t const * a) {
	return vld3_dup_f16(a);			return vld3_dup_f16(a);
	}			}

	// CHECK-LABEL: @test_vld3_dup_f32(			// CHECK-LABEL: @test_vld3_dup_f32(
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x3_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x3_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x3_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*
	[... 166 lines not shown ...]
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL1]], i32 0, i32 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL1]], i32 0, i32 1
	// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP7:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>			// CHECK: [[TMP8:%.*]] = bitcast <8 x half> [[TMP7]] to <16 x i8>
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL3]], i32 0, i32 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL3]], i32 0, i32 2
	// CHECK: [[TMP9:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16			// CHECK: [[TMP9:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16
	// CHECK: [[TMP10:%.*]] = bitcast <8 x half> [[TMP9]] to <16 x i8>			// CHECK: [[TMP10:%.*]] = bitcast <8 x half> [[TMP9]] to <16 x i8>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>
	// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>			// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x half>
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x i16>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x half>
	// CHECK: [[VLD3Q_LANE_V:%.*]] = call { <8 x i16>, <8 x i16>, <8 x i16>			// CHECK: [[VLD3Q_LANE_V:%.*]] = call { <8 x half>, <8 x half>, <8 x half>
	float16x8x3_t test_vld3q_lane_f16(float16_t const * a, float16x8x3_t b) {			float16x8x3_t test_vld3q_lane_f16(float16_t const * a, float16x8x3_t b) {
	return vld3q_lane_f16(a, b, 7);			return vld3q_lane_f16(a, b, 7);
	}			}

	// CHECK-LABEL: @test_vld3q_lane_f32(			// CHECK-LABEL: @test_vld3q_lane_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x3_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x3_t, align 16
	[... 254 lines not shown ...]
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL1]], i32 0, i32 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL1]], i32 0, i32 1
	// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP7:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>			// CHECK: [[TMP8:%.*]] = bitcast <4 x half> [[TMP7]] to <8 x i8>
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL3]], i32 0, i32 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL3]], i32 0, i32 2
	// CHECK: [[TMP9:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8			// CHECK: [[TMP9:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8
	// CHECK: [[TMP10:%.*]] = bitcast <4 x half> [[TMP9]] to <8 x i8>			// CHECK: [[TMP10:%.*]] = bitcast <4 x half> [[TMP9]] to <8 x i8>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>
	// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>			// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x half>
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x i16>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x half>
	// CHECK: [[VLD3_LANE_V:%.*]] = call { <4 x i16>, <4 x i16>, <4 x i16>			// CHECK: [[VLD3_LANE_V:%.*]] = call { <4 x half>, <4 x half>, <4 x half>
	float16x4x3_t test_vld3_lane_f16(float16_t const * a, float16x4x3_t b) {			float16x4x3_t test_vld3_lane_f16(float16_t const * a, float16x4x3_t b) {
	return vld3_lane_f16(a, b, 3);			return vld3_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: @test_vld3_lane_f32(			// CHECK-LABEL: @test_vld3_lane_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x3_t, align 8
	[... 133 lines not shown ...]
	int32x4x4_t test_vld4q_s32(int32_t const * a) {			int32x4x4_t test_vld4q_s32(int32_t const * a) {
	return vld4q_s32(a);			return vld4q_s32(a);
	}			}

	// CHECK-LABEL: @test_vld4q_f16(			// CHECK-LABEL: @test_vld4q_f16(
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float16x8x4_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x8x4_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
	// CHECK: [[VLD4Q_V:%.*]] = call { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16>			// CHECK: [[VLD4Q_V:%.*]] = call { <8 x half>, <8 x half>, <8 x half>, <8 x half>
	float16x8x4_t test_vld4q_f16(float16_t const * a) {			float16x8x4_t test_vld4q_f16(float16_t const * a) {
	return vld4q_f16(a);			return vld4q_f16(a);
	}			}

	// CHECK-LABEL: @test_vld4q_f32(			// CHECK-LABEL: @test_vld4q_f32(
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x4_t, align 16
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x4_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x4x4_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*
	[... 88 lines not shown ...]
	int64x1x4_t test_vld4_s64(int64_t const * a) {			int64x1x4_t test_vld4_s64(int64_t const * a) {
	return vld4_s64(a);			return vld4_s64(a);
	}			}

	// CHECK-LABEL: @test_vld4_f16(			// CHECK-LABEL: @test_vld4_f16(
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x4_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
	// CHECK: [[VLD4_V:%.*]] = call { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16>			// CHECK: [[VLD4_V:%.*]] = call { <4 x half>, <4 x half>, <4 x half>, <4 x half>
	float16x4x4_t test_vld4_f16(float16_t const * a) {			float16x4x4_t test_vld4_f16(float16_t const * a) {
	return vld4_f16(a);			return vld4_f16(a);
	}			}

	// CHECK-LABEL: @test_vld4_f32(			// CHECK-LABEL: @test_vld4_f32(
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x4_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x4_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*
	[... 88 lines not shown ...]
	int64x1x4_t test_vld4_dup_s64(int64_t const * a) {			int64x1x4_t test_vld4_dup_s64(int64_t const * a) {
	return vld4_dup_s64(a);			return vld4_dup_s64(a);
	}			}

	// CHECK-LABEL: @test_vld4_dup_f16(			// CHECK-LABEL: @test_vld4_dup_f16(
	// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float16x4x4_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float16x4x4_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast half* %a to i8*
	// CHECK: [[VLD_DUP:%.*]] = call { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16>			// CHECK: [[VLD_DUP:%.*]] = call { <4 x half>, <4 x half>, <4 x half>, <4 x half>
	float16x4x4_t test_vld4_dup_f16(float16_t const * a) {			float16x4x4_t test_vld4_dup_f16(float16_t const * a) {
	return vld4_dup_f16(a);			return vld4_dup_f16(a);
	}			}

	// CHECK-LABEL: @test_vld4_dup_f32(			// CHECK-LABEL: @test_vld4_dup_f32(
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x4_t, align 8
	// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x4_t* [[__RET]] to i8*			// CHECK: [[TMP0:%.*]] = bitcast %struct.float32x2x4_t* [[__RET]] to i8*
	// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP1:%.*]] = bitcast float* %a to i8*
	[... 190 lines not shown ...]
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL3]], i32 0, i32 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL3]], i32 0, i32 2
	// CHECK: [[TMP9:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16			// CHECK: [[TMP9:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16
	// CHECK: [[TMP10:%.*]] = bitcast <8 x half> [[TMP9]] to <16 x i8>			// CHECK: [[TMP10:%.*]] = bitcast <8 x half> [[TMP9]] to <16 x i8>
	// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL5]], i32 0, i32 3			// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL5]], i32 0, i32 3
	// CHECK: [[TMP11:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX6]], align 16			// CHECK: [[TMP11:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX6]], align 16
	// CHECK: [[TMP12:%.*]] = bitcast <8 x half> [[TMP11]] to <16 x i8>			// CHECK: [[TMP12:%.*]] = bitcast <8 x half> [[TMP11]] to <16 x i8>
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x i16>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP6]] to <8 x half>
	// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x i16>			// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP8]] to <8 x half>
	// CHECK: [[TMP15:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x i16>			// CHECK: [[TMP15:%.*]] = bitcast <16 x i8> [[TMP10]] to <8 x half>
	// CHECK: [[TMP16:%.*]] = bitcast <16 x i8> [[TMP12]] to <8 x i16>			// CHECK: [[TMP16:%.*]] = bitcast <16 x i8> [[TMP12]] to <8 x half>
	// CHECK: [[VLD4Q_LANE_V:%.*]] = call { <8 x i16>, <8 x i16>, <8 x i16>, <8 x i16>			// CHECK: [[VLD4Q_LANE_V:%.*]] = call { <8 x half>, <8 x half>, <8 x half>, <8 x half>
	float16x8x4_t test_vld4q_lane_f16(float16_t const * a, float16x8x4_t b) {			float16x8x4_t test_vld4q_lane_f16(float16_t const * a, float16x8x4_t b) {
	return vld4q_lane_f16(a, b, 7);			return vld4q_lane_f16(a, b, 7);
	}			}

	// CHECK-LABEL: @test_vld4q_lane_f32(			// CHECK-LABEL: @test_vld4q_lane_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x4_t, align 16			// CHECK: [[__RET:%.*]] = alloca %struct.float32x4x4_t, align 16
	[... 294 lines not shown ...]
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL3]], i32 0, i32 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL3]], i32 0, i32 2
	// CHECK: [[TMP9:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8			// CHECK: [[TMP9:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8
	// CHECK: [[TMP10:%.*]] = bitcast <4 x half> [[TMP9]] to <8 x i8>			// CHECK: [[TMP10:%.*]] = bitcast <4 x half> [[TMP9]] to <8 x i8>
	// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL5]], i32 0, i32 3			// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL5]], i32 0, i32 3
	// CHECK: [[TMP11:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX6]], align 8			// CHECK: [[TMP11:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX6]], align 8
	// CHECK: [[TMP12:%.*]] = bitcast <4 x half> [[TMP11]] to <8 x i8>			// CHECK: [[TMP12:%.*]] = bitcast <4 x half> [[TMP11]] to <8 x i8>
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x i16>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP6]] to <4 x half>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x i16>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP8]] to <4 x half>
	// CHECK: [[TMP15:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x i16>			// CHECK: [[TMP15:%.*]] = bitcast <8 x i8> [[TMP10]] to <4 x half>
	// CHECK: [[TMP16:%.*]] = bitcast <8 x i8> [[TMP12]] to <4 x i16>			// CHECK: [[TMP16:%.*]] = bitcast <8 x i8> [[TMP12]] to <4 x half>
	// CHECK: [[VLD4_LANE_V:%.*]] = call { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16>			// CHECK: [[VLD4_LANE_V:%.*]] = call { <4 x half>, <4 x half>, <4 x half>, <4 x half>
	float16x4x4_t test_vld4_lane_f16(float16_t const * a, float16x4x4_t b) {			float16x4x4_t test_vld4_lane_f16(float16_t const * a, float16x4x4_t b) {
	return vld4_lane_f16(a, b, 3);			return vld4_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: @test_vld4_lane_f32(			// CHECK-LABEL: @test_vld4_lane_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8
	// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[__RET:%.*]] = alloca %struct.float32x2x4_t, align 8
	[... 8,874 lines not shown ...]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1q_s64(int64_t * a, int64x2_t b) {			void test_vst1q_s64(int64_t * a, int64x2_t b) {
	vst1q_s64(a, b);			vst1q_s64(a, b);
	}			}

	// CHECK-LABEL: @test_vst1q_f16(			// CHECK-LABEL: @test_vst1q_f16(
	// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>			// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
	// CHECK: call void @llvm.arm.neon.vst1.p0i8.v8i16(i8* [[TMP0]], <8 x i16> [[TMP2]], i32 2)			// CHECK: call void @llvm.arm.neon.vst1.p0i8.v8f16(i8* [[TMP0]], <8 x half> [[TMP2]], i32 2)
	// CHECK: ret void			// CHECK: ret void
	void test_vst1q_f16(float16_t * a, float16x8_t b) {			void test_vst1q_f16(float16_t * a, float16x8_t b) {
	vst1q_f16(a, b);			vst1q_f16(a, b);
	}			}

	// CHECK-LABEL: @test_vst1q_f32(			// CHECK-LABEL: @test_vst1q_f32(
	// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>
	[... 93 lines not shown ...]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_s64(int64_t * a, int64x1_t b) {			void test_vst1_s64(int64_t * a, int64x1_t b) {
	vst1_s64(a, b);			vst1_s64(a, b);
	}			}

	// CHECK-LABEL: @test_vst1_f16(			// CHECK-LABEL: @test_vst1_f16(
	// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
	// CHECK: call void @llvm.arm.neon.vst1.p0i8.v4i16(i8* [[TMP0]], <4 x i16> [[TMP2]], i32 2)			// CHECK: call void @llvm.arm.neon.vst1.p0i8.v4f16(i8* [[TMP0]], <4 x half> [[TMP2]], i32 2)
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_f16(float16_t * a, float16x4_t b) {			void test_vst1_f16(float16_t * a, float16x4_t b) {
	vst1_f16(a, b);			vst1_f16(a, b);
	}			}

	// CHECK-LABEL: @test_vst1_f32(			// CHECK-LABEL: @test_vst1_f32(
	// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>
	[... 105 lines not shown ...]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1q_lane_s64(int64_t * a, int64x2_t b) {			void test_vst1q_lane_s64(int64_t * a, int64x2_t b) {
	vst1q_lane_s64(a, b, 1);			vst1q_lane_s64(a, b, 1);
	}			}

	// CHECK-LABEL: @test_vst1q_lane_f16(			// CHECK-LABEL: @test_vst1q_lane_f16(
	// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>			// CHECK: [[TMP2:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
	// CHECK: [[TMP3:%.*]] = extractelement <8 x i16> [[TMP2]], i32 7			// CHECK: [[TMP3:%.*]] = extractelement <8 x half> [[TMP2]], i32 7
	// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to half*
	// CHECK: store i16 [[TMP3]], i16* [[TMP4]], align 2			// CHECK: store half [[TMP3]], half* [[TMP4]], align 2
	// CHECK: ret void			// CHECK: ret void
	void test_vst1q_lane_f16(float16_t * a, float16x8_t b) {			void test_vst1q_lane_f16(float16_t * a, float16x8_t b) {
	vst1q_lane_f16(a, b, 7);			vst1q_lane_f16(a, b, 7);
	}			}

	// CHECK-LABEL: @test_vst1q_lane_f32(			// CHECK-LABEL: @test_vst1q_lane_f32(
	// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x float> %b to <16 x i8>
	[... 112 lines not shown ...]
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_s64(int64_t * a, int64x1_t b) {			void test_vst1_lane_s64(int64_t * a, int64x1_t b) {
	vst1_lane_s64(a, b, 0);			vst1_lane_s64(a, b, 0);
	}			}

	// CHECK-LABEL: @test_vst1_lane_f16(			// CHECK-LABEL: @test_vst1_lane_f16(
	// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast half* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>
	// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>			// CHECK: [[TMP2:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
	// CHECK: [[TMP3:%.*]] = extractelement <4 x i16> [[TMP2]], i32 3			// CHECK: [[TMP3:%.*]] = extractelement <4 x half> [[TMP2]], i32 3
	// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to i16*			// CHECK: [[TMP4:%.*]] = bitcast i8* [[TMP0]] to half*
	// CHECK: store i16 [[TMP3]], i16* [[TMP4]], align 2			// CHECK: store half [[TMP3]], half* [[TMP4]], align 2
	// CHECK: ret void			// CHECK: ret void
	void test_vst1_lane_f16(float16_t * a, float16x4_t b) {			void test_vst1_lane_f16(float16_t * a, float16x4_t b) {
	vst1_lane_f16(a, b, 3);			vst1_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: @test_vst1_lane_f32(			// CHECK-LABEL: @test_vst1_lane_f32(
	// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*			// CHECK: [[TMP0:%.*]] = bitcast float* %a to i8*
	// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>			// CHECK: [[TMP1:%.*]] = bitcast <2 x float> %b to <8 x i8>
	[... 185 lines not shown ...]
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i32 0, i32 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i32 0, i32 0
	// CHECK: [[TMP4:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16			// CHECK: [[TMP4:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP5:%.*]] = bitcast <8 x half> [[TMP4]] to <16 x i8>			// CHECK: [[TMP5:%.*]] = bitcast <8 x half> [[TMP4]] to <16 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i32 0, i32 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i32 0, i32 1
	// CHECK: [[TMP6:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP6:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP7:%.*]] = bitcast <8 x half> [[TMP6]] to <16 x i8>			// CHECK: [[TMP7:%.*]] = bitcast <8 x half> [[TMP6]] to <16 x i8>
	// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x i16>			// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
	// CHECK: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x i16>			// CHECK: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x half>
	// CHECK: call void @llvm.arm.neon.vst2.p0i8.v8i16(i8* [[TMP3]], <8 x i16> [[TMP8]], <8 x i16> [[TMP9]], i32 2)			// CHECK: call void @llvm.arm.neon.vst2.p0i8.v8f16(i8* [[TMP3]], <8 x half> [[TMP8]], <8 x half> [[TMP9]], i32 2)
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_f16(float16_t * a, float16x8x2_t b) {			void test_vst2q_f16(float16_t * a, float16x8x2_t b) {
	vst2q_f16(a, b);			vst2q_f16(a, b);
	}			}

	// CHECK-LABEL: @test_vst2q_f32(			// CHECK-LABEL: @test_vst2q_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16
	[... 278 lines not shown ...]
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i32 0, i32 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i32 0, i32 0
	// CHECK: [[TMP4:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8			// CHECK: [[TMP4:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP5:%.*]] = bitcast <4 x half> [[TMP4]] to <8 x i8>			// CHECK: [[TMP5:%.*]] = bitcast <4 x half> [[TMP4]] to <8 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i32 0, i32 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i32 0, i32 1
	// CHECK: [[TMP6:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP6:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP7:%.*]] = bitcast <4 x half> [[TMP6]] to <8 x i8>			// CHECK: [[TMP7:%.*]] = bitcast <4 x half> [[TMP6]] to <8 x i8>
	// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x i16>			// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x half>
	// CHECK: [[TMP9:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x i16>			// CHECK: [[TMP9:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x half>
	// CHECK: call void @llvm.arm.neon.vst2.p0i8.v4i16(i8* [[TMP3]], <4 x i16> [[TMP8]], <4 x i16> [[TMP9]], i32 2)			// CHECK: call void @llvm.arm.neon.vst2.p0i8.v4f16(i8* [[TMP3]], <4 x half> [[TMP8]], <4 x half> [[TMP9]], i32 2)
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_f16(float16_t * a, float16x4x2_t b) {			void test_vst2_f16(float16_t * a, float16x4x2_t b) {
	vst2_f16(a, b);			vst2_f16(a, b);
	}			}

	// CHECK-LABEL: @test_vst2_f32(			// CHECK-LABEL: @test_vst2_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8
	[... 184 lines not shown ...]
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i32 0, i32 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL]], i32 0, i32 0
	// CHECK: [[TMP4:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16			// CHECK: [[TMP4:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX]], align 16
	// CHECK: [[TMP5:%.*]] = bitcast <8 x half> [[TMP4]] to <16 x i8>			// CHECK: [[TMP5:%.*]] = bitcast <8 x half> [[TMP4]] to <16 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x2_t, %struct.float16x8x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i32 0, i32 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <8 x half>], [2 x <8 x half>]* [[VAL1]], i32 0, i32 1
	// CHECK: [[TMP6:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP6:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP7:%.*]] = bitcast <8 x half> [[TMP6]] to <16 x i8>			// CHECK: [[TMP7:%.*]] = bitcast <8 x half> [[TMP6]] to <16 x i8>
	// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x i16>			// CHECK: [[TMP8:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
	// CHECK: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x i16>			// CHECK: [[TMP9:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x half>
	// CHECK: call void @llvm.arm.neon.vst2lane.p0i8.v8i16(i8* [[TMP3]], <8 x i16> [[TMP8]], <8 x i16> [[TMP9]], i32 7, i32 2)			// CHECK: call void @llvm.arm.neon.vst2lane.p0i8.v8f16(i8* [[TMP3]], <8 x half> [[TMP8]], <8 x half> [[TMP9]], i32 7, i32 2)
	// CHECK: ret void			// CHECK: ret void
	void test_vst2q_lane_f16(float16_t * a, float16x8x2_t b) {			void test_vst2q_lane_f16(float16_t * a, float16x8x2_t b) {
	vst2q_lane_f16(a, b, 7);			vst2q_lane_f16(a, b, 7);
	}			}

	// CHECK-LABEL: @test_vst2q_lane_f32(			// CHECK-LABEL: @test_vst2q_lane_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x2_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x2_t, align 16
	[205 lines not shown]
	// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i32 0, i32 0			// CHECK: [[ARRAYIDX:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL]], i32 0, i32 0
	// CHECK: [[TMP4:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8			// CHECK: [[TMP4:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX]], align 8
	// CHECK: [[TMP5:%.*]] = bitcast <4 x half> [[TMP4]] to <8 x i8>			// CHECK: [[TMP5:%.*]] = bitcast <4 x half> [[TMP4]] to <8 x i8>
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x2_t, %struct.float16x4x2_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i32 0, i32 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [2 x <4 x half>], [2 x <4 x half>]* [[VAL1]], i32 0, i32 1
	// CHECK: [[TMP6:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP6:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP7:%.*]] = bitcast <4 x half> [[TMP6]] to <8 x i8>			// CHECK: [[TMP7:%.*]] = bitcast <4 x half> [[TMP6]] to <8 x i8>
	// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x i16>			// CHECK: [[TMP8:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x half>
	// CHECK: [[TMP9:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x i16>			// CHECK: [[TMP9:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x half>
	// CHECK: call void @llvm.arm.neon.vst2lane.p0i8.v4i16(i8* [[TMP3]], <4 x i16> [[TMP8]], <4 x i16> [[TMP9]], i32 3, i32 2)			// CHECK: call void @llvm.arm.neon.vst2lane.p0i8.v4f16(i8* [[TMP3]], <4 x half> [[TMP8]], <4 x half> [[TMP9]], i32 3, i32 2)
	// CHECK: ret void			// CHECK: ret void
	void test_vst2_lane_f16(float16_t * a, float16x4x2_t b) {			void test_vst2_lane_f16(float16_t * a, float16x4x2_t b) {
	vst2_lane_f16(a, b, 3);			vst2_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: @test_vst2_lane_f32(			// CHECK-LABEL: @test_vst2_lane_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x2_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x2_t, align 8
	[256 lines not shown]
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL1]], i32 0, i32 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL1]], i32 0, i32 1
	// CHECK: [[TMP6:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP6:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP7:%.*]] = bitcast <8 x half> [[TMP6]] to <16 x i8>			// CHECK: [[TMP7:%.*]] = bitcast <8 x half> [[TMP6]] to <16 x i8>
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL3]], i32 0, i32 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL3]], i32 0, i32 2
	// CHECK: [[TMP8:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16			// CHECK: [[TMP8:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16
	// CHECK: [[TMP9:%.*]] = bitcast <8 x half> [[TMP8]] to <16 x i8>			// CHECK: [[TMP9:%.*]] = bitcast <8 x half> [[TMP8]] to <16 x i8>
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x i16>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x i16>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x half>
	// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP9]] to <8 x i16>			// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP9]] to <8 x half>
	// CHECK: call void @llvm.arm.neon.vst3.p0i8.v8i16(i8* [[TMP3]], <8 x i16> [[TMP10]], <8 x i16> [[TMP11]], <8 x i16> [[TMP12]], i32 2)			// CHECK: call void @llvm.arm.neon.vst3.p0i8.v8f16(i8* [[TMP3]], <8 x half> [[TMP10]], <8 x half> [[TMP11]], <8 x half> [[TMP12]], i32 2)
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_f16(float16_t * a, float16x8x3_t b) {			void test_vst3q_f16(float16_t * a, float16x8x3_t b) {
	vst3q_f16(a, b);			vst3q_f16(a, b);
	}			}

	// CHECK-LABEL: @test_vst3q_f32(			// CHECK-LABEL: @test_vst3q_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16
	[331 lines not shown]
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL1]], i32 0, i32 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL1]], i32 0, i32 1
	// CHECK: [[TMP6:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP6:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP7:%.*]] = bitcast <4 x half> [[TMP6]] to <8 x i8>			// CHECK: [[TMP7:%.*]] = bitcast <4 x half> [[TMP6]] to <8 x i8>
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL3]], i32 0, i32 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL3]], i32 0, i32 2
	// CHECK: [[TMP8:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8			// CHECK: [[TMP8:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8
	// CHECK: [[TMP9:%.*]] = bitcast <4 x half> [[TMP8]] to <8 x i8>			// CHECK: [[TMP9:%.*]] = bitcast <4 x half> [[TMP8]] to <8 x i8>
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x i16>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x half>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x i16>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x half>
	// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP9]] to <4 x i16>			// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP9]] to <4 x half>
	// CHECK: call void @llvm.arm.neon.vst3.p0i8.v4i16(i8* [[TMP3]], <4 x i16> [[TMP10]], <4 x i16> [[TMP11]], <4 x i16> [[TMP12]], i32 2)			// CHECK: call void @llvm.arm.neon.vst3.p0i8.v4f16(i8* [[TMP3]], <4 x half> [[TMP10]], <4 x half> [[TMP11]], <4 x half> [[TMP12]], i32 2)
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_f16(float16_t * a, float16x4x3_t b) {			void test_vst3_f16(float16_t * a, float16x4x3_t b) {
	vst3_f16(a, b);			vst3_f16(a, b);
	}			}

	// CHECK-LABEL: @test_vst3_f32(			// CHECK-LABEL: @test_vst3_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8
	[221 lines not shown]
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL1]], i32 0, i32 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL1]], i32 0, i32 1
	// CHECK: [[TMP6:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16			// CHECK: [[TMP6:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX2]], align 16
	// CHECK: [[TMP7:%.*]] = bitcast <8 x half> [[TMP6]] to <16 x i8>			// CHECK: [[TMP7:%.*]] = bitcast <8 x half> [[TMP6]] to <16 x i8>
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x3_t, %struct.float16x8x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL3]], i32 0, i32 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <8 x half>], [3 x <8 x half>]* [[VAL3]], i32 0, i32 2
	// CHECK: [[TMP8:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16			// CHECK: [[TMP8:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16
	// CHECK: [[TMP9:%.*]] = bitcast <8 x half> [[TMP8]] to <16 x i8>			// CHECK: [[TMP9:%.*]] = bitcast <8 x half> [[TMP8]] to <16 x i8>
	// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x i16>			// CHECK: [[TMP10:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
	// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x i16>			// CHECK: [[TMP11:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x half>
	// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP9]] to <8 x i16>			// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP9]] to <8 x half>
	// CHECK: call void @llvm.arm.neon.vst3lane.p0i8.v8i16(i8* [[TMP3]], <8 x i16> [[TMP10]], <8 x i16> [[TMP11]], <8 x i16> [[TMP12]], i32 7, i32 2)			// CHECK: call void @llvm.arm.neon.vst3lane.p0i8.v8f16(i8* [[TMP3]], <8 x half> [[TMP10]], <8 x half> [[TMP11]], <8 x half> [[TMP12]], i32 7, i32 2)
	// CHECK: ret void			// CHECK: ret void
	void test_vst3q_lane_f16(float16_t * a, float16x8x3_t b) {			void test_vst3q_lane_f16(float16_t * a, float16x8x3_t b) {
	vst3q_lane_f16(a, b, 7);			vst3q_lane_f16(a, b, 7);
	}			}

	// CHECK-LABEL: @test_vst3q_lane_f32(			// CHECK-LABEL: @test_vst3q_lane_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x3_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x3_t, align 16
	[245 lines not shown]
	// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL1:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL1]], i32 0, i32 1			// CHECK: [[ARRAYIDX2:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL1]], i32 0, i32 1
	// CHECK: [[TMP6:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8			// CHECK: [[TMP6:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX2]], align 8
	// CHECK: [[TMP7:%.*]] = bitcast <4 x half> [[TMP6]] to <8 x i8>			// CHECK: [[TMP7:%.*]] = bitcast <4 x half> [[TMP6]] to <8 x i8>
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x3_t, %struct.float16x4x3_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL3]], i32 0, i32 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [3 x <4 x half>], [3 x <4 x half>]* [[VAL3]], i32 0, i32 2
	// CHECK: [[TMP8:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8			// CHECK: [[TMP8:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8
	// CHECK: [[TMP9:%.*]] = bitcast <4 x half> [[TMP8]] to <8 x i8>			// CHECK: [[TMP9:%.*]] = bitcast <4 x half> [[TMP8]] to <8 x i8>
	// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x i16>			// CHECK: [[TMP10:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x half>
	// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x i16>			// CHECK: [[TMP11:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x half>
	// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP9]] to <4 x i16>			// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP9]] to <4 x half>
	// CHECK: call void @llvm.arm.neon.vst3lane.p0i8.v4i16(i8* [[TMP3]], <4 x i16> [[TMP10]], <4 x i16> [[TMP11]], <4 x i16> [[TMP12]], i32 3, i32 2)			// CHECK: call void @llvm.arm.neon.vst3lane.p0i8.v4f16(i8* [[TMP3]], <4 x half> [[TMP10]], <4 x half> [[TMP11]], <4 x half> [[TMP12]], i32 3, i32 2)
	// CHECK: ret void			// CHECK: ret void
	void test_vst3_lane_f16(float16_t * a, float16x4x3_t b) {			void test_vst3_lane_f16(float16_t * a, float16x4x3_t b) {
	vst3_lane_f16(a, b, 3);			vst3_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: @test_vst3_lane_f32(			// CHECK-LABEL: @test_vst3_lane_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x3_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x3_t, align 8
	[299 lines not shown]
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL3]], i32 0, i32 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL3]], i32 0, i32 2
	// CHECK: [[TMP8:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16			// CHECK: [[TMP8:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16
	// CHECK: [[TMP9:%.*]] = bitcast <8 x half> [[TMP8]] to <16 x i8>			// CHECK: [[TMP9:%.*]] = bitcast <8 x half> [[TMP8]] to <16 x i8>
	// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL5]], i32 0, i32 3			// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL5]], i32 0, i32 3
	// CHECK: [[TMP10:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX6]], align 16			// CHECK: [[TMP10:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX6]], align 16
	// CHECK: [[TMP11:%.*]] = bitcast <8 x half> [[TMP10]] to <16 x i8>			// CHECK: [[TMP11:%.*]] = bitcast <8 x half> [[TMP10]] to <16 x i8>
	// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x i16>			// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x i16>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x half>
	// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP9]] to <8 x i16>			// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP9]] to <8 x half>
	// CHECK: [[TMP15:%.*]] = bitcast <16 x i8> [[TMP11]] to <8 x i16>			// CHECK: [[TMP15:%.*]] = bitcast <16 x i8> [[TMP11]] to <8 x half>
	// CHECK: call void @llvm.arm.neon.vst4.p0i8.v8i16(i8* [[TMP3]], <8 x i16> [[TMP12]], <8 x i16> [[TMP13]], <8 x i16> [[TMP14]], <8 x i16> [[TMP15]], i32 2)			// CHECK: call void @llvm.arm.neon.vst4.p0i8.v8f16(i8* [[TMP3]], <8 x half> [[TMP12]], <8 x half> [[TMP13]], <8 x half> [[TMP14]], <8 x half> [[TMP15]], i32 2)
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_f16(float16_t * a, float16x8x4_t b) {			void test_vst4q_f16(float16_t * a, float16x8x4_t b) {
	vst4q_f16(a, b);			vst4q_f16(a, b);
	}			}

	// CHECK-LABEL: @test_vst4q_f32(			// CHECK-LABEL: @test_vst4q_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16
	[384 lines not shown]
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL3]], i32 0, i32 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL3]], i32 0, i32 2
	// CHECK: [[TMP8:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8			// CHECK: [[TMP8:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8
	// CHECK: [[TMP9:%.*]] = bitcast <4 x half> [[TMP8]] to <8 x i8>			// CHECK: [[TMP9:%.*]] = bitcast <4 x half> [[TMP8]] to <8 x i8>
	// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL5]], i32 0, i32 3			// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL5]], i32 0, i32 3
	// CHECK: [[TMP10:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX6]], align 8			// CHECK: [[TMP10:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX6]], align 8
	// CHECK: [[TMP11:%.*]] = bitcast <4 x half> [[TMP10]] to <8 x i8>			// CHECK: [[TMP11:%.*]] = bitcast <4 x half> [[TMP10]] to <8 x i8>
	// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x i16>			// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x half>
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x i16>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x half>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP9]] to <4 x i16>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP9]] to <4 x half>
	// CHECK: [[TMP15:%.*]] = bitcast <8 x i8> [[TMP11]] to <4 x i16>			// CHECK: [[TMP15:%.*]] = bitcast <8 x i8> [[TMP11]] to <4 x half>
	// CHECK: call void @llvm.arm.neon.vst4.p0i8.v4i16(i8* [[TMP3]], <4 x i16> [[TMP12]], <4 x i16> [[TMP13]], <4 x i16> [[TMP14]], <4 x i16> [[TMP15]], i32 2)			// CHECK: call void @llvm.arm.neon.vst4.p0i8.v4f16(i8* [[TMP3]], <4 x half> [[TMP12]], <4 x half> [[TMP13]], <4 x half> [[TMP14]], <4 x half> [[TMP15]], i32 2)
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_f16(float16_t * a, float16x4x4_t b) {			void test_vst4_f16(float16_t * a, float16x4x4_t b) {
	vst4_f16(a, b);			vst4_f16(a, b);
	}			}

	// CHECK-LABEL: @test_vst4_f32(			// CHECK-LABEL: @test_vst4_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8
	[258 lines not shown]
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL3]], i32 0, i32 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL3]], i32 0, i32 2
	// CHECK: [[TMP8:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16			// CHECK: [[TMP8:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX4]], align 16
	// CHECK: [[TMP9:%.*]] = bitcast <8 x half> [[TMP8]] to <16 x i8>			// CHECK: [[TMP9:%.*]] = bitcast <8 x half> [[TMP8]] to <16 x i8>
	// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x8x4_t, %struct.float16x8x4_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL5]], i32 0, i32 3			// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <8 x half>], [4 x <8 x half>]* [[VAL5]], i32 0, i32 3
	// CHECK: [[TMP10:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX6]], align 16			// CHECK: [[TMP10:%.*]] = load <8 x half>, <8 x half>* [[ARRAYIDX6]], align 16
	// CHECK: [[TMP11:%.*]] = bitcast <8 x half> [[TMP10]] to <16 x i8>			// CHECK: [[TMP11:%.*]] = bitcast <8 x half> [[TMP10]] to <16 x i8>
	// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x i16>			// CHECK: [[TMP12:%.*]] = bitcast <16 x i8> [[TMP5]] to <8 x half>
	// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x i16>			// CHECK: [[TMP13:%.*]] = bitcast <16 x i8> [[TMP7]] to <8 x half>
	// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP9]] to <8 x i16>			// CHECK: [[TMP14:%.*]] = bitcast <16 x i8> [[TMP9]] to <8 x half>
	// CHECK: [[TMP15:%.*]] = bitcast <16 x i8> [[TMP11]] to <8 x i16>			// CHECK: [[TMP15:%.*]] = bitcast <16 x i8> [[TMP11]] to <8 x half>
	// CHECK: call void @llvm.arm.neon.vst4lane.p0i8.v8i16(i8* [[TMP3]], <8 x i16> [[TMP12]], <8 x i16> [[TMP13]], <8 x i16> [[TMP14]], <8 x i16> [[TMP15]], i32 7, i32 2)			// CHECK: call void @llvm.arm.neon.vst4lane.p0i8.v8f16(i8* [[TMP3]], <8 x half> [[TMP12]], <8 x half> [[TMP13]], <8 x half> [[TMP14]], <8 x half> [[TMP15]], i32 7, i32 2)
	// CHECK: ret void			// CHECK: ret void
	void test_vst4q_lane_f16(float16_t * a, float16x8x4_t b) {			void test_vst4q_lane_f16(float16_t * a, float16x8x4_t b) {
	vst4q_lane_f16(a, b, 7);			vst4q_lane_f16(a, b, 7);
	}			}

	// CHECK-LABEL: @test_vst4q_lane_f32(			// CHECK-LABEL: @test_vst4q_lane_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16			// CHECK: [[B:%.*]] = alloca %struct.float32x4x4_t, align 16
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16			// CHECK: [[__S1:%.*]] = alloca %struct.float32x4x4_t, align 16
	[285 lines not shown]
	// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL3:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL3]], i32 0, i32 2			// CHECK: [[ARRAYIDX4:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL3]], i32 0, i32 2
	// CHECK: [[TMP8:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8			// CHECK: [[TMP8:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX4]], align 8
	// CHECK: [[TMP9:%.*]] = bitcast <4 x half> [[TMP8]] to <8 x i8>			// CHECK: [[TMP9:%.*]] = bitcast <4 x half> [[TMP8]] to <8 x i8>
	// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0			// CHECK: [[VAL5:%.*]] = getelementptr inbounds %struct.float16x4x4_t, %struct.float16x4x4_t* [[__S1]], i32 0, i32 0
	// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL5]], i32 0, i32 3			// CHECK: [[ARRAYIDX6:%.*]] = getelementptr inbounds [4 x <4 x half>], [4 x <4 x half>]* [[VAL5]], i32 0, i32 3
	// CHECK: [[TMP10:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX6]], align 8			// CHECK: [[TMP10:%.*]] = load <4 x half>, <4 x half>* [[ARRAYIDX6]], align 8
	// CHECK: [[TMP11:%.*]] = bitcast <4 x half> [[TMP10]] to <8 x i8>			// CHECK: [[TMP11:%.*]] = bitcast <4 x half> [[TMP10]] to <8 x i8>
	// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x i16>			// CHECK: [[TMP12:%.*]] = bitcast <8 x i8> [[TMP5]] to <4 x half>
	// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x i16>			// CHECK: [[TMP13:%.*]] = bitcast <8 x i8> [[TMP7]] to <4 x half>
	// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP9]] to <4 x i16>			// CHECK: [[TMP14:%.*]] = bitcast <8 x i8> [[TMP9]] to <4 x half>
	// CHECK: [[TMP15:%.*]] = bitcast <8 x i8> [[TMP11]] to <4 x i16>			// CHECK: [[TMP15:%.*]] = bitcast <8 x i8> [[TMP11]] to <4 x half>
	// CHECK: call void @llvm.arm.neon.vst4lane.p0i8.v4i16(i8* [[TMP3]], <4 x i16> [[TMP12]], <4 x i16> [[TMP13]], <4 x i16> [[TMP14]], <4 x i16> [[TMP15]], i32 3, i32 2)			// CHECK: call void @llvm.arm.neon.vst4lane.p0i8.v4f16(i8* [[TMP3]], <4 x half> [[TMP12]], <4 x half> [[TMP13]], <4 x half> [[TMP14]], <4 x half> [[TMP15]], i32 3, i32 2)
	// CHECK: ret void			// CHECK: ret void
	void test_vst4_lane_f16(float16_t * a, float16x4x4_t b) {			void test_vst4_lane_f16(float16_t * a, float16x4x4_t b) {
	vst4_lane_f16(a, b, 3);			vst4_lane_f16(a, b, 3);
	}			}

	// CHECK-LABEL: @test_vst4_lane_f32(			// CHECK-LABEL: @test_vst4_lane_f32(
	// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[B:%.*]] = alloca %struct.float32x2x4_t, align 8
	// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8			// CHECK: [[__S1:%.*]] = alloca %struct.float32x2x4_t, align 8
	[2,083 lines not shown]

cfe/trunk/utils/TableGen/NeonEmitter.cpp

[854 lines not shown]	void Type::applyModifier(char Mod) {
case 'f':		case 'f':
Float = true;		Float = true;
ElementBitwidth = 32;		ElementBitwidth = 32;
break;		break;
case 'F':		case 'F':
Float = true;		Float = true;
ElementBitwidth = 64;		ElementBitwidth = 64;
break;		break;
		case 'H':
		Float = true;
		ElementBitwidth = 16;
		break;
case 'g':		case 'g':
if (AppliedQuad)		if (AppliedQuad)
Bitwidth /= 2;		Bitwidth /= 2;
break;		break;
case 'j':		case 'j':
if (!AppliedQuad)		if (!AppliedQuad)
Bitwidth *= 2;		Bitwidth *= 2;
break;		break;
[130 lines not shown]	if (typeCode != '\0')
S.push_back(typeCode);		S.push_back(typeCode);
if (printNumber)		if (printNumber)
S += utostr(T.getElementSizeInBits());		S += utostr(T.getElementSizeInBits());

return S;		return S;
}		}

static bool isFloatingPointProtoModifier(char Mod) {		static bool isFloatingPointProtoModifier(char Mod) {
return Mod == 'F' || Mod == 'f';		return Mod == 'F' || Mod == 'f' || Mod == 'H';
}		}

std::string Intrinsic::getBuiltinTypeStr() {		std::string Intrinsic::getBuiltinTypeStr() {
ClassKind LocalCK = getClassKind(true);		ClassKind LocalCK = getClassKind(true);
std::string S;		std::string S;

Type RetT = getReturnType();		Type RetT = getReturnType();
if ((LocalCK == ClassI || LocalCK == ClassW) && RetT.isScalar() &&		if ((LocalCK == ClassI || LocalCK == ClassW) && RetT.isScalar() &&
[1,416 lines not shown]
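
For readers skimming the NeonEmitter.cpp hunks, the following is a rough, standalone sketch of what the new 'H' prototype modifier appears to do: mark the type as floating point with 16-bit elements, and be treated as a floating-point modifier alongside 'f' and 'F'. MiniType and halfModifierYieldsFp16 are hypothetical names made up for this illustration; the real Type class in NeonEmitter.cpp carries much more state and handles many more modifiers, none of which is modeled here.

```cpp
// Hypothetical, heavily simplified stand-in for the emitter's Type class.
// Only the two fields touched by the applyModifier hunk are modeled.
struct MiniType {
  bool Float = false;
  unsigned ElementBitwidth = 0;

  // Mirrors only the 'f', 'F' and 'H' cases shown in the diff.
  void applyModifier(char Mod) {
    switch (Mod) {
    case 'f':
      Float = true;
      ElementBitwidth = 32;
      break;
    case 'F':
      Float = true;
      ElementBitwidth = 64;
      break;
    case 'H': // the case added by this patch
      Float = true;
      ElementBitwidth = 16;
      break;
    default:
      break; // all other modifiers are out of scope for this sketch
    }
  }
};

// Same predicate as the right-hand (new) column of the diff.
static bool isFloatingPointProtoModifier(char Mod) {
  return Mod == 'F' || Mod == 'f' || Mod == 'H';
}

// Usage: reports whether applying 'H' yields a 16-bit floating-point element.
bool halfModifierYieldsFp16() {
  MiniType T;
  T.applyModifier('H');
  return T.Float && T.ElementBitwidth == 16;
}
```

Under these assumptions, a type that starts out as an integer element becomes a half-precision float element once 'H' is applied, which is consistent with the test expectations changing from <4 x i16> to <4 x half>.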