This is an archive of the discontinued LLVM Phabricator instance.

Differential D1752

Implement aarch64 neon instruction class AdvSIMD (by element) - Clang
ClosedPublic

Authored by Jiangning on Sep 25 2013, 2:01 AM.

Download Raw Diff

Details

Reviewers

t.p.northover

Diff Detail

Event Timeline

Jiangning updated this revision to Unknown Object (????).Sep 27 2013, 2:50 AM

Hi Jiangning,

Mostly looks reasonable. I've just got one question.

utils/TableGen/NeonEmitter.cpp
1571–1572	Why are these distinct? As far as I can see they are treated in exactly the same way by TableGen. Or am I missing something not shown by a simple grep; some endsWith('Q') weirdness or something?

Jiangning updated this revision to Unknown Object (????).Sep 29 2013, 8:25 PM

Hi Tim,

I removed all newly added xxx_LNQ enums.

Thanks,
-Jiangning

utils/TableGen/NeonEmitter.cpp
1571–1572	Yeah! Kevin also mentioned this problem to me. I uploaded a new version and removed all newly added xxx_LNQ enums.

t.p.northover added inline comments.Sep 30 2013, 2:31 AM

utils/TableGen/NeonEmitter.cpp
1553–1554	Sorry, I should have spotted this earlier, but I think that an fma is semantically distinct from the C expression "a+bc". The latter we can* transform into an fma in -ffast-math mode, but not generally. So we probably need an intrinsic to make sure that if the user asks for a fused multiply, that's the operation they get. Actually, there'a already a generic llvm intrinsic to handle this: "@llvm.fma.*"; we should probably use that rather than an ARM/AArch64-specific one.

Jiangning added inline comments.Sep 30 2013, 4:21 AM

utils/TableGen/NeonEmitter.cpp
1553–1554	Tim, I see your point, so do you mean we must generate fmla instruction for intrinsic function vfma_lane_f32(), no matter if it is in -ffast-math mode or not? Then I think we have to generate fmls for intrinsic function vfms_lane_f32() as well. I don't see LLVM IR has @llvm.fms.* defined, so we have to define an aarch64 specific LLVM intrinsic, or we can use an expression containing llvm.fma.* to represent it? Since it's a fused calculation, I doubt we should do the latter. I want to confirm with you before modifying the code. Thanks, -Jiangning

Hi Jiangning,

I see your point, so do you mean we must generate fmla instruction for intrinsic function vfma_lane_f32(), no matter if it is in -ffast-math mode or not? Then I think we have to generate fmls for intrinsic function vfms_lane_f32() as well.

I believe so.

I don't see LLVM IR has @llvm.fms.* defined, so we have to define an aarch64 specific LLVM intrinsic, or we can use an expression containing llvm.fma.* to represent it?

I think I worked out that it was equivalent to @lllvm.fma(-x, y, z)
(and @llvm.fma(x, -y, z)). The negation is exact, and the fusing works
out to be the same for "z + (-x)*y" as for "z - x*y".

By the way, be wary of the operand order. @llvm.fma(x,y,z) calculates
"x*y+z", but "fmla x, y, z" calculates x + y*z. I *think* both me and
Ana got that wrong at least once. I know I did.

Cheers.

Tim.

Jiangning updated this revision to Unknown Object (????).Oct 1 2013, 6:42 AM

Tim,

I modified acle intrinsics for fma/fms by using @llvm.fma.*.

Thanks,
-Jiangning

test/CodeGen/aarch64-neon-2velem.c
5	Add the test without -fp-contract=fast.

t.p.northover accepted this revision.Apr 3 2014, 4:48 AM

Committed in rL191945.

Revision Contents

Path

Size

include/

clang/

Basic/

arm_neon.td

69 lines

lib/

CodeGen/

CGBuiltin.cpp

40 lines

test/

CodeGen/

aarch64-neon-2velem.c

802 lines

utils/

TableGen/

NeonEmitter.cpp

110 lines

Diff 4577

include/clang/Basic/arm_neon.td

Context not available.
	def OP_MLAL_N : Op;	def OP_MLAL_N : Op;
	def OP_MLSL_N : Op;	def OP_MLSL_N : Op;
	def OP_MUL_LN: Op;	def OP_MUL_LN: Op;
		def OP_MULX_LN: Op;
	def OP_MULL_LN : Op;	def OP_MULL_LN : Op;
		def OP_MULLHi_LN : Op;
	def OP_MLA_LN: Op;	def OP_MLA_LN: Op;
	def OP_MLS_LN: Op;	def OP_MLS_LN: Op;
	def OP_MLAL_LN : Op;	def OP_MLAL_LN : Op;
		def OP_MLALHi_LN : Op;
	def OP_MLSL_LN : Op;	def OP_MLSL_LN : Op;
		def OP_MLSLHi_LN : Op;
	def OP_QDMULL_LN : Op;	def OP_QDMULL_LN : Op;
		def OP_QDMULLHi_LN : Op;
	def OP_QDMLAL_LN : Op;	def OP_QDMLAL_LN : Op;
		def OP_QDMLALHi_LN : Op;
	def OP_QDMLSL_LN : Op;	def OP_QDMLSL_LN : Op;
		def OP_QDMLSLHi_LN : Op;
	def OP_QDMULH_LN : Op;	def OP_QDMULH_LN : Op;
	def OP_QRDMULH_LN : Op;	def OP_QRDMULH_LN : Op;
		def OP_FMS_LN : Op;
		def OP_FMS_LNQ : Op;
	def OP_EQ : Op;	def OP_EQ : Op;
	def OP_GE : Op;	def OP_GE : Op;
	def OP_LE : Op;	def OP_LE : Op;
Context not available.
	// f: float (int args)	// f: float (int args)
	// d: default	// d: default
	// g: default, ignore 'Q' size modifier.	// g: default, ignore 'Q' size modifier.
		// j: default, force 'Q' size modifier.
	// w: double width elements, same num elts	// w: double width elements, same num elts
	// n: double width elements, half num elts	// n: double width elements, half num elts
	// h: half width elements, double num elts	// h: half width elements, double num elts
Context not available.

	////////////////////////////////////////////////////////////////////////////////	////////////////////////////////////////////////////////////////////////////////
	// Multiplication Extended	// Multiplication Extended
	def MULX : SInst<"vmulx", "ddd", "fQfQd">;	def MULX : SInst<"vmulx", "ddd", "fdQfQd">;

	////////////////////////////////////////////////////////////////////////////////	////////////////////////////////////////////////////////////////////////////////
	// Division	// Division
Context not available.
	def VQDMLSL_HIGH : SOpInst<"vqdmlsl_high", "wwkk", "si", OP_QDMLSLHi>;	def VQDMLSL_HIGH : SOpInst<"vqdmlsl_high", "wwkk", "si", OP_QDMLSLHi>;

	////////////////////////////////////////////////////////////////////////////////	////////////////////////////////////////////////////////////////////////////////

		def VMLA_LANEQ : IOpInst<"vmla_laneq", "dddji",
		"siUsUifQsQiQUsQUiQf", OP_MLA_LN>;
		def VMLS_LANEQ : IOpInst<"vmls_laneq", "dddji",
		"siUsUifQsQiQUsQUiQf", OP_MLS_LN>;

		def VFMA_LANE : IInst<"vfma_lane", "dddgi", "fdQfQd">;
		def VFMA_LANEQ : IInst<"vfma_laneq", "dddji", "fdQfQd">;
		def VFMS_LANE : IOpInst<"vfms_lane", "dddgi", "fdQfQd", OP_FMS_LN>;
		def VFMS_LANEQ : IOpInst<"vfms_laneq", "dddji", "fdQfQd", OP_FMS_LNQ>;

		def VMLAL_LANEQ : SOpInst<"vmlal_laneq", "wwdki", "siUsUi", OP_MLAL_LN>;
		def VMLAL_HIGH_LANE : SOpInst<"vmlal_high_lane", "wwkdi", "siUsUi",
		OP_MLALHi_LN>;
		def VMLAL_HIGH_LANEQ : SOpInst<"vmlal_high_laneq", "wwkki", "siUsUi",
		OP_MLALHi_LN>;
		def VMLSL_LANEQ : SOpInst<"vmlsl_laneq", "wwdki", "siUsUi", OP_MLSL_LN>;
		def VMLSL_HIGH_LANE : SOpInst<"vmlsl_high_lane", "wwkdi", "siUsUi",
		OP_MLSLHi_LN>;
		def VMLSL_HIGH_LANEQ : SOpInst<"vmlsl_high_laneq", "wwkki", "siUsUi",
		OP_MLSLHi_LN>;

		def VQDMLAL_LANEQ : SOpInst<"vqdmlal_laneq", "wwdki", "si", OP_QDMLAL_LN>;
		def VQDMLAL_HIGH_LANE : SOpInst<"vqdmlal_high_lane", "wwkdi", "si",
		OP_QDMLALHi_LN>;
		def VQDMLAL_HIGH_LANEQ : SOpInst<"vqdmlal_high_laneq", "wwkki", "si",
		OP_QDMLALHi_LN>;
		def VQDMLSL_LANEQ : SOpInst<"vqdmlsl_laneq", "wwdki", "si", OP_QDMLSL_LN>;
		def VQDMLSL_HIGH_LANE : SOpInst<"vqdmlsl_high_lane", "wwkdi", "si",
		OP_QDMLSLHi_LN>;
		def VQDMLSL_HIGH_LANEQ : SOpInst<"vqdmlsl_high_laneq", "wwkki", "si",
		OP_QDMLSLHi_LN>;

		// Newly add double parameter for vmul_lane in aarch64
		def VMUL_LANE_A64 : IOpInst<"vmul_lane", "ddgi", "dQd", OP_MUL_LN>;

		def VMUL_LANEQ : IOpInst<"vmul_laneq", "ddji",
		"sifdUsUiQsQiQfQUsQUiQfQd", OP_MUL_LN>;
		def VMULL_LANEQ : SOpInst<"vmull_laneq", "wdki", "siUsUi", OP_MULL_LN>;
		def VMULL_HIGH_LANE : SOpInst<"vmull_high_lane", "wkdi", "siUsUi",
		OP_MULLHi_LN>;
		def VMULL_HIGH_LANEQ : SOpInst<"vmull_high_laneq", "wkki", "siUsUi",
		OP_MULLHi_LN>;

		def VQDMULL_LANEQ : SOpInst<"vqdmull_laneq", "wdki", "si", OP_QDMULL_LN>;
		def VQDMULL_HIGH_LANE : SOpInst<"vqdmull_high_lane", "wkdi", "si",
		OP_QDMULLHi_LN>;
		def VQDMULL_HIGH_LANEQ : SOpInst<"vqdmull_high_laneq", "wkki", "si",
		OP_QDMULLHi_LN>;

		def VQDMULH_LANEQ : SOpInst<"vqdmulh_laneq", "ddji", "siQsQi", OP_QDMULH_LN>;
		def VQRDMULH_LANEQ : SOpInst<"vqrdmulh_laneq", "ddji", "siQsQi", OP_QRDMULH_LN>;

		def VMULX_LANE : IOpInst<"vmulx_lane", "ddgi", "fdQfQd", OP_MULX_LN>;
		def VMULX_LANEQ : IOpInst<"vmulx_laneq", "ddji", "fdQfQd", OP_MULX_LN>;

		////////////////////////////////////////////////////////////////////////////////
	// Scalar Arithmetic	// Scalar Arithmetic

	// Scalar Addition	// Scalar Addition
Context not available.

lib/CodeGen/CGBuiltin.cpp

Context not available.
	}	}

	// AArch64-only builtins	// AArch64-only builtins
		case AArch64::BI__builtin_neon_vfma_lane_v:
		case AArch64::BI__builtin_neon_vfmaq_laneq_v: {
		Value *F = CGM.getIntrinsic(Intrinsic::fma, Ty);
		Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
		Ops[1] = Builder.CreateBitCast(Ops[1], Ty);

		Ops[2] = Builder.CreateBitCast(Ops[2], Ty);
		Ops[2] = EmitNeonSplat(Ops[2], cast<ConstantInt>(Ops[3]));
		return Builder.CreateCall3(F, Ops[2], Ops[1], Ops[0]);
		}
		case AArch64::BI__builtin_neon_vfmaq_lane_v: {
		Value *F = CGM.getIntrinsic(Intrinsic::fma, Ty);
		Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
		Ops[1] = Builder.CreateBitCast(Ops[1], Ty);

		llvm::VectorType *VTy = cast<llvm::VectorType>(Ty);
		llvm::Type *STy = llvm::VectorType::get(VTy->getElementType(),
		VTy->getNumElements() / 2);
		Ops[2] = Builder.CreateBitCast(Ops[2], STy);
		Value* SV = llvm::ConstantVector::getSplat(VTy->getNumElements(),
		cast<ConstantInt>(Ops[3]));
		Ops[2] = Builder.CreateShuffleVector(Ops[2], Ops[2], SV, "lane");

		return Builder.CreateCall3(F, Ops[2], Ops[1], Ops[0]);
		}
		case AArch64::BI__builtin_neon_vfma_laneq_v: {
		Value *F = CGM.getIntrinsic(Intrinsic::fma, Ty);
		Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
		Ops[1] = Builder.CreateBitCast(Ops[1], Ty);

		llvm::VectorType *VTy = cast<llvm::VectorType>(Ty);
		llvm::Type *STy = llvm::VectorType::get(VTy->getElementType(),
		VTy->getNumElements() * 2);
		Ops[2] = Builder.CreateBitCast(Ops[2], STy);
		Value* SV = llvm::ConstantVector::getSplat(VTy->getNumElements(),
		cast<ConstantInt>(Ops[3]));
		Ops[2] = Builder.CreateShuffleVector(Ops[2], Ops[2], SV, "lane");

		return Builder.CreateCall3(F, Ops[2], Ops[1], Ops[0]);
		}
	case AArch64::BI__builtin_neon_vfms_v:	case AArch64::BI__builtin_neon_vfms_v:
	case AArch64::BI__builtin_neon_vfmsq_v: {	case AArch64::BI__builtin_neon_vfmsq_v: {
	Value *F = CGM.getIntrinsic(Intrinsic::fma, Ty);	Value *F = CGM.getIntrinsic(Intrinsic::fma, Ty);
Context not available.

test/CodeGen/aarch64-neon-2velem.c

This file was added.

				// REQUIRES: aarch64-registered-target
				// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +neon \
				// RUN: -ffp-contract=fast -S -O3 -o - %s \| FileCheck %s
				// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +neon \
				// RUN: -S -O3 -o - %s \| FileCheck %s
				JiangningAuthorUnsubmitted Not Done Reply Inline Actions Add the test without -fp-contract=fast. Jiangning: Add the test without -fp-contract=fast.

				// Test new aarch64 intrinsics and types

				#include <arm_neon.h>

				int16x4_t test_vmla_lane_s16(int16x4_t a, int16x4_t b, int16x4_t v) {
				// CHECK: test_vmla_lane_s16
				return vmla_lane_s16(a, b, v, 1);
				// CHECK: mla {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int16x8_t test_vmlaq_lane_s16(int16x8_t a, int16x8_t b, int16x4_t v) {
				// CHECK: test_vmlaq_lane_s16
				return vmlaq_lane_s16(a, b, v, 1);
				// CHECK: mla {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int32x2_t test_vmla_lane_s32(int32x2_t a, int32x2_t b, int32x2_t v) {
				// CHECK: test_vmla_lane_s32
				return vmla_lane_s32(a, b, v, 1);
				// CHECK: mla {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlaq_lane_s32(int32x4_t a, int32x4_t b, int32x2_t v) {
				// CHECK: test_vmlaq_lane_s32
				return vmlaq_lane_s32(a, b, v, 1);
				// CHECK: mla {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int16x4_t test_vmla_laneq_s16(int16x4_t a, int16x4_t b, int16x8_t v) {
				// CHECK: test_vmla_laneq_s16
				return vmla_laneq_s16(a, b, v, 1);
				// CHECK: mla {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int16x8_t test_vmlaq_laneq_s16(int16x8_t a, int16x8_t b, int16x8_t v) {
				// CHECK: test_vmlaq_laneq_s16
				return vmlaq_laneq_s16(a, b, v, 1);
				// CHECK: mla {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int32x2_t test_vmla_laneq_s32(int32x2_t a, int32x2_t b, int32x4_t v) {
				// CHECK: test_vmla_laneq_s32
				return vmla_laneq_s32(a, b, v, 1);
				// CHECK: mla {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlaq_laneq_s32(int32x4_t a, int32x4_t b, int32x4_t v) {
				// CHECK: test_vmlaq_laneq_s32
				return vmlaq_laneq_s32(a, b, v, 1);
				// CHECK: mla {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int16x4_t test_vmls_lane_s16(int16x4_t a, int16x4_t b, int16x4_t v) {
				// CHECK: test_vmls_lane_s16
				return vmls_lane_s16(a, b, v, 1);
				// CHECK: mls {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int16x8_t test_vmlsq_lane_s16(int16x8_t a, int16x8_t b, int16x4_t v) {
				// CHECK: test_vmlsq_lane_s16
				return vmlsq_lane_s16(a, b, v, 1);
				// CHECK: mls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int32x2_t test_vmls_lane_s32(int32x2_t a, int32x2_t b, int32x2_t v) {
				// CHECK: test_vmls_lane_s32
				return vmls_lane_s32(a, b, v, 1);
				// CHECK: mls {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlsq_lane_s32(int32x4_t a, int32x4_t b, int32x2_t v) {
				// CHECK: test_vmlsq_lane_s32
				return vmlsq_lane_s32(a, b, v, 1);
				// CHECK: mls {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int16x4_t test_vmls_laneq_s16(int16x4_t a, int16x4_t b, int16x8_t v) {
				// CHECK: test_vmls_laneq_s16
				return vmls_laneq_s16(a, b, v, 1);
				// CHECK: mls {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int16x8_t test_vmlsq_laneq_s16(int16x8_t a, int16x8_t b, int16x8_t v) {
				// CHECK: test_vmlsq_laneq_s16
				return vmlsq_laneq_s16(a, b, v, 1);
				// CHECK: mls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int32x2_t test_vmls_laneq_s32(int32x2_t a, int32x2_t b, int32x4_t v) {
				// CHECK: test_vmls_laneq_s32
				return vmls_laneq_s32(a, b, v, 1);
				// CHECK: mls {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlsq_laneq_s32(int32x4_t a, int32x4_t b, int32x4_t v) {
				// CHECK: test_vmlsq_laneq_s32
				return vmlsq_laneq_s32(a, b, v, 1);
				// CHECK: mls {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int16x4_t test_vmul_lane_s16(int16x4_t a, int16x4_t v) {
				// CHECK: test_vmul_lane_s16
				return vmul_lane_s16(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int16x8_t test_vmulq_lane_s16(int16x8_t a, int16x4_t v) {
				// CHECK: test_vmulq_lane_s16
				return vmulq_lane_s16(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int32x2_t test_vmul_lane_s32(int32x2_t a, int32x2_t v) {
				// CHECK: test_vmul_lane_s32
				return vmul_lane_s32(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmulq_lane_s32(int32x4_t a, int32x2_t v) {
				// CHECK: test_vmulq_lane_s32
				return vmulq_lane_s32(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				uint16x4_t test_vmul_lane_u16(uint16x4_t a, uint16x4_t v) {
				// CHECK: test_vmul_lane_u16
				return vmul_lane_u16(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				uint16x8_t test_vmulq_lane_u16(uint16x8_t a, uint16x4_t v) {
				// CHECK: test_vmulq_lane_u16
				return vmulq_lane_u16(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				uint32x2_t test_vmul_lane_u32(uint32x2_t a, uint32x2_t v) {
				// CHECK: test_vmul_lane_u32
				return vmul_lane_u32(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				uint32x4_t test_vmulq_lane_u32(uint32x4_t a, uint32x2_t v) {
				// CHECK: test_vmulq_lane_u32
				return vmulq_lane_u32(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int16x4_t test_vmul_laneq_s16(int16x4_t a, int16x8_t v) {
				// CHECK: test_vmul_laneq_s16
				return vmul_laneq_s16(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int16x8_t test_vmulq_laneq_s16(int16x8_t a, int16x8_t v) {
				// CHECK: test_vmulq_laneq_s16
				return vmulq_laneq_s16(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int32x2_t test_vmul_laneq_s32(int32x2_t a, int32x4_t v) {
				// CHECK: test_vmul_laneq_s32
				return vmul_laneq_s32(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmulq_laneq_s32(int32x4_t a, int32x4_t v) {
				// CHECK: test_vmulq_laneq_s32
				return vmulq_laneq_s32(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				uint16x4_t test_vmul_laneq_u16(uint16x4_t a, uint16x8_t v) {
				// CHECK: test_vmul_laneq_u16
				return vmul_laneq_u16(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				uint16x8_t test_vmulq_laneq_u16(uint16x8_t a, uint16x8_t v) {
				// CHECK: test_vmulq_laneq_u16
				return vmulq_laneq_u16(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				uint32x2_t test_vmul_laneq_u32(uint32x2_t a, uint32x4_t v) {
				// CHECK: test_vmul_laneq_u32
				return vmul_laneq_u32(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				uint32x4_t test_vmulq_laneq_u32(uint32x4_t a, uint32x4_t v) {
				// CHECK: test_vmulq_laneq_u32
				return vmulq_laneq_u32(a, v, 1);
				// CHECK: mul {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				float32x2_t test_vfma_lane_f32(float32x2_t a, float32x2_t b, float32x2_t v) {
				// CHECK: test_vfma_lane_f32
				return vfma_lane_f32(a, b, v, 1);
				// CHECK: fmla {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				float32x4_t test_vfmaq_lane_f32(float32x4_t a, float32x4_t b, float32x2_t v) {
				// CHECK: test_vfmaq_lane_f32
				return vfmaq_lane_f32(a, b, v, 1);
				// CHECK: fmla {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				float32x2_t test_vfma_laneq_f32(float32x2_t a, float32x2_t b, float32x4_t v) {
				// CHECK: test_vfma_laneq_f32
				return vfma_laneq_f32(a, b, v, 1);
				// CHECK: fmla {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				float32x4_t test_vfmaq_laneq_f32(float32x4_t a, float32x4_t b, float32x4_t v) {
				// CHECK: test_vfmaq_laneq_f32
				return vfmaq_laneq_f32(a, b, v, 1);
				// CHECK: fmla {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				float32x2_t test_vfms_lane_f32(float32x2_t a, float32x2_t b, float32x2_t v) {
				// CHECK: test_vfms_lane_f32
				return vfms_lane_f32(a, b, v, 1);
				// CHECK: fmls {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				float32x4_t test_vfmsq_lane_f32(float32x4_t a, float32x4_t b, float32x2_t v) {
				// CHECK: test_vfmsq_lane_f32
				return vfmsq_lane_f32(a, b, v, 1);
				// CHECK: fmls {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				float32x2_t test_vfms_laneq_f32(float32x2_t a, float32x2_t b, float32x4_t v) {
				// CHECK: test_vfms_laneq_f32
				return vfms_laneq_f32(a, b, v, 1);
				// CHECK: fmls {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				float32x4_t test_vfmsq_laneq_f32(float32x4_t a, float32x4_t b, float32x4_t v) {
				// CHECK: test_vfmsq_laneq_f32
				return vfmsq_laneq_f32(a, b, v, 1);
				// CHECK: fmls {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				float64x2_t test_vfmaq_lane_f64(float64x2_t a, float64x2_t b, float64x1_t v) {
				// CHECK: test_vfmaq_lane_f64
				return vfmaq_lane_f64(a, b, v, 0);
				// CHECK: fmla {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[0]
				}

				float64x2_t test_vfmaq_laneq_f64_0(float64x2_t a, float64x2_t b, float64x2_t v) {
				// CHECK: test_vfmaq_laneq_f64
				return vfmaq_laneq_f64(a, b, v, 0);
				// CHECK: fmla {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[0]
				}

				float64x2_t test_vfmaq_laneq_f64(float64x2_t a, float64x2_t b, float64x2_t v) {
				// CHECK: test_vfmaq_laneq_f64
				return vfmaq_laneq_f64(a, b, v, 1);
				// CHECK: fmla {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[1]
				}

				float64x2_t test_vfmsq_lane_f64(float64x2_t a, float64x2_t b, float64x1_t v) {
				// CHECK: test_vfmsq_lane_f64
				return vfmsq_lane_f64(a, b, v, 0);
				// CHECK: fmls {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[0]
				}

				float64x2_t test_vfmsq_laneq_f64_0(float64x2_t a, float64x2_t b, float64x2_t v) {
				// CHECK: test_vfmsq_laneq_f64
				return vfmsq_laneq_f64(a, b, v, 0);
				// CHECK: fmls {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[0]
				}

				float64x2_t test_vfmsq_laneq_f64(float64x2_t a, float64x2_t b, float64x2_t v) {
				// CHECK: test_vfmsq_laneq_f64
				return vfmsq_laneq_f64(a, b, v, 1);
				// CHECK: fmls {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[1]
				}

				int32x4_t test_vmlal_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v) {
				// CHECK: test_vmlal_lane_s16
				return vmlal_lane_s16(a, b, v, 1);
				// CHECK: smlal {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlal_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v) {
				// CHECK: test_vmlal_lane_s32
				return vmlal_lane_s32(a, b, v, 1);
				// CHECK: smlal {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlal_laneq_s16(int32x4_t a, int16x4_t b, int16x8_t v) {
				// CHECK: test_vmlal_laneq_s16
				return vmlal_laneq_s16(a, b, v, 1);
				// CHECK: smlal {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlal_laneq_s32(int64x2_t a, int32x2_t b, int32x4_t v) {
				// CHECK: test_vmlal_laneq_s32
				return vmlal_laneq_s32(a, b, v, 1);
				// CHECK: smlal {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlal_high_lane_s16(int32x4_t a, int16x8_t b, int16x4_t v) {
				// CHECK: test_vmlal_high_lane_s16
				return vmlal_high_lane_s16(a, b, v, 1);
				// CHECK: smlal2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlal_high_lane_s32(int64x2_t a, int32x4_t b, int32x2_t v) {
				// CHECK: test_vmlal_high_lane_s32
				return vmlal_high_lane_s32(a, b, v, 1);
				// CHECK: smlal2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlal_high_laneq_s16(int32x4_t a, int16x8_t b, int16x8_t v) {
				// CHECK: test_vmlal_high_laneq_s16
				return vmlal_high_laneq_s16(a, b, v, 1);
				// CHECK: smlal2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlal_high_laneq_s32(int64x2_t a, int32x4_t b, int32x4_t v) {
				// CHECK: test_vmlal_high_laneq_s32
				return vmlal_high_laneq_s32(a, b, v, 1);
				// CHECK: smlal2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlsl_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v) {
				// CHECK: test_vmlsl_lane_s16
				return vmlsl_lane_s16(a, b, v, 1);
				// CHECK: smlsl {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlsl_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v) {
				// CHECK: test_vmlsl_lane_s32
				return vmlsl_lane_s32(a, b, v, 1);
				// CHECK: smlsl {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlsl_laneq_s16(int32x4_t a, int16x4_t b, int16x8_t v) {
				// CHECK: test_vmlsl_laneq_s16
				return vmlsl_laneq_s16(a, b, v, 1);
				// CHECK: smlsl {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlsl_laneq_s32(int64x2_t a, int32x2_t b, int32x4_t v) {
				// CHECK: test_vmlsl_laneq_s32
				return vmlsl_laneq_s32(a, b, v, 1);
				// CHECK: smlsl {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlsl_high_lane_s16(int32x4_t a, int16x8_t b, int16x4_t v) {
				// CHECK: test_vmlsl_high_lane_s16
				return vmlsl_high_lane_s16(a, b, v, 1);
				// CHECK: smlsl2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlsl_high_lane_s32(int64x2_t a, int32x4_t b, int32x2_t v) {
				// CHECK: test_vmlsl_high_lane_s32
				return vmlsl_high_lane_s32(a, b, v, 1);
				// CHECK: smlsl2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlsl_high_laneq_s16(int32x4_t a, int16x8_t b, int16x8_t v) {
				// CHECK: test_vmlsl_high_laneq_s16
				return vmlsl_high_laneq_s16(a, b, v, 1);
				// CHECK: smlsl2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlsl_high_laneq_s32(int64x2_t a, int32x4_t b, int32x4_t v) {
				// CHECK: test_vmlsl_high_laneq_s32
				return vmlsl_high_laneq_s32(a, b, v, 1);
				// CHECK: smlsl2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlal_lane_u16(int32x4_t a, int16x4_t b, int16x4_t v) {
				// CHECK: test_vmlal_lane_u16
				return vmlal_lane_u16(a, b, v, 1);
				// CHECK: umlal {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlal_lane_u32(int64x2_t a, int32x2_t b, int32x2_t v) {
				// CHECK: test_vmlal_lane_u32
				return vmlal_lane_u32(a, b, v, 1);
				// CHECK: umlal {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlal_laneq_u16(int32x4_t a, int16x4_t b, int16x8_t v) {
				// CHECK: test_vmlal_laneq_u16
				return vmlal_laneq_u16(a, b, v, 1);
				// CHECK: umlal {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlal_laneq_u32(int64x2_t a, int32x2_t b, int32x4_t v) {
				// CHECK: test_vmlal_laneq_u32
				return vmlal_laneq_u32(a, b, v, 1);
				// CHECK: umlal {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlal_high_lane_u16(int32x4_t a, int16x8_t b, int16x4_t v) {
				// CHECK: test_vmlal_high_lane_u16
				return vmlal_high_lane_u16(a, b, v, 1);
				// CHECK: umlal2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlal_high_lane_u32(int64x2_t a, int32x4_t b, int32x2_t v) {
				// CHECK: test_vmlal_high_lane_u32
				return vmlal_high_lane_u32(a, b, v, 1);
				// CHECK: umlal2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlal_high_laneq_u16(int32x4_t a, int16x8_t b, int16x8_t v) {
				// CHECK: test_vmlal_high_laneq_u16
				return vmlal_high_laneq_u16(a, b, v, 1);
				// CHECK: umlal2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlal_high_laneq_u32(int64x2_t a, int32x4_t b, int32x4_t v) {
				// CHECK: test_vmlal_high_laneq_u32
				return vmlal_high_laneq_u32(a, b, v, 1);
				// CHECK: umlal2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlsl_lane_u16(int32x4_t a, int16x4_t b, int16x4_t v) {
				// CHECK: test_vmlsl_lane_u16
				return vmlsl_lane_u16(a, b, v, 1);
				// CHECK: umlsl {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlsl_lane_u32(int64x2_t a, int32x2_t b, int32x2_t v) {
				// CHECK: test_vmlsl_lane_u32
				return vmlsl_lane_u32(a, b, v, 1);
				// CHECK: umlsl {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlsl_laneq_u16(int32x4_t a, int16x4_t b, int16x8_t v) {
				// CHECK: test_vmlsl_laneq_u16
				return vmlsl_laneq_u16(a, b, v, 1);
				// CHECK: umlsl {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlsl_laneq_u32(int64x2_t a, int32x2_t b, int32x4_t v) {
				// CHECK: test_vmlsl_laneq_u32
				return vmlsl_laneq_u32(a, b, v, 1);
				// CHECK: umlsl {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlsl_high_lane_u16(int32x4_t a, int16x8_t b, int16x4_t v) {
				// CHECK: test_vmlsl_high_lane_u16
				return vmlsl_high_lane_u16(a, b, v, 1);
				// CHECK: umlsl2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlsl_high_lane_u32(int64x2_t a, int32x4_t b, int32x2_t v) {
				// CHECK: test_vmlsl_high_lane_u32
				return vmlsl_high_lane_u32(a, b, v, 1);
				// CHECK: umlsl2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmlsl_high_laneq_u16(int32x4_t a, int16x8_t b, int16x8_t v) {
				// CHECK: test_vmlsl_high_laneq_u16
				return vmlsl_high_laneq_u16(a, b, v, 1);
				// CHECK: umlsl2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmlsl_high_laneq_u32(int64x2_t a, int32x4_t b, int32x4_t v) {
				// CHECK: test_vmlsl_high_laneq_u32
				return vmlsl_high_laneq_u32(a, b, v, 1);
				// CHECK: umlsl2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmull_lane_s16(int16x4_t a, int16x4_t v) {
				// CHECK: test_vmull_lane_s16
				return vmull_lane_s16(a, v, 1);
				// CHECK: smull {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmull_lane_s32(int32x2_t a, int32x2_t v) {
				// CHECK: test_vmull_lane_s32
				return vmull_lane_s32(a, v, 1);
				// CHECK: smull {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				uint32x4_t test_vmull_lane_u16(uint16x4_t a, uint16x4_t v) {
				// CHECK: test_vmull_lane_u16
				return vmull_lane_u16(a, v, 1);
				// CHECK: umull {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				uint64x2_t test_vmull_lane_u32(uint32x2_t a, uint32x2_t v) {
				// CHECK: test_vmull_lane_u32
				return vmull_lane_u32(a, v, 1);
				// CHECK: umull {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmull_high_lane_s16(int16x8_t a, int16x4_t v) {
				// CHECK: test_vmull_high_lane_s16
				return vmull_high_lane_s16(a, v, 1);
				// CHECK: smull2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmull_high_lane_s32(int32x4_t a, int32x2_t v) {
				// CHECK: test_vmull_high_lane_s32
				return vmull_high_lane_s32(a, v, 1);
				// CHECK: smull2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				uint32x4_t test_vmull_high_lane_u16(uint16x8_t a, uint16x4_t v) {
				// CHECK: test_vmull_high_lane_u16
				return vmull_high_lane_u16(a, v, 1);
				// CHECK: umull2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				uint64x2_t test_vmull_high_lane_u32(uint32x4_t a, uint32x2_t v) {
				// CHECK: test_vmull_high_lane_u32
				return vmull_high_lane_u32(a, v, 1);
				// CHECK: umull2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmull_laneq_s16(int16x4_t a, int16x8_t v) {
				// CHECK: test_vmull_laneq_s16
				return vmull_laneq_s16(a, v, 1);
				// CHECK: smull {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmull_laneq_s32(int32x2_t a, int32x4_t v) {
				// CHECK: test_vmull_laneq_s32
				return vmull_laneq_s32(a, v, 1);
				// CHECK: smull {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				uint32x4_t test_vmull_laneq_u16(uint16x4_t a, uint16x8_t v) {
				// CHECK: test_vmull_laneq_u16
				return vmull_laneq_u16(a, v, 1);
				// CHECK: umull {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				uint64x2_t test_vmull_laneq_u32(uint32x2_t a, uint32x4_t v) {
				// CHECK: test_vmull_laneq_u32
				return vmull_laneq_u32(a, v, 1);
				// CHECK: umull {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vmull_high_laneq_s16(int16x8_t a, int16x8_t v) {
				// CHECK: test_vmull_high_laneq_s16
				return vmull_high_laneq_s16(a, v, 1);
				// CHECK: smull2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vmull_high_laneq_s32(int32x4_t a, int32x4_t v) {
				// CHECK: test_vmull_high_laneq_s32
				return vmull_high_laneq_s32(a, v, 1);
				// CHECK: smull2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				uint32x4_t test_vmull_high_laneq_u16(uint16x8_t a, uint16x8_t v) {
				// CHECK: test_vmull_high_laneq_u16
				return vmull_high_laneq_u16(a, v, 1);
				// CHECK: umull2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				uint64x2_t test_vmull_high_laneq_u32(uint32x4_t a, uint32x4_t v) {
				// CHECK: test_vmull_high_laneq_u32
				return vmull_high_laneq_u32(a, v, 1);
				// CHECK: umull2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vqdmlal_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v) {
				// CHECK: test_vqdmlal_lane_s16
				return vqdmlal_lane_s16(a, b, v, 1);
				// CHECK: sqdmlal {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vqdmlal_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v) {
				// CHECK: test_vqdmlal_lane_s32
				return vqdmlal_lane_s32(a, b, v, 1);
				// CHECK: sqdmlal {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vqdmlal_high_lane_s16(int32x4_t a, int16x8_t b, int16x4_t v) {
				// CHECK: test_vqdmlal_high_lane_s16
				return vqdmlal_high_lane_s16(a, b, v, 1);
				// CHECK: sqdmlal2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vqdmlal_high_lane_s32(int64x2_t a, int32x4_t b, int32x2_t v) {
				// CHECK: test_vqdmlal_high_lane_s32
				return vqdmlal_high_lane_s32(a, b, v, 1);
				// CHECK: sqdmlal2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vqdmlsl_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v) {
				// CHECK: test_vqdmlsl_lane_s16
				return vqdmlsl_lane_s16(a, b, v, 1);
				// CHECK: sqdmlsl {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vqdmlsl_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v) {
				// CHECK: test_vqdmlsl_lane_s32
				return vqdmlsl_lane_s32(a, b, v, 1);
				// CHECK: sqdmlsl {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vqdmlsl_high_lane_s16(int32x4_t a, int16x8_t b, int16x4_t v) {
				// CHECK: test_vqdmlsl_high_lane_s16
				return vqdmlsl_high_lane_s16(a, b, v, 1);
				// CHECK: sqdmlsl2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vqdmlsl_high_lane_s32(int64x2_t a, int32x4_t b, int32x2_t v) {
				// CHECK: test_vqdmlsl_high_lane_s32
				return vqdmlsl_high_lane_s32(a, b, v, 1);
				// CHECK: sqdmlsl2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vqdmull_lane_s16(int16x4_t a, int16x4_t v) {
				// CHECK: test_vqdmull_lane_s16
				return vqdmull_lane_s16(a, v, 1);
				// CHECK: sqdmull {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vqdmull_lane_s32(int32x2_t a, int32x2_t v) {
				// CHECK: test_vqdmull_lane_s32
				return vqdmull_lane_s32(a, v, 1);
				// CHECK: sqdmull {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vqdmull_laneq_s16(int16x4_t a, int16x8_t v) {
				// CHECK: test_vqdmull_laneq_s16
				return vqdmull_laneq_s16(a, v, 1);
				// CHECK: sqdmull {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vqdmull_laneq_s32(int32x2_t a, int32x4_t v) {
				// CHECK: test_vqdmull_laneq_s32
				return vqdmull_laneq_s32(a, v, 1);
				// CHECK: sqdmull {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vqdmull_high_lane_s16(int16x8_t a, int16x4_t v) {
				// CHECK: test_vqdmull_high_lane_s16
				return vqdmull_high_lane_s16(a, v, 1);
				// CHECK: sqdmull2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vqdmull_high_lane_s32(int32x4_t a, int32x2_t v) {
				// CHECK: test_vqdmull_high_lane_s32
				return vqdmull_high_lane_s32(a, v, 1);
				// CHECK: sqdmull2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vqdmull_high_laneq_s16(int16x8_t a, int16x8_t v) {
				// CHECK: test_vqdmull_high_laneq_s16
				return vqdmull_high_laneq_s16(a, v, 1);
				// CHECK: sqdmull2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int64x2_t test_vqdmull_high_laneq_s32(int32x4_t a, int32x4_t v) {
				// CHECK: test_vqdmull_high_laneq_s32
				return vqdmull_high_laneq_s32(a, v, 1);
				// CHECK: sqdmull2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int16x4_t test_vqdmulh_lane_s16(int16x4_t a, int16x4_t v) {
				// CHECK: test_vqdmulh_lane_s16
				return vqdmulh_lane_s16(a, v, 1);
				// CHECK: sqdmulh {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int16x8_t test_vqdmulhq_lane_s16(int16x8_t a, int16x4_t v) {
				// CHECK: test_vqdmulhq_lane_s16
				return vqdmulhq_lane_s16(a, v, 1);
				// CHECK: sqdmulh {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int32x2_t test_vqdmulh_lane_s32(int32x2_t a, int32x2_t v) {
				// CHECK: test_vqdmulh_lane_s32
				return vqdmulh_lane_s32(a, v, 1);
				// CHECK: sqdmulh {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vqdmulhq_lane_s32(int32x4_t a, int32x2_t v) {
				// CHECK: test_vqdmulhq_lane_s32
				return vqdmulhq_lane_s32(a, v, 1);
				// CHECK: sqdmulh {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				int16x4_t test_vqrdmulh_lane_s16(int16x4_t a, int16x4_t v) {
				// CHECK: test_vqrdmulh_lane_s16
				return vqrdmulh_lane_s16(a, v, 1);
				// CHECK: sqrdmulh {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
				}

				int16x8_t test_vqrdmulhq_lane_s16(int16x8_t a, int16x4_t v) {
				// CHECK: test_vqrdmulhq_lane_s16
				return vqrdmulhq_lane_s16(a, v, 1);
				// CHECK: sqrdmulh {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[1]
				}

				int32x2_t test_vqrdmulh_lane_s32(int32x2_t a, int32x2_t v) {
				// CHECK: test_vqrdmulh_lane_s32
				return vqrdmulh_lane_s32(a, v, 1);
				// CHECK: sqrdmulh {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				int32x4_t test_vqrdmulhq_lane_s32(int32x4_t a, int32x2_t v) {
				// CHECK: test_vqrdmulhq_lane_s32
				return vqrdmulhq_lane_s32(a, v, 1);
				// CHECK: sqrdmulh {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				float32x2_t test_vmul_lane_f32(float32x2_t a, float32x2_t v) {
				// CHECK: test_vmul_lane_f32
				return vmul_lane_f32(a, v, 1);
				// CHECK: fmul {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				float32x4_t test_vmulq_lane_f32(float32x4_t a, float32x2_t v) {
				// CHECK: test_vmulq_lane_f32
				return vmulq_lane_f32(a, v, 1);
				// CHECK: fmul {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				float64x2_t test_vmulq_lane_f64(float64x2_t a, float64x1_t v) {
				// CHECK: test_vmulq_lane_f64
				return vmulq_lane_f64(a, v, 0);
				// CHECK: fmul {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[0]
				}

				float32x2_t test_vmul_laneq_f32(float32x2_t a, float32x4_t v) {
				// CHECK: test_vmul_laneq_f32
				return vmul_laneq_f32(a, v, 1);
				// CHECK: fmul {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				float32x4_t test_vmulq_laneq_f32(float32x4_t a, float32x4_t v) {
				// CHECK: test_vmulq_laneq_f32
				return vmulq_laneq_f32(a, v, 1);
				// CHECK: fmul {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				float64x2_t test_vmulq_laneq_f64_0(float64x2_t a, float64x2_t v) {
				// CHECK: test_vmulq_laneq_f64
				return vmulq_laneq_f64(a, v, 0);
				// CHECK: fmul {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[0]
				}

				float64x2_t test_vmulq_laneq_f64(float64x2_t a, float64x2_t v) {
				// CHECK: test_vmulq_laneq_f64
				return vmulq_laneq_f64(a, v, 1);
				// CHECK: fmul {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[1]
				}

				float32x2_t test_vmulx_lane_f32(float32x2_t a, float32x2_t v) {
				// CHECK: test_vmulx_lane_f32
				return vmulx_lane_f32(a, v, 1);
				// CHECK: fmulx {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				float32x4_t test_vmulxq_lane_f32(float32x4_t a, float32x2_t v) {
				// CHECK: test_vmulxq_lane_f32
				return vmulxq_lane_f32(a, v, 1);
				// CHECK: fmulx {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				float64x2_t test_vmulxq_lane_f64(float64x2_t a, float64x1_t v) {
				// CHECK: test_vmulxq_lane_f64
				return vmulxq_lane_f64(a, v, 0);
				// CHECK: fmulx {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[0]
				}

				float32x2_t test_vmulx_laneq_f32(float32x2_t a, float32x4_t v) {
				// CHECK: test_vmulx_laneq_f32
				return vmulx_laneq_f32(a, v, 1);
				// CHECK: fmulx {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				}

				float32x4_t test_vmulxq_laneq_f32(float32x4_t a, float32x4_t v) {
				// CHECK: test_vmulxq_laneq_f32
				return vmulxq_laneq_f32(a, v, 1);
				// CHECK: fmulx {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				}

				float64x2_t test_vmulxq_laneq_f64_0(float64x2_t a, float64x2_t v) {
				// CHECK: test_vmulxq_laneq_f64
				return vmulxq_laneq_f64(a, v, 0);
				// CHECK: fmulx {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[0]
				}

				float64x2_t test_vmulxq_laneq_f64(float64x2_t a, float64x2_t v) {
				// CHECK: test_vmulxq_laneq_f64
				return vmulxq_laneq_f64(a, v, 1);
				// CHECK: fmulx {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[1]
				}

utils/TableGen/NeonEmitter.cpp

Context not available.
	OpMlalN,	OpMlalN,
	OpMlslN,	OpMlslN,
	OpMulLane,	OpMulLane,
		OpMulXLane,
	OpMullLane,	OpMullLane,
		OpMullHiLane,
	OpMlaLane,	OpMlaLane,
	OpMlsLane,	OpMlsLane,
	OpMlalLane,	OpMlalLane,
		OpMlalHiLane,
	OpMlslLane,	OpMlslLane,
		OpMlslHiLane,
	OpQDMullLane,	OpQDMullLane,
		OpQDMullHiLane,
	OpQDMlalLane,	OpQDMlalLane,
		OpQDMlalHiLane,
	OpQDMlslLane,	OpQDMlslLane,
		OpQDMlslHiLane,
	OpQDMulhLane,	OpQDMulhLane,
	OpQRDMulhLane,	OpQRDMulhLane,
		OpFMSLane,
		OpFMSLaneQ,
	OpEq,	OpEq,
	OpGe,	OpGe,
	OpLe,	OpLe,
Context not available.
	OpMap["OP_MLAL_N"] = OpMlalN;	OpMap["OP_MLAL_N"] = OpMlalN;
	OpMap["OP_MLSL_N"] = OpMlslN;	OpMap["OP_MLSL_N"] = OpMlslN;
	OpMap["OP_MUL_LN"]= OpMulLane;	OpMap["OP_MUL_LN"]= OpMulLane;
		OpMap["OP_MULX_LN"]= OpMulXLane;
	OpMap["OP_MULL_LN"] = OpMullLane;	OpMap["OP_MULL_LN"] = OpMullLane;
		OpMap["OP_MULLHi_LN"] = OpMullHiLane;
	OpMap["OP_MLA_LN"]= OpMlaLane;	OpMap["OP_MLA_LN"]= OpMlaLane;
	OpMap["OP_MLS_LN"]= OpMlsLane;	OpMap["OP_MLS_LN"]= OpMlsLane;
	OpMap["OP_MLAL_LN"] = OpMlalLane;	OpMap["OP_MLAL_LN"] = OpMlalLane;
		OpMap["OP_MLALHi_LN"] = OpMlalHiLane;
	OpMap["OP_MLSL_LN"] = OpMlslLane;	OpMap["OP_MLSL_LN"] = OpMlslLane;
		OpMap["OP_MLSLHi_LN"] = OpMlslHiLane;
	OpMap["OP_QDMULL_LN"] = OpQDMullLane;	OpMap["OP_QDMULL_LN"] = OpQDMullLane;
		OpMap["OP_QDMULLHi_LN"] = OpQDMullHiLane;
	OpMap["OP_QDMLAL_LN"] = OpQDMlalLane;	OpMap["OP_QDMLAL_LN"] = OpQDMlalLane;
		OpMap["OP_QDMLALHi_LN"] = OpQDMlalHiLane;
	OpMap["OP_QDMLSL_LN"] = OpQDMlslLane;	OpMap["OP_QDMLSL_LN"] = OpQDMlslLane;
		OpMap["OP_QDMLSLHi_LN"] = OpQDMlslHiLane;
	OpMap["OP_QDMULH_LN"] = OpQDMulhLane;	OpMap["OP_QDMULH_LN"] = OpQDMulhLane;
	OpMap["OP_QRDMULH_LN"] = OpQRDMulhLane;	OpMap["OP_QRDMULH_LN"] = OpQRDMulhLane;
		OpMap["OP_FMS_LN"] = OpFMSLane;
		OpMap["OP_FMS_LNQ"] = OpFMSLaneQ;
	OpMap["OP_EQ"] = OpEq;	OpMap["OP_EQ"] = OpEq;
	OpMap["OP_GE"] = OpGe;	OpMap["OP_GE"] = OpGe;
	OpMap["OP_LE"] = OpLe;	OpMap["OP_LE"] = OpLe;
Context not available.
	case 'g':	case 'g':
	quad = false;	quad = false;
	break;	break;
		case 'j':
		quad = true;
		break;
	case 'w':	case 'w':
	type = Widen(type);	type = Widen(type);
	quad = true;	quad = true;
Context not available.
	type = 's';	type = 's';
	usgn = true;	usgn = true;
	}	}
	usgn = usgn \| poly \| ((ck == ClassI \|\| ck == ClassW) && scal && type != 'f');	usgn = usgn \| poly \| ((ck == ClassI \|\| ck == ClassW) &&
		scal && type != 'f' && type != 'd');

	if (scal) {	if (scal) {
	SmallString<128> s;	SmallString<128> s;
Context not available.
	return "vv"; // void result with void first argument	return "vv"; // void result with void first argument
	if (mod == 'f' \|\| (ck != ClassB && type == 'f'))	if (mod == 'f' \|\| (ck != ClassB && type == 'f'))
	return quad ? "V4f" : "V2f";	return quad ? "V4f" : "V2f";
		if (ck != ClassB && type == 'd')
		return quad ? "V2d" : "V1d";
	if (ck != ClassB && type == 's')	if (ck != ClassB && type == 's')
	return quad ? "V8s" : "V4s";	return quad ? "V8s" : "V4s";
	if (ck != ClassB && type == 'i')	if (ck != ClassB && type == 'i')
Context not available.

	if (mod == 'f' \|\| (ck != ClassB && type == 'f'))	if (mod == 'f' \|\| (ck != ClassB && type == 'f'))
	return quad ? "V4f" : "V2f";	return quad ? "V4f" : "V2f";
		if (ck != ClassB && type == 'd')
		return quad ? "V2d" : "V1d";
	if (ck != ClassB && type == 's')	if (ck != ClassB && type == 's')
	return quad ? "V8s" : "V4s";	return quad ? "V8s" : "V4s";
	if (ck != ClassB && type == 'i')	if (ck != ClassB && type == 'i')
Context not available.
	NormedProto += 'q';	NormedProto += 'q';
	break;	break;
	case 'g':	case 'g':
		case 'j':
	case 'h':	case 'h':
	case 'e':	case 'e':
	NormedProto += 'd';	NormedProto += 'd';
Context not available.
	case OpMulLane:	case OpMulLane:
	s += "__a * " + SplatLane(nElts, "__b", "__c") + ";";	s += "__a * " + SplatLane(nElts, "__b", "__c") + ";";
	break;	break;
		case OpMulXLane:
		s += MangleName("vmulx", typestr, ClassS) + "(__a, " +
		SplatLane(nElts, "__b", "__c") + ");";
		break;
	case OpMul:	case OpMul:
	s += "__a * __b;";	s += "__a * __b;";
	break;	break;
Context not available.
	s += MangleName("vmull", typestr, ClassS) + "(__a, " +	s += MangleName("vmull", typestr, ClassS) + "(__a, " +
	SplatLane(nElts, "__b", "__c") + ");";	SplatLane(nElts, "__b", "__c") + ");";
	break;	break;
		case OpMullHiLane:
		s += MangleName("vmull", typestr, ClassS) + "(" +
		GetHigh("__a", typestr) + ", " + SplatLane(nElts, "__b", "__c") + ");";
		break;
	case OpMlaN:	case OpMlaN:
	s += "__a + (__b * " + Duplicate(nElts, typestr, "__c") + ");";	s += "__a + (__b * " + Duplicate(nElts, typestr, "__c") + ");";
	break;	break;
		t.p.northoverUnsubmitted Not Done Reply Inline Actions Sorry, I should have spotted this earlier, but I think that an fma is semantically distinct from the C expression "a+bc". The latter we can* transform into an fma in -ffast-math mode, but not generally. So we probably need an intrinsic to make sure that if the user asks for a fused multiply, that's the operation they get. Actually, there'a already a generic llvm intrinsic to handle this: "@llvm.fma."; we should probably use that rather than an ARM/AArch64-specific one. t.p.northover:* Sorry, I should have spotted this earlier, but I think that an fma is semantically distinct…
		JiangningAuthorUnsubmitted Not Done Reply Inline Actions Tim, I see your point, so do you mean we must generate fmla instruction for intrinsic function vfma_lane_f32(), no matter if it is in -ffast-math mode or not? Then I think we have to generate fmls for intrinsic function vfms_lane_f32() as well. I don't see LLVM IR has @llvm.fms.* defined, so we have to define an aarch64 specific LLVM intrinsic, or we can use an expression containing llvm.fma.* to represent it? Since it's a fused calculation, I doubt we should do the latter. I want to confirm with you before modifying the code. Thanks, -Jiangning Jiangning: Tim, I see your point, so do you mean we must generate fmla instruction for intrinsic function…
Context not available.
	s += "__a + " + MangleName("vmull", typestr, ClassS) + "(__b, " +	s += "__a + " + MangleName("vmull", typestr, ClassS) + "(__b, " +
	SplatLane(nElts, "__c", "__d") + ");";	SplatLane(nElts, "__c", "__d") + ");";
	break;	break;
		case OpMlalHiLane:
		s += "__a + " + MangleName("vmull", typestr, ClassS) + "(" +
		GetHigh("__b", typestr) + ", " + SplatLane(nElts, "__c", "__d") + ");";
		break;
	case OpMlal:	case OpMlal:
	s += "__a + " + MangleName("vmull", typestr, ClassS) + "(__b, __c);";	s += "__a + " + MangleName("vmull", typestr, ClassS) + "(__b, __c);";
	break;	break;
		t.p.northoverUnsubmitted Not Done Reply Inline Actions Why are these distinct? As far as I can see they are treated in exactly the same way by TableGen. Or am I missing something not shown by a simple grep; some endsWith('Q') weirdness or something? t.p.northover: Why are these distinct? As far as I can see they are treated in exactly the same way by…
		JiangningAuthorUnsubmitted Not Done Reply Inline Actions Yeah! Kevin also mentioned this problem to me. I uploaded a new version and removed all newly added xxx_LNQ enums. Jiangning: Yeah! Kevin also mentioned this problem to me. I uploaded a new version and removed all newly…
Context not available.
	case OpMlsLane:	case OpMlsLane:
	s += "__a - (__b * " + SplatLane(nElts, "__c", "__d") + ");";	s += "__a - (__b * " + SplatLane(nElts, "__c", "__d") + ");";
	break;	break;
		case OpFMSLane:
		s += TypeString(proto[1], typestr) + " __a1 = __a; \\\n ";
		s += TypeString(proto[2], typestr) + " __b1 = __b; \\\n ";
		s += TypeString(proto[3], typestr) + " __c1 = __c; \\\n ";
		s += MangleName("vfma_lane", typestr, ClassS) + "(__a1, __b1, -__c1, __d);";
		break;
		case OpFMSLaneQ:
		s += TypeString(proto[1], typestr) + " __a1 = __a; \\\n ";
		s += TypeString(proto[2], typestr) + " __b1 = __b; \\\n ";
		s += TypeString(proto[3], typestr) + " __c1 = __c; \\\n ";
		s += MangleName("vfma_laneq", typestr, ClassS) + "(__a1, __b1, -__c1, __d);";
		break;
	case OpMls:	case OpMls:
	s += "__a - (__b * __c);";	s += "__a - (__b * __c);";
	break;	break;
Context not available.
	s += "__a - " + MangleName("vmull", typestr, ClassS) + "(__b, " +	s += "__a - " + MangleName("vmull", typestr, ClassS) + "(__b, " +
	SplatLane(nElts, "__c", "__d") + ");";	SplatLane(nElts, "__c", "__d") + ");";
	break;	break;
		case OpMlslHiLane:
		s += "__a - " + MangleName("vmull", typestr, ClassS) + "(" +
		GetHigh("__b", typestr) + ", " + SplatLane(nElts, "__c", "__d") + ");";
		break;
	case OpMlsl:	case OpMlsl:
	s += "__a - " + MangleName("vmull", typestr, ClassS) + "(__b, __c);";	s += "__a - " + MangleName("vmull", typestr, ClassS) + "(__b, __c);";
	break;	break;
Context not available.
	s += MangleName("vqdmull", typestr, ClassS) + "(__a, " +	s += MangleName("vqdmull", typestr, ClassS) + "(__a, " +
	SplatLane(nElts, "__b", "__c") + ");";	SplatLane(nElts, "__b", "__c") + ");";
	break;	break;
		case OpQDMullHiLane:
		s += MangleName("vqdmull", typestr, ClassS) + "(" +
		GetHigh("__a", typestr) + ", " + SplatLane(nElts, "__b", "__c") + ");";
		break;
	case OpQDMlalLane:	case OpQDMlalLane:
	s += MangleName("vqdmlal", typestr, ClassS) + "(__a, __b, " +	s += MangleName("vqdmlal", typestr, ClassS) + "(__a, __b, " +
	SplatLane(nElts, "__c", "__d") + ");";	SplatLane(nElts, "__c", "__d") + ");";
	break;	break;
		case OpQDMlalHiLane:
		s += MangleName("vqdmlal", typestr, ClassS) + "(__a, " +
		GetHigh("__b", typestr) + ", " + SplatLane(nElts, "__c", "__d") + ");";
		break;
	case OpQDMlslLane:	case OpQDMlslLane:
	s += MangleName("vqdmlsl", typestr, ClassS) + "(__a, __b, " +	s += MangleName("vqdmlsl", typestr, ClassS) + "(__a, __b, " +
	SplatLane(nElts, "__c", "__d") + ");";	SplatLane(nElts, "__c", "__d") + ");";
	break;	break;
		case OpQDMlslHiLane:
		s += MangleName("vqdmlsl", typestr, ClassS) + "(__a, " +
		GetHigh("__b", typestr) + ", " + SplatLane(nElts, "__c", "__d") + ");";
		break;
	case OpQDMulhLane:	case OpQDMulhLane:
	s += MangleName("vqdmulh", typestr, ClassS) + "(__a, " +	s += MangleName("vqdmulh", typestr, ClassS) + "(__a, " +
	SplatLane(nElts, "__b", "__c") + ");";	SplatLane(nElts, "__b", "__c") + ");";
Context not available.

	// Emit Neon vector typedefs.	// Emit Neon vector typedefs.
	std::string TypedefTypes(	std::string TypedefTypes(
	"cQcsQsiQilQlUcQUcUsQUsUiQUiUlQUlhQhfQfQdPcQPcPsQPs");	"cQcsQsiQilQlUcQUcUsQUsUiQUiUlQUlhQhfQfdQdPcQPcPsQPs");
	SmallVector<StringRef, 24> TDTypeVec;	SmallVector<StringRef, 24> TDTypeVec;
	ParseTypes(0, TypedefTypes, TDTypeVec);	ParseTypes(0, TypedefTypes, TDTypeVec);

	// Emit vector typedefs.	// Emit vector typedefs.
		bool isA64 = false;
	for (unsigned i = 0, e = TDTypeVec.size(); i != e; ++i) {	for (unsigned i = 0, e = TDTypeVec.size(); i != e; ++i) {
	bool dummy, quad = false, poly = false;	bool dummy, quad = false, poly = false;
	char type = ClassifyType(TDTypeVec[i], quad, poly, dummy);	char type = ClassifyType(TDTypeVec[i], quad, poly, dummy);
	bool isA64 = false;	bool preinsert = false;
		bool postinsert = false;

	if (type == 'd' && quad)	if (type == 'd') {
		preinsert = isA64? false: true;
	isA64 = true;	isA64 = true;
		} else {
	if (isA64)	postinsert = isA64? true: false;
		isA64 = false;
		}
		if (postinsert)
		OS << "#endif\n";
		if (preinsert)
	OS << "#ifdef __aarch64__\n";	OS << "#ifdef __aarch64__\n";

	if (poly)	if (poly)
Context not available.
	OS << TypeString('s', TDTypeVec[i]);	OS << TypeString('s', TDTypeVec[i]);
	OS << " " << TypeString('d', TDTypeVec[i]) << ";\n";	OS << " " << TypeString('d', TDTypeVec[i]) << ";\n";

	if (isA64)
	OS << "#endif\n";
	}	}
	OS << "\n";	OS << "\n";

	// Emit struct typedefs.	// Emit struct typedefs.
		isA64 = false;
	for (unsigned vi = 2; vi != 5; ++vi) {	for (unsigned vi = 2; vi != 5; ++vi) {
	for (unsigned i = 0, e = TDTypeVec.size(); i != e; ++i) {	for (unsigned i = 0, e = TDTypeVec.size(); i != e; ++i) {
	bool dummy, quad = false, poly = false;	bool dummy, quad = false, poly = false;
	char type = ClassifyType(TDTypeVec[i], quad, poly, dummy);	char type = ClassifyType(TDTypeVec[i], quad, poly, dummy);
	bool isA64 = false;	bool preinsert = false;
		bool postinsert = false;

	if (type == 'd' && quad)	if (type == 'd') {
		preinsert = isA64? false: true;
	isA64 = true;	isA64 = true;
		} else {
	if (isA64)	postinsert = isA64? true: false;
		isA64 = false;
		}
		if (postinsert)
		OS << "#endif\n";
		if (preinsert)
	OS << "#ifdef __aarch64__\n";	OS << "#ifdef __aarch64__\n";

	std::string ts = TypeString('d', TDTypeVec[i]);	std::string ts = TypeString('d', TDTypeVec[i]);
Context not available.
	OS << "[" << utostr(vi) << "]";	OS << "[" << utostr(vi) << "]";
	OS << ";\n} ";	OS << ";\n} ";
	OS << vs << ";\n";	OS << vs << ";\n";

	if (isA64)
	OS << "#endif\n";

	OS << "\n";	OS << "\n";
	}	}
	}	}
Context not available.
	case 'f':	case 'f':
	case 'i':	case 'i':
	return (2 << (int)quad) - 1;	return (2 << (int)quad) - 1;
		case 'd':
	case 'l':	case 'l':
	return (1 << (int)quad) - 1;	return (1 << (int)quad) - 1;
	default:	default:
Context not available.