This is an archive of the discontinued LLVM Phabricator instance.

Differential D1753

Implement aarch64 neon instruction class AdvSIMD (by element) - LLVM
Closed, Public

Authored by Jiangning on Sep 25 2013, 2:03 AM.

Details

Reviewers

t.p.northover

Event Timeline

Hi Jiangning,

Quite a lot of comments, I'm afraid, but mostly just me moaning about names (which probably makes it worse). I think the important ones are:

Why are NEON_DUPIMM and NEON_VDUP separate? Actually, why is NEON_VDUP even in this patch since you don't use it?
I don't think we should be forming fma without fp-contract=fast but you seem to be checking it in your tests. That's worrying.

Cheers.

Tim.

lib/Target/AArch64/AArch64ISelLowering.h
144–145	I think this has semantics identical to the existing NEON_DUPIMM doesn't it? A large part of the reason I pushed for that solution was that it could be shared in these instructions. NEON_DUP is probably the best name here. "NEON" already implies that a duplicate will involve vectors and AArch64 instructions don't have a 'V' prefix.
lib/Target/AArch64/AArch64InstrFormats.td
1017–1018	Braces normally on the same line.
lib/Target/AArch64/AArch64InstrNEON.td
1569–1572	The antonym of "top" is "bottom". I'm not overly struck on either, but we should probably try to be consistent.
3275	The documentation still uses "Rm" for this register, doesn't it? Inventing "Re" seems confusing.
3286	(Not a comment on this patch, I don't have a good solution. But we really need to think about what to do with these instruction format names. Perhaps pass them through MD5 to make them more readable or something.)
3443–3444	'q' means nothing on AArch64 NEON. It's like writing "pattern for lane with ymm".
3662	Perhaps "-ffp-contract=fast generates fma". "enforces to generate" doesn't work in English.
3986	What's this got to do with an fma? The "f" stands for either "float" or "fused", neither of which applies here.
4000	Likewise
test/CodeGen/AArch64/neon-2velem.ll
354–356	None of these should be forming an fmla without fp-contract=fast, should they?

Jiangning updated this revision. Sep 27 2013, 2:49 AM

Hi Tim,

Overall, I accept almost all of your comments. In particular, I removed the patterns that generate fma when -ffp-contract=fast is not given. I also removed NEON_VDUP from my patch.

For the register definition 'Re', I didn't make a change. Re is different from Rm, and the Rm in the documentation for this instruction class differs from the Rm defined for A64Inst. LLVM defines Rm as Inst{20-16}, whereas Re cannot be defined that way, because Inst{20} is shared between Rm and the 'M' bit. I think this is also why the element register is restricted to V0-V15 for vectors with H elements.

Thanks,
-Jiangning

lib/Target/AArch64/AArch64ISelLowering.h
144–145	OK. I will remove NEON_VDUP, and Kevin will add it back in his new patch.
lib/Target/AArch64/AArch64InstrFormats.td
1017–1018	OK. I will reformat this one as well as similar ones.
lib/Target/AArch64/AArch64InstrNEON.td
1569–1572	OK. I will not use bottom, because it's longer. I will use High instead.
3275	Re is different from Rm, and the Rm in the documentation for this instruction class differs from the Rm defined for A64Inst. LLVM defines Rm as Inst{20-16}, whereas Re cannot be defined that way, because Inst{20} is shared between Rm and the 'M' bit. I think this is also why the element register is restricted to V0-V15 for vectors with H elements.
3286	I don't have a solution either. I'm trying to distinguish the patterns by describing 1) argument types and 2) encoding variants. The encoding variants are caused purely by encoding limitations, so it's really hard to give them meaningful names.
3443–3444	OK. I will change 'without q' to 'in 64-bit vector', and 'with q' to 'in 128-bit vector'.
3662	OK. I thought I wrote 'force' rather than 'enforce'. :-) Anyway, I will change to the wording you gave.
3986	OK. I will remove letter 'f'.
4000	Likewise
test/CodeGen/AArch64/neon-2velem.ll
354–356	Yes. I will remove the test without -ffp-contract=fast.
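
The Re/Rm point above is easier to see with the bit positions written out. The following is a minimal sketch, not LLVM code: `encodeElementFields` and `ElemSize` are hypothetical names, and the function packs only the element-register and lane-index fields, following the `let Inst{...}` assignments in the patch for the S and H variants.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not part of LLVM: element sizes handled by the
// integer by-element instructions in this patch.
enum class ElemSize { H, S };

// Packs the element register and lane index into their instruction bits.
// Returns -1 when the combination cannot be encoded.
long long encodeElementFields(ElemSize size, unsigned reg, unsigned index) {
  std::uint32_t inst = 0;
  if (size == ElemSize::S) {
    // S elements: index is H:L, register uses all of Inst{20-16}.
    if (reg > 31 || index > 3) return -1;
    inst |= ((index >> 1) & 1u) << 11; // H = Index{1}
    inst |= (index & 1u) << 21;        // L = Index{0}
    inst |= reg << 16;                 // Inst{20-16} = Re
  } else {
    // H elements: index is H:L:M, so Inst{20} is taken by the index
    // and only Inst{19-16} remain for the register (v0-v15).
    if (reg > 15 || index > 7) return -1;
    inst |= ((index >> 2) & 1u) << 11; // H = Index{2}
    inst |= ((index >> 1) & 1u) << 21; // L = Index{1}
    inst |= (index & 1u) << 20;        // M = Index{0}
    inst |= reg << 16;                 // Inst{19-16} = Re{3-0}
  }
  return inst;
}
```

In this model, lane 3 of v31 with S elements and lane 7 of v15 with H elements set exactly the same bits, which shows why the H form has no room left for a fifth register bit.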

Hi Jiangning,

Sorry I didn't get a chance to reply on Friday, and thanks for
reworking the patch.

> For register definition 'Re', I didn't make change.

Fair enough.

I think it looks pretty much OK now, though I'd be happier if there
was a test that we *weren't* generating fmla instructions all the time
(perhaps with a comment about it being intentional). It's a very
tempting "optimisation" to make.

I think you should commit, with or without that though. I'll take a
look over the revision afterwards.

Cheers.

Tim.
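
For readers wondering why forming fmla without -ffp-contract=fast matters: a fused multiply-add rounds once, while a separate multiply and add round twice, so the two can give different results. The sketch below is illustrative only; `mulThenAdd` and `fusedMulAdd` are hypothetical helper names, and the `volatile` temporary is an assumption used to keep the compiler from contracting the expression itself.

```cpp
#include <cassert>
#include <cmath>

// a*b + c with the product rounded to double before the add.
// The volatile temporary forces the intermediate rounding.
double mulThenAdd(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// a*b + c with a single rounding at the end.
double fusedMulAdd(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

With a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product is 1 - 2^-54, which rounds to 1.0 as a double. The unfused form therefore returns 0, while the fused form returns -2^-54. Silently replacing one with the other changes program results, which is why the contraction should stay opt-in.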

Tim,

I added some new pattern matches to support the fms_lane intrinsics.

Thanks,
-Jiangning

lib/Target/AArch64/AArch64InstrNEON.td
3687	Some new patterns are added to capture a + b * (-c).
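
As a rough scalar model of what an "a + b * (-c)" by-lane pattern computes, consider the sketch below. It is not taken from the patch: `Vec4` and `fmsLane` are hypothetical names, and the four-float vector width is chosen only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

typedef std::array<float, 4> Vec4;

// For each element i: acc[i] + n[i] * (-m[lane]), fused into one rounding.
// This is the same value as acc[i] - n[i] * m[lane].
Vec4 fmsLane(Vec4 acc, const Vec4 &n, const Vec4 &m, std::size_t lane) {
  for (std::size_t i = 0; i < acc.size(); ++i)
    acc[i] = std::fma(n[i], -m[lane], acc[i]);
  return acc;
}
```

Negating the lane operand and feeding it to a fused multiply-add gives the same value as a fused multiply-subtract, so both spellings can select the same instruction.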

t.p.northover accepted this revision. Apr 3 2014, 4:49 AM

Committed in rL191944.

Revision Contents

Path	Size
lib/Target/AArch64/AArch64ISelLowering.h	10 lines
lib/Target/AArch64/AArch64ISelLowering.cpp	49 lines
lib/Target/AArch64/AArch64InstrFormats.td	27 lines
lib/Target/AArch64/AArch64InstrNEON.td	843 lines
lib/Target/AArch64/AArch64RegisterInfo.td	17 lines
lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp	12 lines
test/CodeGen/AArch64/neon-2velem.ll	1335 lines
test/MC/AArch64/neon-2velem.s	271 lines
test/MC/AArch64/neon-diagnostics.s	759 lines

Diff 4470

lib/Target/AArch64/AArch64ISelLowering.h

Context not available.

	// Vector saturating shift	// Vector saturating shift
	NEON_QSHLs,	NEON_QSHLs,
	NEON_QSHLu	NEON_QSHLu,

		// Vector dup
		NEON_VDUP,
		t.p.northover: I think this has semantics identical to the existing NEON_DUPIMM doesn't it? A large part of the reason I pushed for that solution was that it could be shared in these instructions. NEON_DUP is probably the best name here. "NEON" already implies that a duplicate will involve vectors and AArch64 instructions don't have a 'V' prefix.
		Jiangning: OK. I will remove NEON_VDUP, and Kevin will add it back in his new patch.

		// Vector dup by lane
		NEON_VDUPLANE
	};	};
	}	}

Context not available.
	SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG,	SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG,
	const AArch64Subtarget *ST) const;	const AArch64Subtarget *ST) const;

		SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;

	void SaveVarArgRegisters(CCState &CCInfo, SelectionDAG &DAG, SDLoc DL,	void SaveVarArgRegisters(CCState &CCInfo, SelectionDAG &DAG, SDLoc DL,
	SDValue &Chain) const;	SDValue &Chain) const;

Context not available.

lib/Target/AArch64/AArch64ISelLowering.cpp

Property	Old Value	New Value
File Mode	100644	100755

Context not available.
	setOperationAction(ISD::BUILD_VECTOR, MVT::v1f64, Custom);	setOperationAction(ISD::BUILD_VECTOR, MVT::v1f64, Custom);
	setOperationAction(ISD::BUILD_VECTOR, MVT::v2f64, Custom);	setOperationAction(ISD::BUILD_VECTOR, MVT::v2f64, Custom);

		setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v4i16, Custom);
		setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v8i16, Custom);
		setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2i32, Custom);
		setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v4i32, Custom);
		setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2f32, Custom);
		setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v4f32, Custom);
		setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v1f64, Custom);
		setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2f64, Custom);

		setOperationAction(ISD::CONCAT_VECTORS, MVT::v8i16, Legal);
		setOperationAction(ISD::CONCAT_VECTORS, MVT::v4i32, Legal);
	setOperationAction(ISD::CONCAT_VECTORS, MVT::v2i64, Legal);	setOperationAction(ISD::CONCAT_VECTORS, MVT::v2i64, Legal);
		setOperationAction(ISD::CONCAT_VECTORS, MVT::v4f32, Legal);
		setOperationAction(ISD::CONCAT_VECTORS, MVT::v2f64, Legal);

	setOperationAction(ISD::SETCC, MVT::v8i8, Custom);	setOperationAction(ISD::SETCC, MVT::v8i8, Custom);
	setOperationAction(ISD::SETCC, MVT::v16i8, Custom);	setOperationAction(ISD::SETCC, MVT::v16i8, Custom);
Context not available.
	return "AArch64ISD::NEON_QSHLs";	return "AArch64ISD::NEON_QSHLs";
	case AArch64ISD::NEON_QSHLu:	case AArch64ISD::NEON_QSHLu:
	return "AArch64ISD::NEON_QSHLu";	return "AArch64ISD::NEON_QSHLu";
		case AArch64ISD::NEON_VDUP:
		return "AArch64ISD::NEON_VDUP";
		case AArch64ISD::NEON_VDUPLANE:
		return "AArch64ISD::NEON_VDUPLANE";
	default:	default:
	return NULL;	return NULL;
	}	}
Context not available.
	case ISD::VASTART: return LowerVASTART(Op, DAG);	case ISD::VASTART: return LowerVASTART(Op, DAG);
	case ISD::BUILD_VECTOR:	case ISD::BUILD_VECTOR:
	return LowerBUILD_VECTOR(Op, DAG, getSubtarget());	return LowerBUILD_VECTOR(Op, DAG, getSubtarget());
		case ISD::VECTOR_SHUFFLE: return LowerVECTOR_SHUFFLE(Op, DAG);
	}	}

	return SDValue();	return SDValue();
Context not available.
	return SDValue();	return SDValue();
	}	}

		SDValue
		AArch64TargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
		SelectionDAG &DAG) const {
		SDValue V1 = Op.getOperand(0);
		SDLoc dl(Op);
		EVT VT = Op.getValueType();
		ShuffleVectorSDNode *SVN = cast<ShuffleVectorSDNode>(Op.getNode());

		// Convert shuffles that are directly supported on NEON to target-specific
		// DAG nodes, instead of keeping them as shuffles and matching them again
		// during code selection. This is more efficient and avoids the possibility
		// of inconsistencies between legalization and selection.
		ArrayRef<int> ShuffleMask = SVN->getMask();

		unsigned EltSize = VT.getVectorElementType().getSizeInBits();
		if (EltSize <= 64) {
		if (ShuffleVectorSDNode::isSplatMask(&ShuffleMask[0], VT)) {
		int Lane = SVN->getSplatIndex();
		// If this is undef splat, generate it via "just" vdup, if possible.
		if (Lane == -1) Lane = 0;

		// FIXME: add more lowering to generate AArch64ISD::NEON_VDUP

		return DAG.getNode(AArch64ISD::NEON_VDUPLANE, dl, VT, V1,
		DAG.getConstant(Lane, MVT::i64));
		}
		}

		return SDValue();
		}

	AArch64TargetLowering::ConstraintType	AArch64TargetLowering::ConstraintType
	AArch64TargetLowering::getConstraintType(const std::string &Constraint) const {	AArch64TargetLowering::getConstraintType(const std::string &Constraint) const {
	if (Constraint.size() == 1) {	if (Constraint.size() == 1) {
Context not available.
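
The splat check in LowerVECTOR_SHUFFLE above relies on ShuffleVectorSDNode helpers. As a hedged illustration of the idea only, the sketch below models a shuffle mask as a plain vector of lane numbers with -1 for undef. `splatLane` is a hypothetical function, not the LLVM implementation; it mirrors the patch's choice of lane 0 for an all-undef splat.

```cpp
#include <cassert>
#include <vector>

// Returns the lane that the mask broadcasts, 0 if every element is undef,
// or -2 if the mask is not a splat. Undef elements (-1) match any lane.
int splatLane(const std::vector<int> &mask) {
  int lane = -1;
  for (int m : mask) {
    if (m < 0)
      continue;          // undef element places no constraint
    if (lane == -1)
      lane = m;          // first defined element picks the lane
    else if (m != lane)
      return -2;         // two different lanes: not a splat
  }
  return lane == -1 ? 0 : lane; // all-undef: fall back to lane 0
}
```

A mask such as {2, -1, 2, 2} is treated as a splat of lane 2 and can become a single dup-by-lane node, whereas {0, 1, 0, 1} is rejected and falls through to the default handling.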

lib/Target/AArch64/AArch64InstrFormats.td

Property	Old Value	New Value
File Mode	100644	100755

Context not available.
	let Inst{28-24} = 0b01110;	let Inst{28-24} = 0b01110;
	let Inst{23-22} = size;	let Inst{23-22} = size;
	let Inst{21} = 0b1;	let Inst{21} = 0b1;
	// Inherit Rm in 20-16	// Inherit Rm in 20-16
	let Inst{15-11} = opcode;	let Inst{15-11} = opcode;
	let Inst{10} = 0b1;	let Inst{10} = 0b1;
	// Inherit Rn in 9-5	// Inherit Rn in 9-5
Context not available.
	let Inst{28-24} = 0b01110;	let Inst{28-24} = 0b01110;
	let Inst{23-22} = size;	let Inst{23-22} = size;
	let Inst{21} = 0b1;	let Inst{21} = 0b1;
	// Inherit Rm in 20-16	// Inherit Rm in 20-16
	let Inst{15-12} = opcode;	let Inst{15-12} = opcode;
	let Inst{11} = 0b0;	let Inst{11} = 0b0;
	let Inst{10} = 0b0;	let Inst{10} = 0b0;
Context not available.
	// Inherit Rd in 4-0	// Inherit Rd in 4-0
	}	}

		// Format AdvSIMD two registers and an element
		class NeonI_2VElem<bit q, bit u, bits<2> size, bits<4> opcode,
		dag outs, dag ins, string asmstr,
		list<dag> patterns, InstrItinClass itin>
		: A64InstRdnm<outs, ins, asmstr, patterns, itin>
		{
		t.p.northover: Braces normally on the same line.
		Jiangning: OK. I will reformat this one as well as similar ones.
		let Inst{31} = 0b0;
		let Inst{30} = q;
		let Inst{29} = u;
		let Inst{28-24} = 0b01111;
		let Inst{23-22} = size;
		// l in Inst{21}
		// m in Inst{20}
		// Inherit Rm in 19-16
		let Inst{15-12} = opcode;
		// h in Inst{11}
		let Inst{10} = 0b0;
		// Inherit Rn in 9-5
		// Inherit Rd in 4-0
		}

	// Format AdvSIMD 1 vector register with modified immediate	// Format AdvSIMD 1 vector register with modified immediate
	class NeonI_1VModImm<bit q, bit op,	class NeonI_1VModImm<bit q, bit op,
	dag outs, dag ins, string asmstr,	dag outs, dag ins, string asmstr,
Context not available.
	let Inst{28-24} = 0b11110;	let Inst{28-24} = 0b11110;
	let Inst{23-22} = size;	let Inst{23-22} = size;
	let Inst{21} = 0b1;	let Inst{21} = 0b1;
	// Inherit Rm in 20-16	// Inherit Rm in 20-16
	let Inst{15-11} = opcode;	let Inst{15-11} = opcode;
	let Inst{10} = 0b1;	let Inst{10} = 0b1;
	// Inherit Rn in 9-5	// Inherit Rn in 9-5
Context not available.

lib/Target/AArch64/AArch64InstrNEON.td

Context not available.
	def Neon_sqrshlImm : SDNode<"AArch64ISD::NEON_QSHLs", SDTARMVSH>;	def Neon_sqrshlImm : SDNode<"AArch64ISD::NEON_QSHLs", SDTARMVSH>;
	def Neon_uqrshlImm : SDNode<"AArch64ISD::NEON_QSHLu", SDTARMVSH>;	def Neon_uqrshlImm : SDNode<"AArch64ISD::NEON_QSHLu", SDTARMVSH>;

		def Neon_vdup : SDNode<"AArch64ISD::NEON_VDUP", SDTypeProfile<1, 1,
		[SDTCisVec<0>, SDTCisEltOfVec<1, 0>]>>;

		def Neon_vduplane : SDNode<"AArch64ISD::NEON_VDUPLANE", SDTypeProfile<1, 2,
		[SDTCisVec<0>, SDTCisVec<1>, SDTCisVT<2, i64>]>>;

	//===----------------------------------------------------------------------===//	//===----------------------------------------------------------------------===//
	// Multiclasses	// Multiclasses
Context not available.
	def Neon_top4S : PatFrag<(ops node:$in),	def Neon_top4S : PatFrag<(ops node:$in),
	(extract_subvector (v4i32 node:$in), (iPTR 2))>;	(extract_subvector (v4i32 node:$in), (iPTR 2))>;

		def Neon_low8H : PatFrag<(ops node:$in),
		t.p.northover: The antonym of "top" is "bottom". I'm not overly struck on either, but we should probably try to be consistent.
		Jiangning: OK. I will not use bottom, because it's longer. I will use High instead.
		(v4i16 (extract_subvector (v8i16 node:$in),
		(iPTR 0)))>;
		def Neon_low4S : PatFrag<(ops node:$in),
		(v2i32 (extract_subvector (v4i32 node:$in),
		(iPTR 0)))>;
		def Neon_low4f : PatFrag<(ops node:$in),
		(v2f32 (extract_subvector (v4f32 node:$in),
		(iPTR 0)))>;

	class N2VShiftLong<bit q, bit u, bits<5> opcode, string asmop, string DestT,	class N2VShiftLong<bit q, bit u, bits<5> opcode, string asmop, string DestT,
	string SrcT, ValueType DestTy, ValueType SrcTy,	string SrcT, ValueType DestTy, ValueType SrcTy,
	Operand ImmTy, SDPatternOperator ExtOp>	Operand ImmTy, SDPatternOperator ExtOp>
Context not available.
	defm SQRSHRNvvi : NeonI_N2VShR_Narrow<0b0, 0b10011, "sqrshrn">;	defm SQRSHRNvvi : NeonI_N2VShR_Narrow<0b0, 0b10011, "sqrshrn">;
	defm UQRSHRNvvi : NeonI_N2VShR_Narrow<0b1, 0b10011, "uqrshrn">;	defm UQRSHRNvvi : NeonI_N2VShR_Narrow<0b1, 0b10011, "uqrshrn">;

	def Neon_combine : PatFrag<(ops node:$Rm, node:$Rn),	def Neon_combine_2D : PatFrag<(ops node:$Rm, node:$Rn),
	(v2i64 (concat_vectors (v1i64 node:$Rm),	(v2i64 (concat_vectors (v1i64 node:$Rm),
	(v1i64 node:$Rn)))>;	(v1i64 node:$Rn)))>;
		def Neon_combine_8H : PatFrag<(ops node:$Rm, node:$Rn),
		(v8i16 (concat_vectors (v4i16 node:$Rm),
		(v4i16 node:$Rn)))>;
		def Neon_combine_4S : PatFrag<(ops node:$Rm, node:$Rn),
		(v4i32 (concat_vectors (v2i32 node:$Rm),
		(v2i32 node:$Rn)))>;
		def Neon_combine_4f : PatFrag<(ops node:$Rm, node:$Rn),
		(v4f32 (concat_vectors (v2f32 node:$Rm),
		(v2f32 node:$Rn)))>;
		def Neon_combine_2d : PatFrag<(ops node:$Rm, node:$Rn),
		(v2f64 (concat_vectors (v1f64 node:$Rm),
		(v1f64 node:$Rn)))>;

	def Neon_lshrImm8H : PatFrag<(ops node:$lhs, node:$rhs),	def Neon_lshrImm8H : PatFrag<(ops node:$lhs, node:$rhs),
	(v8i16 (srl (v8i16 node:$lhs),	(v8i16 (srl (v8i16 node:$lhs),
Context not available.
	imm:$Imm))),	imm:$Imm))),
	(SHRNvvi_2S VPR128:$Rn, imm:$Imm)>;	(SHRNvvi_2S VPR128:$Rn, imm:$Imm)>;

	def : Pat<(Neon_combine (v1i64 VPR64:$src), (v1i64 (bitconvert	def : Pat<(Neon_combine_2D (v1i64 VPR64:$src), (v1i64 (bitconvert
	(v8i8 (trunc (!cast<PatFrag>("Neon_" # shr # "Imm8H")	(v8i8 (trunc (!cast<PatFrag>("Neon_" # shr # "Imm8H")
	VPR128:$Rn, imm:$Imm)))))),	VPR128:$Rn, imm:$Imm)))))),
	(SHRNvvi_16B (SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),	(SHRNvvi_16B (SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),
	VPR128:$Rn, imm:$Imm)>;	VPR128:$Rn, imm:$Imm)>;
	def : Pat<(Neon_combine (v1i64 VPR64:$src), (v1i64 (bitconvert	def : Pat<(Neon_combine_2D (v1i64 VPR64:$src), (v1i64 (bitconvert
	(v4i16 (trunc (!cast<PatFrag>("Neon_" # shr # "Imm4S")	(v4i16 (trunc (!cast<PatFrag>("Neon_" # shr # "Imm4S")
	VPR128:$Rn, imm:$Imm)))))),	VPR128:$Rn, imm:$Imm)))))),
	(SHRNvvi_8H (SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),	(SHRNvvi_8H (SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),
	VPR128:$Rn, imm:$Imm)>;	VPR128:$Rn, imm:$Imm)>;
	def : Pat<(Neon_combine (v1i64 VPR64:$src), (v1i64 (bitconvert	def : Pat<(Neon_combine_2D (v1i64 VPR64:$src), (v1i64 (bitconvert
	(v2i32 (trunc (!cast<PatFrag>("Neon_" # shr # "Imm2D")	(v2i32 (trunc (!cast<PatFrag>("Neon_" # shr # "Imm2D")
	VPR128:$Rn, imm:$Imm)))))),	VPR128:$Rn, imm:$Imm)))))),
	(SHRNvvi_4S (SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),	(SHRNvvi_4S (SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),
Context not available.
	def : Pat<(v2i32 (op (v2i64 VPR128:$Rn), imm:$Imm)),	def : Pat<(v2i32 (op (v2i64 VPR128:$Rn), imm:$Imm)),
	(!cast<Instruction>(prefix # "_2S") VPR128:$Rn, imm:$Imm)>;	(!cast<Instruction>(prefix # "_2S") VPR128:$Rn, imm:$Imm)>;

	def : Pat<(Neon_combine (v1i64 VPR64:$src),	def : Pat<(Neon_combine_2D (v1i64 VPR64:$src),
	(v1i64 (bitconvert (v8i8 (op (v8i16 VPR128:$Rn), imm:$Imm))))),	(v1i64 (bitconvert (v8i8 (op (v8i16 VPR128:$Rn), imm:$Imm))))),
	(!cast<Instruction>(prefix # "_16B")	(!cast<Instruction>(prefix # "_16B")
	(SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),	(SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),
	VPR128:$Rn, imm:$Imm)>;	VPR128:$Rn, imm:$Imm)>;
	def : Pat<(Neon_combine (v1i64 VPR64:$src),	def : Pat<(Neon_combine_2D (v1i64 VPR64:$src),
	(v1i64 (bitconvert (v4i16 (op (v4i32 VPR128:$Rn), imm:$Imm))))),	(v1i64 (bitconvert (v4i16 (op (v4i32 VPR128:$Rn), imm:$Imm))))),
	(!cast<Instruction>(prefix # "_8H")	(!cast<Instruction>(prefix # "_8H")
	(SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),	(SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),
	VPR128:$Rn, imm:$Imm)>;	VPR128:$Rn, imm:$Imm)>;
	def : Pat<(Neon_combine (v1i64 VPR64:$src),	def : Pat<(Neon_combine_2D (v1i64 VPR64:$src),
	(v1i64 (bitconvert (v2i32 (op (v2i64 VPR128:$Rn), imm:$Imm))))),	(v1i64 (bitconvert (v2i32 (op (v2i64 VPR128:$Rn), imm:$Imm))))),
	(!cast<Instruction>(prefix # "_4S")	(!cast<Instruction>(prefix # "_4S")
	(SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),	(SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),
Context not available.
	// part.	// part.
	class NarrowHighHalfPat<Instruction INST, ValueType DstTy, ValueType SrcTy,	class NarrowHighHalfPat<Instruction INST, ValueType DstTy, ValueType SrcTy,
	SDPatternOperator coreop>	SDPatternOperator coreop>
	: Pat<(Neon_combine (v1i64 VPR64:$src),	: Pat<(Neon_combine_2D (v1i64 VPR64:$src),
	(v1i64 (bitconvert (DstTy (coreop (SrcTy VPR128:$Rn),	(v1i64 (bitconvert (DstTy (coreop (SrcTy VPR128:$Rn),
	(SrcTy VPR128:$Rm)))))),	(SrcTy VPR128:$Rm)))))),
	(INST (SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),	(INST (SUBREG_TO_REG (i64 0), VPR64:$src, sub_64),
Context not available.
	opnode, v2i64, v2i32>;	opnode, v2i64, v2i32>;
	}	}

	def Neon_smlal : PatFrag<(ops node:$Rd, node:$Rm, node:$Rn),	def Neon_smlal : PatFrag<(ops node:$Rd, node:$Rn, node:$Rm),
	(add node:$Rd,	(add node:$Rd,
	(int_arm_neon_vmulls node:$Rn, node:$Rm))>;	(int_arm_neon_vmulls node:$Rn, node:$Rm))>;

	def Neon_umlal : PatFrag<(ops node:$Rd, node:$Rm, node:$Rn),	def Neon_umlal : PatFrag<(ops node:$Rd, node:$Rn, node:$Rm),
	(add node:$Rd,	(add node:$Rd,
	(int_arm_neon_vmullu node:$Rn, node:$Rm))>;	(int_arm_neon_vmullu node:$Rn, node:$Rm))>;

	def Neon_smlsl : PatFrag<(ops node:$Rd, node:$Rm, node:$Rn),	def Neon_smlsl : PatFrag<(ops node:$Rd, node:$Rn, node:$Rm),
	(sub node:$Rd,	(sub node:$Rd,
	(int_arm_neon_vmulls node:$Rn, node:$Rm))>;	(int_arm_neon_vmulls node:$Rn, node:$Rm))>;

	def Neon_umlsl : PatFrag<(ops node:$Rd, node:$Rm, node:$Rn),	def Neon_umlsl : PatFrag<(ops node:$Rd, node:$Rn, node:$Rm),
	(sub node:$Rd,	(sub node:$Rd,
	(int_arm_neon_vmullu node:$Rn, node:$Rm))>;	(int_arm_neon_vmullu node:$Rn, node:$Rm))>;

Context not available.
	let Constraints = "$src = $Rd";	let Constraints = "$src = $Rd";
	}	}

		// The followings are for instruction class (3V Elem)

		// Variant 1

		class NI_2VE<bit q, bit u, bits<2> size, bits<4> opcode,
		string asmop, string ResS, string OpS, string EleOpS,
		Operand OpImm, RegisterOperand ResVPR,
		RegisterOperand OpVPR, RegisterOperand EleOpVPR>
		: NeonI_2VElem<q, u, size, opcode,
		(outs ResVPR:$Rd), (ins ResVPR:$src, OpVPR:$Rn,
		EleOpVPR:$Re, OpImm:$Index),
		t.p.northover: The documentation still uses "Rm" for this register, doesn't it? Inventing "Re" seems confusing.
		Jiangning: Re is different from Rm, and the Rm in the documentation for this instruction class differs from the Rm defined for A64Inst. LLVM defines Rm as Inst{20-16}, whereas Re cannot be defined that way, because Inst{20} is shared between Rm and the 'M' bit. I think this is also why the element register is restricted to V0-V15 for vectors with H elements.
		asmop # "\t$Rd." # ResS # ", $Rn." # OpS #
		", $Re." # EleOpS # "[$Index]",
		[],
		NoItinerary> {
		bits<3> Index;
		bits<5> Re;

		let Constraints = "$src = $Rd";
		}

		multiclass NI_2VE_v1<bit u, bits<4> opcode, string asmop>
		t.p.northover: (Not a comment on this patch, I don't have a good solution. But we really need to think about what to do with these instruction format names. Perhaps pass them through MD5 to make them more readable or something.)
		Jiangning: I don't have a solution either. I'm trying to distinguish the patterns by describing 1) argument types and 2) encoding variants. The encoding variants are caused purely by encoding limitations, so it's really hard to give them meaningful names.
		{
		// vector register class for element is always 128-bit to cover the max index
		def _2s4s : NI_2VE<0b0, u, 0b10, opcode, asmop, "2s", "2s", "s",
		neon_uimm2_bare, VPR64, VPR64, VPR128> {
		let Inst{11} = {Index{1}};
		let Inst{21} = {Index{0}};
		let Inst{20-16} = Re;
		}

		def _4s4s : NI_2VE<0b1, u, 0b10, opcode, asmop, "4s", "4s", "s",
		neon_uimm2_bare, VPR128, VPR128, VPR128> {
		let Inst{11} = {Index{1}};
		let Inst{21} = {Index{0}};
		let Inst{20-16} = Re;
		}

		// Index operations on 16-bit(H) elements are restricted to using v0-v15.
		def _4h8h : NI_2VE<0b0, u, 0b01, opcode, asmop, "4h", "4h", "h",
		neon_uimm3_bare, VPR64, VPR64, VPR128Lo> {
		let Inst{11} = {Index{2}};
		let Inst{21} = {Index{1}};
		let Inst{20} = {Index{0}};
		let Inst{19-16} = Re{3-0};
		}

		def _8h8h : NI_2VE<0b1, u, 0b01, opcode, asmop, "8h", "8h", "h",
		neon_uimm3_bare, VPR128, VPR128, VPR128Lo> {
		let Inst{11} = {Index{2}};
		let Inst{21} = {Index{1}};
		let Inst{20} = {Index{0}};
		let Inst{19-16} = Re{3-0};
		}
		}

		defm MLAvve : NI_2VE_v1<0b1, 0b0000, "mla">;
		defm MLSvve : NI_2VE_v1<0b1, 0b0100, "mls">;

		// Pattern for lane with q
		class NI_2VE_laneq<Instruction INST, Operand OpImm, SDPatternOperator op,
		RegisterOperand ResVPR, RegisterOperand OpVPR,
		RegisterOperand EleOpVPR, ValueType ResTy, ValueType OpTy,
		ValueType EleOpTy, SDPatternOperator coreop>
		: Pat<(ResTy (op (ResTy ResVPR:$src), (OpTy OpVPR:$Rn),
		(OpTy (coreop (EleOpTy EleOpVPR:$Re), (i64 OpImm:$Index))))),
		(INST ResVPR:$src, OpVPR:$Rn, EleOpVPR:$Re, OpImm:$Index)>;

		// Pattern for lane without q
		class NI_2VE_lane<Instruction INST, Operand OpImm, SDPatternOperator op,
		RegisterOperand ResVPR, RegisterOperand OpVPR,
		RegisterOperand EleOpVPR, ValueType ResTy, ValueType OpTy,
		ValueType EleOpTy, SDPatternOperator coreop>
		: Pat<(ResTy (op (ResTy ResVPR:$src), (OpTy OpVPR:$Rn),
		(OpTy (coreop (EleOpTy EleOpVPR:$Re), (i64 OpImm:$Index))))),
		(INST ResVPR:$src, OpVPR:$Rn,
		(SUBREG_TO_REG (i64 0), EleOpVPR:$Re, sub_64), OpImm:$Index)>;

		multiclass NI_2VE_v1_pat<string subop, SDPatternOperator op>
		{
		def : NI_2VE_laneq<!cast<Instruction>(subop # "_2s4s"), neon_uimm2_bare,
		op, VPR64, VPR64, VPR128, v2i32, v2i32, v4i32,
		BinOpFrag<(Neon_vduplane
		(Neon_low4S node:$LHS), node:$RHS)>>;

		def : NI_2VE_laneq<!cast<Instruction>(subop # "_4s4s"), neon_uimm2_bare,
		op, VPR128, VPR128, VPR128, v4i32, v4i32, v4i32,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VE_laneq<!cast<Instruction>(subop # "_4h8h"), neon_uimm3_bare,
		op, VPR64, VPR64, VPR128Lo, v4i16, v4i16, v8i16,
		BinOpFrag<(Neon_vduplane
		(Neon_low8H node:$LHS), node:$RHS)>>;

		def : NI_2VE_laneq<!cast<Instruction>(subop # "_8h8h"), neon_uimm3_bare,
		op, VPR128, VPR128, VPR128Lo, v8i16, v8i16, v8i16,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		// Index can only be half of the max value for lane without q

		def : NI_2VE_lane<!cast<Instruction>(subop # "_2s4s"), neon_uimm1_bare,
		op, VPR64, VPR64, VPR64, v2i32, v2i32, v2i32,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VE_lane<!cast<Instruction>(subop # "_4s4s"), neon_uimm1_bare,
		op, VPR128, VPR128, VPR64, v4i32, v4i32, v2i32,
		BinOpFrag<(Neon_vduplane
		(Neon_combine_4S node:$LHS, undef),
		node:$RHS)>>;

		def : NI_2VE_lane<!cast<Instruction>(subop # "_4h8h"), neon_uimm2_bare,
		op, VPR64, VPR64, VPR64Lo, v4i16, v4i16, v4i16,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VE_lane<!cast<Instruction>(subop # "_8h8h"), neon_uimm2_bare,
		op, VPR128, VPR128, VPR64Lo, v8i16, v8i16, v4i16,
		BinOpFrag<(Neon_vduplane
		(Neon_combine_8H node:$LHS, undef),
		node:$RHS)>>;
		}

		defm MLA_lane_v1 : NI_2VE_v1_pat<"MLAvve", Neon_mla>;
		defm MLS_lane_v1 : NI_2VE_v1_pat<"MLSvve", Neon_mls>;

		class NI_2VE_2op<bit q, bit u, bits<2> size, bits<4> opcode,
		string asmop, string ResS, string OpS, string EleOpS,
		Operand OpImm, RegisterOperand ResVPR,
		RegisterOperand OpVPR, RegisterOperand EleOpVPR>
		: NeonI_2VElem<q, u, size, opcode,
		(outs ResVPR:$Rd), (ins OpVPR:$Rn,
		EleOpVPR:$Re, OpImm:$Index),
		asmop # "\t$Rd." # ResS # ", $Rn." # OpS #
		", $Re." # EleOpS # "[$Index]",
		[],
		NoItinerary> {
		bits<3> Index;
		bits<5> Re;
		}

		multiclass NI_2VE_v1_2op<bit u, bits<4> opcode, string asmop>
		{
		// vector register class for element is always 128-bit to cover the max index
		def _2s4s : NI_2VE_2op<0b0, u, 0b10, opcode, asmop, "2s", "2s", "s",
		neon_uimm2_bare, VPR64, VPR64, VPR128> {
		let Inst{11} = {Index{1}};
		let Inst{21} = {Index{0}};
		let Inst{20-16} = Re;
		}

		def _4s4s : NI_2VE_2op<0b1, u, 0b10, opcode, asmop, "4s", "4s", "s",
		neon_uimm2_bare, VPR128, VPR128, VPR128> {
		let Inst{11} = {Index{1}};
		let Inst{21} = {Index{0}};
		let Inst{20-16} = Re;
		}

		// Index operations on 16-bit(H) elements are restricted to using v0-v15.
		def _4h8h : NI_2VE_2op<0b0, u, 0b01, opcode, asmop, "4h", "4h", "h",
		neon_uimm3_bare, VPR64, VPR64, VPR128Lo> {
		let Inst{11} = {Index{2}};
		let Inst{21} = {Index{1}};
		let Inst{20} = {Index{0}};
		let Inst{19-16} = Re{3-0};
		}

		def _8h8h : NI_2VE_2op<0b1, u, 0b01, opcode, asmop, "8h", "8h", "h",
		neon_uimm3_bare, VPR128, VPR128, VPR128Lo> {
		let Inst{11} = {Index{2}};
		let Inst{21} = {Index{1}};
		let Inst{20} = {Index{0}};
		let Inst{19-16} = Re{3-0};
		}
		}

		defm MULve : NI_2VE_v1_2op<0b0, 0b1000, "mul">;
		defm SQDMULHve : NI_2VE_v1_2op<0b0, 0b1100, "sqdmulh">;
		defm SQRDMULHve : NI_2VE_v1_2op<0b0, 0b1101, "sqrdmulh">;

		// Pattern for lane with q
		class NI_2VE_mul_laneq<Instruction INST, Operand OpImm, SDPatternOperator op,
		t.p.northover: 'q' means nothing on AArch64 NEON. It's like writing "pattern for lane with ymm".
		Jiangning: OK. I will change 'without q' to 'in 64-bit vector', and 'with q' to 'in 128-bit vector'.
		RegisterOperand OpVPR, RegisterOperand EleOpVPR,
		ValueType ResTy, ValueType OpTy, ValueType EleOpTy,
		SDPatternOperator coreop>
		: Pat<(ResTy (op (OpTy OpVPR:$Rn),
		(OpTy (coreop (EleOpTy EleOpVPR:$Re), (i64 OpImm:$Index))))),
		(INST OpVPR:$Rn, EleOpVPR:$Re, OpImm:$Index)>;

		// Pattern for lane without q
		class NI_2VE_mul_lane<Instruction INST, Operand OpImm, SDPatternOperator op,
		RegisterOperand OpVPR, RegisterOperand EleOpVPR,
		ValueType ResTy, ValueType OpTy, ValueType EleOpTy,
		SDPatternOperator coreop>
		: Pat<(ResTy (op (OpTy OpVPR:$Rn),
		(OpTy (coreop (EleOpTy EleOpVPR:$Re), (i64 OpImm:$Index))))),
		(INST OpVPR:$Rn,
		(SUBREG_TO_REG (i64 0), EleOpVPR:$Re, sub_64), OpImm:$Index)>;

		multiclass NI_2VE_mul_v1_pat<string subop, SDPatternOperator op>
		{
		def : NI_2VE_mul_laneq<!cast<Instruction>(subop # "_2s4s"), neon_uimm2_bare,
		op, VPR64, VPR128, v2i32, v2i32, v4i32,
		BinOpFrag<(Neon_vduplane
		(Neon_low4S node:$LHS), node:$RHS)>>;

		def : NI_2VE_mul_laneq<!cast<Instruction>(subop # "_4s4s"), neon_uimm2_bare,
		op, VPR128, VPR128, v4i32, v4i32, v4i32,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VE_mul_laneq<!cast<Instruction>(subop # "_4h8h"), neon_uimm3_bare,
		op, VPR64, VPR128Lo, v4i16, v4i16, v8i16,
		BinOpFrag<(Neon_vduplane
		(Neon_low8H node:$LHS), node:$RHS)>>;

		def : NI_2VE_mul_laneq<!cast<Instruction>(subop # "_8h8h"), neon_uimm3_bare,
		op, VPR128, VPR128Lo, v8i16, v8i16, v8i16,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		// Index can only be half of the max value for lane without q

		def : NI_2VE_mul_lane<!cast<Instruction>(subop # "_2s4s"), neon_uimm1_bare,
		op, VPR64, VPR64, v2i32, v2i32, v2i32,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VE_mul_lane<!cast<Instruction>(subop # "_4s4s"), neon_uimm1_bare,
		op, VPR128, VPR64, v4i32, v4i32, v2i32,
		BinOpFrag<(Neon_vduplane
		(Neon_combine_4S node:$LHS, undef),
		node:$RHS)>>;

		def : NI_2VE_mul_lane<!cast<Instruction>(subop # "_4h8h"), neon_uimm2_bare,
		op, VPR64, VPR64Lo, v4i16, v4i16, v4i16,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VE_mul_lane<!cast<Instruction>(subop # "_8h8h"), neon_uimm2_bare,
		op, VPR128, VPR64Lo, v8i16, v8i16, v4i16,
		BinOpFrag<(Neon_vduplane
		(Neon_combine_8H node:$LHS, undef),
		node:$RHS)>>;
		}

		defm MUL_lane_v1 : NI_2VE_mul_v1_pat<"MULve", mul>;
		defm SQDMULH_lane_v1 : NI_2VE_mul_v1_pat<"SQDMULHve", int_arm_neon_vqdmulh>;
		defm SQRDMULH_lane_v1 : NI_2VE_mul_v1_pat<"SQRDMULHve", int_arm_neon_vqrdmulh>;

		// Variant 2

		multiclass NI_2VE_v2<bit u, bits<4> opcode, string asmop>
		{
		// vector register class for element is always 128-bit to cover the max index
		def _2s4s : NI_2VE<0b0, u, 0b10, opcode, asmop, "2s", "2s", "s",
		neon_uimm2_bare, VPR64, VPR64, VPR128> {
		let Inst{11} = {Index{1}};
		let Inst{21} = {Index{0}};
		let Inst{20-16} = Re;
		}

		def _4s4s : NI_2VE<0b1, u, 0b10, opcode, asmop, "4s", "4s", "s",
		neon_uimm2_bare, VPR128, VPR128, VPR128> {
		let Inst{11} = {Index{1}};
		let Inst{21} = {Index{0}};
		let Inst{20-16} = Re;
		}

		// _1d2d doesn't exist!

		def _2d2d : NI_2VE<0b1, u, 0b11, opcode, asmop, "2d", "2d", "d",
		neon_uimm1_bare, VPR128, VPR128, VPR128> {
		let Inst{11} = {Index{0}};
		let Inst{21} = 0b0;
		let Inst{20-16} = Re;
		}
		}

		defm FMLAvve : NI_2VE_v2<0b0, 0b0001, "fmla">;
		defm FMLSvve : NI_2VE_v2<0b0, 0b0101, "fmls">;

		// Pattern for lane without q
		class NI_2VE_lane_2d2d<Instruction INST, Operand OpImm, SDPatternOperator op,
		RegisterOperand ResVPR, RegisterOperand OpVPR,
		RegisterOperand EleOpVPR, ValueType ResTy,
		ValueType OpTy, ValueType EleOpTy,
		SDPatternOperator coreop>
		: Pat<(ResTy (op (ResTy ResVPR:$src), (OpTy OpVPR:$Rn),
		(OpTy (coreop (EleOpTy EleOpVPR:$Re), (EleOpTy EleOpVPR:$Re))))),
		(INST ResVPR:$src, OpVPR:$Rn,
		(SUBREG_TO_REG (i64 0), EleOpVPR:$Re, sub_64), 0)>;

		multiclass NI_2VE_v2_pat<string subop, SDPatternOperator op>
		{
		def : NI_2VE_laneq<!cast<Instruction>(subop # "_2s4s"), neon_uimm2_bare,
		op, VPR64, VPR64, VPR128, v2f32, v2f32, v4f32,
		BinOpFrag<(Neon_vduplane
		(Neon_low4f node:$LHS), node:$RHS)>>;

		def : NI_2VE_laneq<!cast<Instruction>(subop # "_4s4s"), neon_uimm2_bare,
		op, VPR128, VPR128, VPR128, v4f32, v4f32, v4f32,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VE_laneq<!cast<Instruction>(subop # "_2d2d"), neon_uimm1_bare,
		op, VPR128, VPR128, VPR128, v2f64, v2f64, v2f64,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		// Index can only be half of the max value for lane without q

		def : NI_2VE_lane<!cast<Instruction>(subop # "_2s4s"), neon_uimm1_bare,
		op, VPR64, VPR64, VPR64, v2f32, v2f32, v2f32,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VE_lane<!cast<Instruction>(subop # "_4s4s"), neon_uimm1_bare,
		op, VPR128, VPR128, VPR64, v4f32, v4f32, v2f32,
		BinOpFrag<(Neon_vduplane
		(Neon_combine_4f node:$LHS, undef),
		node:$RHS)>>;

		def : NI_2VE_lane_2d2d<!cast<Instruction>(subop # "_2d2d"), neon_uimm1_bare,
		op, VPR128, VPR128, VPR64, v2f64, v2f64, v1f64,
		BinOpFrag<(Neon_combine_2d node:$LHS, node:$RHS)>>;
		}

		defm FMLA_lane_v2 : NI_2VE_v2_pat<"FMLAvve", Neon_fmla>;
		defm FMLS_lane_v2 : NI_2VE_v2_pat<"FMLSvve", Neon_fmls>;

		multiclass NI_2VE_v2_2op<bit u, bits<4> opcode, string asmop>
		{
		// vector register class for element is always 128-bit to cover the max index
		def _2s4s : NI_2VE_2op<0b0, u, 0b10, opcode, asmop, "2s", "2s", "s",
		neon_uimm2_bare, VPR64, VPR64, VPR128> {
		let Inst{11} = {Index{1}};
		let Inst{21} = {Index{0}};
		let Inst{20-16} = Re;
		}

		def _4s4s : NI_2VE_2op<0b1, u, 0b10, opcode, asmop, "4s", "4s", "s",
		neon_uimm2_bare, VPR128, VPR128, VPR128> {
		let Inst{11} = {Index{1}};
		let Inst{21} = {Index{0}};
		let Inst{20-16} = Re;
		}

		// _1d2d doesn't exist!

		def _2d2d : NI_2VE_2op<0b1, u, 0b11, opcode, asmop, "2d", "2d", "d",
		neon_uimm1_bare, VPR128, VPR128, VPR128> {
		let Inst{11} = {Index{0}};
		let Inst{21} = 0b0;
		let Inst{20-16} = Re;
		}
		}

		defm FMULve : NI_2VE_v2_2op<0b0, 0b1001, "fmul">;
		defm FMULXve : NI_2VE_v2_2op<0b1, 0b1001, "fmulx">;

		class NI_2VE_mul_lane_2d<Instruction INST, Operand OpImm, SDPatternOperator op,
		RegisterOperand OpVPR, RegisterOperand EleOpVPR,
		ValueType ResTy, ValueType OpTy, ValueType EleOpTy,
		SDPatternOperator coreop>
		: Pat<(ResTy (op (OpTy OpVPR:$Rn),
		(OpTy (coreop (EleOpTy EleOpVPR:$Re), (EleOpTy EleOpVPR:$Re))))),
		(INST OpVPR:$Rn,
		(SUBREG_TO_REG (i64 0), EleOpVPR:$Re, sub_64), 0)>;

		multiclass NI_2VE_mul_v2_pat<string subop, SDPatternOperator op>
		{
		def : NI_2VE_mul_laneq<!cast<Instruction>(subop # "_2s4s"), neon_uimm2_bare,
		op, VPR64, VPR128, v2f32, v2f32, v4f32,
		BinOpFrag<(Neon_vduplane
		(Neon_low4f node:$LHS), node:$RHS)>>;

		def : NI_2VE_mul_laneq<!cast<Instruction>(subop # "_4s4s"), neon_uimm2_bare,
		op, VPR128, VPR128, v4f32, v4f32, v4f32,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VE_mul_laneq<!cast<Instruction>(subop # "_2d2d"), neon_uimm1_bare,
		op, VPR128, VPR128, v2f64, v2f64, v2f64,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		// Index can only be half of the max value for lane without q

		def : NI_2VE_mul_lane<!cast<Instruction>(subop # "_2s4s"), neon_uimm1_bare,
		op, VPR64, VPR64, v2f32, v2f32, v2f32,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VE_mul_lane<!cast<Instruction>(subop # "_4s4s"), neon_uimm1_bare,
		op, VPR128, VPR64, v4f32, v4f32, v2f32,
		BinOpFrag<(Neon_vduplane
		(Neon_combine_4f node:$LHS, undef),
		node:$RHS)>>;

		def : NI_2VE_mul_lane_2d<!cast<Instruction>(subop # "_2d2d"), neon_uimm1_bare,
		op, VPR128, VPR64, v2f64, v2f64, v1f64,
		BinOpFrag<(Neon_combine_2d node:$LHS, node:$RHS)>>;
		}

		defm FMUL_lane_v2 : NI_2VE_mul_v2_pat<"FMULve", fmul>;
		defm FMULX_lane_v2 : NI_2VE_mul_v2_pat<"FMULXve", int_aarch64_neon_vmulx>;

		// The followings are patterns using fma
		// -ffp-contract=fast enforces to generate fma
		t.p.northover: Perhaps "-ffp-contract=fast generates fma". "enforces to generate" doesn't work in English.
		Jiangning: OK. I thought I wrote 'force' rather than 'enforce'. :-) Anyway, I will change to the wording you gave.

		// Pattern for lane with q
		class NI_2VEswap_laneq<Instruction INST, Operand OpImm, SDPatternOperator op,
		RegisterOperand ResVPR, RegisterOperand OpVPR,
		ValueType ResTy, ValueType OpTy,
		SDPatternOperator coreop>
		: Pat<(ResTy (op (ResTy (coreop (OpTy OpVPR:$Re), (i64 OpImm:$Index))),
		(ResTy ResVPR:$src), (ResTy ResVPR:$Rn))),
		(INST ResVPR:$src, ResVPR:$Rn, OpVPR:$Re, OpImm:$Index)>;

		// Pattern for lane without q
		class NI_2VEswap_lane<Instruction INST, Operand OpImm, SDPatternOperator op,
		RegisterOperand ResVPR, RegisterOperand OpVPR,
		ValueType ResTy, ValueType OpTy,
		SDPatternOperator coreop>
		: Pat<(ResTy (op (ResTy (coreop (OpTy OpVPR:$Re), (i64 OpImm:$Index))),
		(ResTy ResVPR:$Rn), (ResTy ResVPR:$src))),
		(INST ResVPR:$src, ResVPR:$Rn,
		(SUBREG_TO_REG (i64 0), OpVPR:$Re, sub_64), OpImm:$Index)>;

		// Pattern for lane without q
		class NI_2VEswap_lane_2d2d<Instruction INST, Operand OpImm,
		SDPatternOperator op,
		RegisterOperand ResVPR, RegisterOperand OpVPR,
		ValueType ResTy, ValueType OpTy,
		Jiangning: Some new patterns are added to capture a + b * (-c).
		SDPatternOperator coreop>
		: Pat<(ResTy (op (ResTy (coreop (OpTy OpVPR:$Re), (OpTy OpVPR:$Re))),
		(ResTy ResVPR:$Rn), (ResTy ResVPR:$src))),
		(INST ResVPR:$src, ResVPR:$Rn,
		(SUBREG_TO_REG (i64 0), OpVPR:$Re, sub_64), 0)>;


		multiclass NI_2VE_fma_v2_pat<string subop, SDPatternOperator op>
		{
		def : NI_2VEswap_laneq<!cast<Instruction>(subop # "_2s4s"),
		neon_uimm2_bare, op, VPR64, VPR128, v2f32, v4f32,
		BinOpFrag<(Neon_vduplane
		(Neon_low4f node:$LHS), node:$RHS)>>;

		def : NI_2VEswap_laneq<!cast<Instruction>(subop # "_4s4s"),
		neon_uimm2_bare, op, VPR128, VPR128, v4f32, v4f32,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VEswap_laneq<!cast<Instruction>(subop # "_2d2d"),
		neon_uimm1_bare, op, VPR128, VPR128, v2f64, v2f64,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		// Index can only be half of the max value for lane without q

		def : NI_2VEswap_lane<!cast<Instruction>(subop # "_2s4s"),
		neon_uimm1_bare, op, VPR64, VPR64, v2f32, v2f32,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VEswap_lane<!cast<Instruction>(subop # "_4s4s"),
		neon_uimm1_bare, op, VPR128, VPR64, v4f32, v2f32,
		BinOpFrag<(Neon_vduplane
		(Neon_combine_4f node:$LHS, undef),
		node:$RHS)>>;

		def : NI_2VEswap_lane_2d2d<!cast<Instruction>(subop # "_2d2d"),
		neon_uimm1_bare, op, VPR128, VPR64, v2f64, v1f64,
		BinOpFrag<(Neon_combine_2d node:$LHS, node:$RHS)>>;
		}

		defm FMLA_lane_v2_s : NI_2VE_fma_v2_pat<"FMLAvve", fma>;

		multiclass NI_2VE_fms_v2_pat<string subop, SDPatternOperator op>
		{
		def : NI_2VEswap_laneq<!cast<Instruction>(subop # "_2s4s"),
		neon_uimm2_bare, op, VPR64, VPR128, v2f32, v4f32,
		BinOpFrag<(fneg (Neon_vduplane
		(Neon_low4f node:$LHS), node:$RHS))>>;

		def : NI_2VEswap_laneq<!cast<Instruction>(subop # "_4s4s"),
		neon_uimm2_bare, op, VPR128, VPR128, v4f32, v4f32,
		BinOpFrag<(fneg (Neon_vduplane
		node:$LHS, node:$RHS))>>;

		def : NI_2VEswap_laneq<!cast<Instruction>(subop # "_2d2d"),
		neon_uimm1_bare, op, VPR128, VPR128, v2f64, v2f64,
		BinOpFrag<(fneg (Neon_vduplane
		node:$LHS, node:$RHS))>>;

		// Index can only be half of the max value for lane without q

		def : NI_2VEswap_lane<!cast<Instruction>(subop # "_2s4s"),
		neon_uimm1_bare, op, VPR64, VPR64, v2f32, v2f32,
		BinOpFrag<(fneg (Neon_vduplane
		node:$LHS, node:$RHS))>>;

		def : NI_2VEswap_lane<!cast<Instruction>(subop # "_4s4s"),
		neon_uimm1_bare, op, VPR128, VPR64, v4f32, v2f32,
		BinOpFrag<(fneg (Neon_vduplane
		(Neon_combine_4f node:$LHS, undef),
		node:$RHS))>>;

		def : NI_2VEswap_lane_2d2d<!cast<Instruction>(subop # "_2d2d"),
		neon_uimm1_bare, op, VPR128, VPR64, v2f64, v1f64,
		BinOpFrag<(fneg
		(Neon_combine_2d node:$LHS,
		node:$RHS))>>;
		}

		defm FMLS_lane_v2_s : NI_2VE_fms_v2_pat<"FMLSvve", fma>;
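
Note on the fms patterns above: they reuse the `fma` node and wrap the duplicated lane in `fneg`, which is the a + b * (-c) shape mentioned in the inline comment. Below is a minimal scalar model of that computation in C++, offered only as an illustration. `fmlsByElement` is a hypothetical name, the four-lane float layout is an assumption chosen for the example, and nothing here is taken from the patch itself.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical scalar model, not part of the patch: for every lane i compute
// acc[i] + rn[i] * (-elem[lane]) with a single fused rounding per lane.
inline std::array<float, 4> fmlsByElement(std::array<float, 4> acc,
                                          const std::array<float, 4> &rn,
                                          const std::array<float, 4> &elem,
                                          std::size_t lane) {
  // Corresponds to the fneg wrapped around the duplicated lane.
  const float negated = -elem[lane];
  for (std::size_t i = 0; i < acc.size(); ++i)
    acc[i] = std::fma(negated, rn[i], acc[i]);
  return acc;
}
```

Because the negation is applied to the multiplicand before the fused operation, the result equals a fused multiply-subtract without needing a separate DAG node for it.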

		// Variant 3: Long type
		// E.g. SMLAL : 4S/4H/H (v0-v15), 2D/2S/S
		// SMLAL2: 4S/8H/H (v0-v15), 2D/4S/S

		multiclass NI_2VE_v3<bit u, bits<4> opcode, string asmop>
		{
		// vector register class for element is always 128-bit to cover the max index
		def _2d2s : NI_2VE<0b0, u, 0b10, opcode, asmop, "2d", "2s", "s",
		neon_uimm2_bare, VPR128, VPR64, VPR128> {
		let Inst{11} = {Index{1}};
		let Inst{21} = {Index{0}};
		let Inst{20-16} = Re;
		}

		def _2d4s : NI_2VE<0b1, u, 0b10, opcode, asmop # "2", "2d", "4s", "s",
		neon_uimm2_bare, VPR128, VPR128, VPR128> {
		let Inst{11} = {Index{1}};
		let Inst{21} = {Index{0}};
		let Inst{20-16} = Re;
		}

		// Index operations on 16-bit(H) elements are restricted to using v0-v15.
		def _4s8h : NI_2VE<0b1, u, 0b01, opcode, asmop # "2", "4s", "8h", "h",
		neon_uimm3_bare, VPR128, VPR128, VPR128Lo> {
		let Inst{11} = {Index{2}};
		let Inst{21} = {Index{1}};
		let Inst{20} = {Index{0}};
		let Inst{19-16} = Re{3-0};
		}

		def _4s4h : NI_2VE<0b0, u, 0b01, opcode, asmop, "4s", "4h", "h",
		neon_uimm3_bare, VPR128, VPR64, VPR128Lo> {
		let Inst{11} = {Index{2}};
		let Inst{21} = {Index{1}};
		let Inst{20} = {Index{0}};
		let Inst{19-16} = Re{3-0};
		}
		}
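
Note on the `let Inst{...}` assignments above: the lane index is scattered across bits 11, 21 and 20 depending on the element size, and for 16-bit elements the bit borrowed for the index leaves only four bits for the element register. The C++ sketch below packs just those fields, as a way of reading the assignments. `ElemSize` and `packElementFields` are hypothetical names that do not appear in the patch, and every other instruction bit is left at zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum, not part of the patch: size of the indexed element.
enum class ElemSize { H, S, D };

// Pack the element register number `re` and the lane `index` into bits
// 21, 20-16 and 11 of a 32-bit word, mirroring the let Inst{...} lines.
inline uint32_t packElementFields(ElemSize size, unsigned re, unsigned index) {
  uint32_t inst = 0;
  switch (size) {
  case ElemSize::H:
    // Index{2} -> Inst{11}, Index{1} -> Inst{21}, Index{0} -> Inst{20}.
    // Only Inst{19-16} remain for Re, hence the v0-v15 restriction.
    assert(index < 8 && re < 16);
    inst |= ((index >> 2) & 1u) << 11;
    inst |= ((index >> 1) & 1u) << 21;
    inst |= (index & 1u) << 20;
    inst |= (re & 0xFu) << 16;
    break;
  case ElemSize::S:
    // Index{1} -> Inst{11}, Index{0} -> Inst{21}, Re -> Inst{20-16}.
    assert(index < 4 && re < 32);
    inst |= ((index >> 1) & 1u) << 11;
    inst |= (index & 1u) << 21;
    inst |= (re & 0x1Fu) << 16;
    break;
  case ElemSize::D:
    // Index{0} -> Inst{11}, Inst{21} stays 0, Re -> Inst{20-16}.
    assert(index < 2 && re < 32);
    inst |= (index & 1u) << 11;
    inst |= (re & 0x1Fu) << 16;
    break;
  }
  return inst;
}
```

For example, `packElementFields(ElemSize::S, 0, 1)` sets only bit 21, since the low index bit of an S element goes to Inst{21}.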

		defm SMLALvve : NI_2VE_v3<0b0, 0b0010, "smlal">;
		defm UMLALvve : NI_2VE_v3<0b1, 0b0010, "umlal">;
		defm SMLSLvve : NI_2VE_v3<0b0, 0b0110, "smlsl">;
		defm UMLSLvve : NI_2VE_v3<0b1, 0b0110, "umlsl">;
		defm SQDMLALvve : NI_2VE_v3<0b0, 0b0011, "sqdmlal">;
		defm SQDMLSLvve : NI_2VE_v3<0b0, 0b0111, "sqdmlsl">;

		multiclass NI_2VE_v3_2op<bit u, bits<4> opcode, string asmop>
		{
		// vector register class for element is always 128-bit to cover the max index
		def _2d2s : NI_2VE_2op<0b0, u, 0b10, opcode, asmop, "2d", "2s", "s",
		neon_uimm2_bare, VPR128, VPR64, VPR128> {
		let Inst{11} = {Index{1}};
		let Inst{21} = {Index{0}};
		let Inst{20-16} = Re;
		}

		def _2d4s : NI_2VE_2op<0b1, u, 0b10, opcode, asmop # "2", "2d", "4s", "s",
		neon_uimm2_bare, VPR128, VPR128, VPR128> {
		let Inst{11} = {Index{1}};
		let Inst{21} = {Index{0}};
		let Inst{20-16} = Re;
		}

		// Index operations on 16-bit(H) elements are restricted to using v0-v15.
		def _4s8h : NI_2VE_2op<0b1, u, 0b01, opcode, asmop # "2", "4s", "8h", "h",
		neon_uimm3_bare, VPR128, VPR128, VPR128Lo> {
		let Inst{11} = {Index{2}};
		let Inst{21} = {Index{1}};
		let Inst{20} = {Index{0}};
		let Inst{19-16} = Re{3-0};
		}

		def _4s4h : NI_2VE_2op<0b0, u, 0b01, opcode, asmop, "4s", "4h", "h",
		neon_uimm3_bare, VPR128, VPR64, VPR128Lo> {
		let Inst{11} = {Index{2}};
		let Inst{21} = {Index{1}};
		let Inst{20} = {Index{0}};
		let Inst{19-16} = Re{3-0};
		}
		}

		defm SMULLve : NI_2VE_v3_2op<0b0, 0b1010, "smull">;
		defm UMULLve : NI_2VE_v3_2op<0b1, 0b1010, "umull">;
		defm SQDMULLve : NI_2VE_v3_2op<0b0, 0b1011, "sqdmull">;

		// Pattern for lane with q
		class NI_2VEL2_laneq<Instruction INST, Operand OpImm, SDPatternOperator op,
		RegisterOperand EleOpVPR, ValueType ResTy,
		ValueType OpTy, ValueType EleOpTy, ValueType HalfOpTy,
		SDPatternOperator hiop, SDPatternOperator coreop>
		: Pat<(ResTy (op (ResTy VPR128:$src),
		(HalfOpTy (hiop (OpTy VPR128:$Rn))),
		(HalfOpTy (coreop (EleOpTy EleOpVPR:$Re), (i64 OpImm:$Index))))),
		(INST VPR128:$src, VPR128:$Rn, EleOpVPR:$Re, OpImm:$Index)>;

		// Pattern for lane without q
		class NI_2VEL2_lane<Instruction INST, Operand OpImm, SDPatternOperator op,
		RegisterOperand EleOpVPR, ValueType ResTy,
		ValueType OpTy, ValueType EleOpTy, ValueType HalfOpTy,
		SDPatternOperator hiop, SDPatternOperator coreop>
		: Pat<(ResTy (op (ResTy VPR128:$src),
		(HalfOpTy (hiop (OpTy VPR128:$Rn))),
		(HalfOpTy (coreop (EleOpTy EleOpVPR:$Re), (i64 OpImm:$Index))))),
		(INST VPR128:$src, VPR128:$Rn,
		(SUBREG_TO_REG (i64 0), EleOpVPR:$Re, sub_64), OpImm:$Index)>;

		multiclass NI_2VEL_v3_pat<string subop, SDPatternOperator op>
		{
		def : NI_2VE_laneq<!cast<Instruction>(subop # "_4s4h"), neon_uimm3_bare,
		op, VPR128, VPR64, VPR128Lo, v4i32, v4i16, v8i16,
		BinOpFrag<(Neon_vduplane
		(Neon_low8H node:$LHS), node:$RHS)>>;

		def : NI_2VE_laneq<!cast<Instruction>(subop # "_2d2s"), neon_uimm2_bare,
		op, VPR128, VPR64, VPR128, v2i64, v2i32, v4i32,
		BinOpFrag<(Neon_vduplane
		(Neon_low4S node:$LHS), node:$RHS)>>;

		def : NI_2VEL2_laneq<!cast<Instruction>(subop # "_4s8h"), neon_uimm3_bare,
		op, VPR128Lo, v4i32, v8i16, v8i16, v4i16, Neon_top8H,
		BinOpFrag<(Neon_vduplane
		(Neon_low8H node:$LHS), node:$RHS)>>;

		def : NI_2VEL2_laneq<!cast<Instruction>(subop # "_2d4s"), neon_uimm2_bare,
		op, VPR128, v2i64, v4i32, v4i32, v2i32, Neon_top4S,
		BinOpFrag<(Neon_vduplane
		(Neon_low4S node:$LHS), node:$RHS)>>;

		// Index can only be half of the max value for lane without q

		def : NI_2VE_lane<!cast<Instruction>(subop # "_4s4h"), neon_uimm2_bare,
		op, VPR128, VPR64, VPR64Lo, v4i32, v4i16, v4i16,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VE_lane<!cast<Instruction>(subop # "_2d2s"), neon_uimm1_bare,
		op, VPR128, VPR64, VPR64, v2i64, v2i32, v2i32,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VEL2_lane<!cast<Instruction>(subop # "_4s8h"), neon_uimm2_bare,
		op, VPR64Lo, v4i32, v8i16, v4i16, v4i16, Neon_top8H,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VEL2_lane<!cast<Instruction>(subop # "_2d4s"), neon_uimm1_bare,
		op, VPR64, v2i64, v4i32, v2i32, v2i32, Neon_top4S,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;
		}

		defm SMLAL_lane_v3 : NI_2VEL_v3_pat<"SMLALvve", Neon_smlal>;
		defm UMLAL_lane_v3 : NI_2VEL_v3_pat<"UMLALvve", Neon_umlal>;
		defm SMLSL_lane_v3 : NI_2VEL_v3_pat<"SMLSLvve", Neon_smlsl>;
		defm UMLSL_lane_v3 : NI_2VEL_v3_pat<"UMLSLvve", Neon_umlsl>;

		// Pattern for lane with q
		class NI_2VEL2_mul_laneq<Instruction INST, Operand OpImm, SDPatternOperator op,
		RegisterOperand EleOpVPR, ValueType ResTy,
		ValueType OpTy, ValueType EleOpTy, ValueType HalfOpTy,
		SDPatternOperator hiop, SDPatternOperator coreop>
		: Pat<(ResTy (op
		(HalfOpTy (hiop (OpTy VPR128:$Rn))),
		(HalfOpTy (coreop (EleOpTy EleOpVPR:$Re), (i64 OpImm:$Index))))),
		(INST VPR128:$Rn, EleOpVPR:$Re, OpImm:$Index)>;

		// Pattern for lane without q
		class NI_2VEL2_mul_lane<Instruction INST, Operand OpImm, SDPatternOperator op,
		RegisterOperand EleOpVPR, ValueType ResTy,
		ValueType OpTy, ValueType EleOpTy, ValueType HalfOpTy,
		SDPatternOperator hiop, SDPatternOperator coreop>
		: Pat<(ResTy (op
		(HalfOpTy (hiop (OpTy VPR128:$Rn))),
		(HalfOpTy (coreop (EleOpTy EleOpVPR:$Re), (i64 OpImm:$Index))))),
		(INST VPR128:$Rn,
		(SUBREG_TO_REG (i64 0), EleOpVPR:$Re, sub_64), OpImm:$Index)>;

		multiclass NI_2VEL_mul_v3_pat<string subop, SDPatternOperator op>
		{
		def : NI_2VE_mul_laneq<!cast<Instruction>(subop # "_4s4h"), neon_uimm3_bare,
		op, VPR64, VPR128Lo, v4i32, v4i16, v8i16,
		BinOpFrag<(Neon_vduplane
		(Neon_low8H node:$LHS), node:$RHS)>>;

		def : NI_2VE_mul_laneq<!cast<Instruction>(subop # "_2d2s"), neon_uimm2_bare,
		op, VPR64, VPR128, v2i64, v2i32, v4i32,
		BinOpFrag<(Neon_vduplane
		(Neon_low4S node:$LHS), node:$RHS)>>;

		def : NI_2VEL2_mul_laneq<!cast<Instruction>(subop # "_4s8h"), neon_uimm3_bare,
		op, VPR128Lo, v4i32, v8i16, v8i16, v4i16, Neon_top8H,
		BinOpFrag<(Neon_vduplane
		(Neon_low8H node:$LHS), node:$RHS)>>;

		def : NI_2VEL2_mul_laneq<!cast<Instruction>(subop # "_2d4s"), neon_uimm2_bare,
		op, VPR128, v2i64, v4i32, v4i32, v2i32, Neon_top4S,
		BinOpFrag<(Neon_vduplane
		(Neon_low4S node:$LHS), node:$RHS)>>;

		// Index can only be half of the max value for lane without q

		def : NI_2VE_mul_lane<!cast<Instruction>(subop # "_4s4h"), neon_uimm2_bare,
		op, VPR64, VPR64Lo, v4i32, v4i16, v4i16,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VE_mul_lane<!cast<Instruction>(subop # "_2d2s"), neon_uimm1_bare,
		op, VPR64, VPR64, v2i64, v2i32, v2i32,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VEL2_mul_lane<!cast<Instruction>(subop # "_4s8h"), neon_uimm2_bare,
		op, VPR64Lo, v4i32, v8i16, v4i16, v4i16, Neon_top8H,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VEL2_mul_lane<!cast<Instruction>(subop # "_2d4s"), neon_uimm1_bare,
		op, VPR64, v2i64, v4i32, v2i32, v2i32, Neon_top4S,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;
		}

		defm SMULL_lane_v3 : NI_2VEL_mul_v3_pat<"SMULLve", int_arm_neon_vmulls>;
		defm UMULL_lane_v3 : NI_2VEL_mul_v3_pat<"UMULLve", int_arm_neon_vmullu>;
		defm SQDMULL_lane_v3 : NI_2VEL_mul_v3_pat<"SQDMULLve", int_arm_neon_vqdmull>;

		multiclass NI_qdfma<SDPatternOperator op>
		t.p.northover: What's this got to do with an fma? The "f" stands for either "float" or "fused", neither of which applies here.
		Jiangning: OK. I will remove letter 'f'.
		{
		def _4s : PatFrag<(ops node:$Ra, node:$Rn, node:$Rm),
		(op node:$Ra,
		(v4i32 (int_arm_neon_vqdmull node:$Rn, node:$Rm)))>;

		def _2d : PatFrag<(ops node:$Ra, node:$Rn, node:$Rm),
		(op node:$Ra,
		(v2i64 (int_arm_neon_vqdmull node:$Rn, node:$Rm)))>;
		}

		defm Neon_qdmlal : NI_qdfma<int_arm_neon_vqadds>;
		defm Neon_qdmlsl : NI_qdfma<int_arm_neon_vqsubs>;

		multiclass NI_2VEL_v3_qdfma_pat<string subop, string op>
		t.p.northover: Likewise
		Jiangning: Likewise
		{
		def : NI_2VE_laneq<!cast<Instruction>(subop # "_4s4h"), neon_uimm3_bare,
		!cast<PatFrag>(op # "_4s"), VPR128, VPR64, VPR128Lo,
		v4i32, v4i16, v8i16,
		BinOpFrag<(Neon_vduplane
		(Neon_low8H node:$LHS), node:$RHS)>>;

		def : NI_2VE_laneq<!cast<Instruction>(subop # "_2d2s"), neon_uimm2_bare,
		!cast<PatFrag>(op # "_2d"), VPR128, VPR64, VPR128,
		v2i64, v2i32, v4i32,
		BinOpFrag<(Neon_vduplane
		(Neon_low4S node:$LHS), node:$RHS)>>;

		def : NI_2VEL2_laneq<!cast<Instruction>(subop # "_4s8h"), neon_uimm3_bare,
		!cast<PatFrag>(op # "_4s"), VPR128Lo,
		v4i32, v8i16, v8i16, v4i16, Neon_top8H,
		BinOpFrag<(Neon_vduplane
		(Neon_low8H node:$LHS), node:$RHS)>>;

		def : NI_2VEL2_laneq<!cast<Instruction>(subop # "_2d4s"), neon_uimm2_bare,
		!cast<PatFrag>(op # "_2d"), VPR128,
		v2i64, v4i32, v4i32, v2i32, Neon_top4S,
		BinOpFrag<(Neon_vduplane
		(Neon_low4S node:$LHS), node:$RHS)>>;

		// Index can only be half of the max value for lane without q

		def : NI_2VE_lane<!cast<Instruction>(subop # "_4s4h"), neon_uimm2_bare,
		!cast<PatFrag>(op # "_4s"), VPR128, VPR64, VPR64Lo,
		v4i32, v4i16, v4i16,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VE_lane<!cast<Instruction>(subop # "_2d2s"), neon_uimm1_bare,
		!cast<PatFrag>(op # "_2d"), VPR128, VPR64, VPR64,
		v2i64, v2i32, v2i32,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VEL2_lane<!cast<Instruction>(subop # "_4s8h"), neon_uimm2_bare,
		!cast<PatFrag>(op # "_4s"), VPR64Lo,
		v4i32, v8i16, v4i16, v4i16, Neon_top8H,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;

		def : NI_2VEL2_lane<!cast<Instruction>(subop # "_2d4s"), neon_uimm1_bare,
		!cast<PatFrag>(op # "_2d"), VPR64,
		v2i64, v4i32, v2i32, v2i32, Neon_top4S,
		BinOpFrag<(Neon_vduplane node:$LHS, node:$RHS)>>;
		}

		defm SQDMLAL_lane_v3 : NI_2VEL_v3_qdfma_pat<"SQDMLALvve", "Neon_qdmlal">;
		defm SQDMLSL_lane_v3 : NI_2VEL_v3_qdfma_pat<"SQDMLSLvve", "Neon_qdmlsl">;

		// End of implementation for instruction class (3V Elem)

	//Insert element (vector, from main)	//Insert element (vector, from main)
	def INSbw : NeonI_INS_main<"ins", "b", v16i8, GPR32, i32,	def INSbw : NeonI_INS_main<"ins", "b", v16i8, GPR32, i32,
Context not available.

lib/Target/AArch64/AArch64RegisterInfo.td

Property	Old Value	New Value
File Mode	100644	100755

Context not available.
	(sequence "S%u", 0, 31)> {	(sequence "S%u", 0, 31)> {
	}	}

	def FPR64 : RegisterClass<"AArch64", [f64, v2f32, v2i32, v4i16, v8i8, v1i64, v1f64],	def FPR64 : RegisterClass<"AArch64",
		[f64, v2f32, v2i32, v4i16, v8i8, v1i64, v1f64],
	64, (sequence "D%u", 0, 31)>;	64, (sequence "D%u", 0, 31)>;

	def FPR128 : RegisterClass<"AArch64",	def FPR128 : RegisterClass<"AArch64",
	[f128,v2f64, v2i64, v4f32, v4i32, v8i16, v16i8], 128,	[f128,v2f64, v2i64, v4f32, v4i32, v8i16, v16i8],
	(sequence "Q%u", 0, 31)>;	128, (sequence "Q%u", 0, 31)>;

		def FPR64Lo : RegisterClass<"AArch64",
		[f64, v2f32, v2i32, v4i16, v8i8, v1i64, v1f64],
		64, (sequence "D%u", 0, 15)>;

		def FPR128Lo : RegisterClass<"AArch64",
		[f128,v2f64, v2i64, v4f32, v4i32, v8i16, v16i8],
		128, (sequence "Q%u", 0, 15)>;

	//===----------------------------------------------------------------------===//	//===----------------------------------------------------------------------===//
	// Vector registers:	// Vector registers:
Context not available.

	def VPR128 : RegisterOperand<FPR128, "printVPRRegister">;	def VPR128 : RegisterOperand<FPR128, "printVPRRegister">;

		def VPR64Lo : RegisterOperand<FPR64Lo, "printVPRRegister">;

		def VPR128Lo : RegisterOperand<FPR128Lo, "printVPRRegister">;

	// Flags register	// Flags register
	def NZCV : Register<"nzcv"> {	def NZCV : Register<"nzcv"> {
	let Namespace = "AArch64";	let Namespace = "AArch64";
Context not available.

lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp

Property	Old Value	New Value
File Mode	100644	100755

Context not available.
	static DecodeStatus DecodeFPR128RegisterClass(llvm::MCInst &Inst,	static DecodeStatus DecodeFPR128RegisterClass(llvm::MCInst &Inst,
	unsigned RegNo, uint64_t Address,	unsigned RegNo, uint64_t Address,
	const void *Decoder);	const void *Decoder);
		static DecodeStatus DecodeFPR128LoRegisterClass(llvm::MCInst &Inst,
		unsigned RegNo, uint64_t Address,
		const void *Decoder);

	static DecodeStatus DecodeAddrRegExtendOperand(llvm::MCInst &Inst,	static DecodeStatus DecodeAddrRegExtendOperand(llvm::MCInst &Inst,
	unsigned OptionHiS,	unsigned OptionHiS,
Context not available.
	return MCDisassembler::Success;	return MCDisassembler::Success;
	}	}

		static DecodeStatus
		DecodeFPR128LoRegisterClass(llvm::MCInst &Inst, unsigned RegNo,
		uint64_t Address, const void *Decoder) {
		if (RegNo > 15)
		return MCDisassembler::Fail;

		return DecodeFPR128RegisterClass(Inst, RegNo, Address, Decoder);
		}

	static DecodeStatus DecodeAddrRegExtendOperand(llvm::MCInst &Inst,	static DecodeStatus DecodeAddrRegExtendOperand(llvm::MCInst &Inst,
	unsigned OptionHiS,	unsigned OptionHiS,
	uint64_t Address,	uint64_t Address,
Context not available.

test/CodeGen/AArch64/neon-2velem.ll

This file was added.

				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon | FileCheck %s
				; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon -fp-contract=fast | FileCheck %s

				declare <2 x double> @llvm.aarch64.neon.vmulx.v2f64(<2 x double>, <2 x double>)

				declare <4 x float> @llvm.aarch64.neon.vmulx.v4f32(<4 x float>, <4 x float>)

				declare <2 x float> @llvm.aarch64.neon.vmulx.v2f32(<2 x float>, <2 x float>)

				declare <4 x i32> @llvm.arm.neon.vqrdmulh.v4i32(<4 x i32>, <4 x i32>)

				declare <2 x i32> @llvm.arm.neon.vqrdmulh.v2i32(<2 x i32>, <2 x i32>)

				declare <8 x i16> @llvm.arm.neon.vqrdmulh.v8i16(<8 x i16>, <8 x i16>)

				declare <4 x i16> @llvm.arm.neon.vqrdmulh.v4i16(<4 x i16>, <4 x i16>)

				declare <4 x i32> @llvm.arm.neon.vqdmulh.v4i32(<4 x i32>, <4 x i32>)

				declare <2 x i32> @llvm.arm.neon.vqdmulh.v2i32(<2 x i32>, <2 x i32>)

				declare <8 x i16> @llvm.arm.neon.vqdmulh.v8i16(<8 x i16>, <8 x i16>)

				declare <4 x i16> @llvm.arm.neon.vqdmulh.v4i16(<4 x i16>, <4 x i16>)

				declare <2 x i64> @llvm.arm.neon.vqdmull.v2i64(<2 x i32>, <2 x i32>)

				declare <4 x i32> @llvm.arm.neon.vqdmull.v4i32(<4 x i16>, <4 x i16>)

				declare <2 x i64> @llvm.arm.neon.vqsubs.v2i64(<2 x i64>, <2 x i64>)

				declare <4 x i32> @llvm.arm.neon.vqsubs.v4i32(<4 x i32>, <4 x i32>)

				declare <2 x i64> @llvm.arm.neon.vqadds.v2i64(<2 x i64>, <2 x i64>)

				declare <4 x i32> @llvm.arm.neon.vqadds.v4i32(<4 x i32>, <4 x i32>)

				declare <2 x i64> @llvm.arm.neon.vmullu.v2i64(<2 x i32>, <2 x i32>)

				declare <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16>, <4 x i16>)

				declare <2 x i64> @llvm.arm.neon.vmulls.v2i64(<2 x i32>, <2 x i32>)

				declare <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16>, <4 x i16>)

				define <4 x i16> @test_vmla_lane_s16(<4 x i16> %a, <4 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vmla_lane_s16:
				; CHECK: mla {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i16> %shuffle, %b
				%add = add <4 x i16> %mul, %a
				ret <4 x i16> %add
				}

				define <8 x i16> @test_vmlaq_lane_s16(<8 x i16> %a, <8 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vmlaq_lane_s16:
				; CHECK: mla {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
				%mul = mul <8 x i16> %shuffle, %b
				%add = add <8 x i16> %mul, %a
				ret <8 x i16> %add
				}

				define <2 x i32> @test_vmla_lane_s32(<2 x i32> %a, <2 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vmla_lane_s32:
				; CHECK: mla {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%mul = mul <2 x i32> %shuffle, %b
				%add = add <2 x i32> %mul, %a
				ret <2 x i32> %add
				}

				define <4 x i32> @test_vmlaq_lane_s32(<4 x i32> %a, <4 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vmlaq_lane_s32:
				; CHECK: mla {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i32> %shuffle, %b
				%add = add <4 x i32> %mul, %a
				ret <4 x i32> %add
				}

				define <4 x i16> @test_vmla_laneq_s16(<4 x i16> %a, <4 x i16> %b, <8 x i16> %v) {
				; CHECK: test_vmla_laneq_s16:
				; CHECK: mla {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i16> %shuffle, %b
				%add = add <4 x i16> %mul, %a
				ret <4 x i16> %add
				}

				define <8 x i16> @test_vmlaq_laneq_s16(<8 x i16> %a, <8 x i16> %b, <8 x i16> %v) {
				; CHECK: test_vmlaq_laneq_s16:
				; CHECK: mla {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
				%mul = mul <8 x i16> %shuffle, %b
				%add = add <8 x i16> %mul, %a
				ret <8 x i16> %add
				}

				define <2 x i32> @test_vmla_laneq_s32(<2 x i32> %a, <2 x i32> %b, <4 x i32> %v) {
				; CHECK: test_vmla_laneq_s32:
				; CHECK: mla {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%mul = mul <2 x i32> %shuffle, %b
				%add = add <2 x i32> %mul, %a
				ret <2 x i32> %add
				}

				define <4 x i32> @test_vmlaq_laneq_s32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %v) {
				; CHECK: test_vmlaq_laneq_s32:
				; CHECK: mla {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i32> %shuffle, %b
				%add = add <4 x i32> %mul, %a
				ret <4 x i32> %add
				}

				define <4 x i16> @test_vmls_lane_s16(<4 x i16> %a, <4 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vmls_lane_s16:
				; CHECK: mls {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i16> %shuffle, %b
				%sub = sub <4 x i16> %a, %mul
				ret <4 x i16> %sub
				}

				define <8 x i16> @test_vmlsq_lane_s16(<8 x i16> %a, <8 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vmlsq_lane_s16:
				; CHECK: mls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
				%mul = mul <8 x i16> %shuffle, %b
				%sub = sub <8 x i16> %a, %mul
				ret <8 x i16> %sub
				}

				define <2 x i32> @test_vmls_lane_s32(<2 x i32> %a, <2 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vmls_lane_s32:
				; CHECK: mls {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%mul = mul <2 x i32> %shuffle, %b
				%sub = sub <2 x i32> %a, %mul
				ret <2 x i32> %sub
				}

				define <4 x i32> @test_vmlsq_lane_s32(<4 x i32> %a, <4 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vmlsq_lane_s32:
				; CHECK: mls {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i32> %shuffle, %b
				%sub = sub <4 x i32> %a, %mul
				ret <4 x i32> %sub
				}

				define <4 x i16> @test_vmls_laneq_s16(<4 x i16> %a, <4 x i16> %b, <8 x i16> %v) {
				; CHECK: test_vmls_laneq_s16:
				; CHECK: mls {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i16> %shuffle, %b
				%sub = sub <4 x i16> %a, %mul
				ret <4 x i16> %sub
				}

				define <8 x i16> @test_vmlsq_laneq_s16(<8 x i16> %a, <8 x i16> %b, <8 x i16> %v) {
				; CHECK: test_vmlsq_laneq_s16:
				; CHECK: mls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
				%mul = mul <8 x i16> %shuffle, %b
				%sub = sub <8 x i16> %a, %mul
				ret <8 x i16> %sub
				}

				define <2 x i32> @test_vmls_laneq_s32(<2 x i32> %a, <2 x i32> %b, <4 x i32> %v) {
				; CHECK: test_vmls_laneq_s32:
				; CHECK: mls {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%mul = mul <2 x i32> %shuffle, %b
				%sub = sub <2 x i32> %a, %mul
				ret <2 x i32> %sub
				}

				define <4 x i32> @test_vmlsq_laneq_s32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %v) {
				; CHECK: test_vmlsq_laneq_s32:
				; CHECK: mls {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i32> %shuffle, %b
				%sub = sub <4 x i32> %a, %mul
				ret <4 x i32> %sub
				}

				define <4 x i16> @test_vmul_lane_s16(<4 x i16> %a, <4 x i16> %v) {
				; CHECK: test_vmul_lane_s16:
				; CHECK: mul {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i16> %shuffle, %a
				ret <4 x i16> %mul
				}

				define <8 x i16> @test_vmulq_lane_s16(<8 x i16> %a, <4 x i16> %v) {
				; CHECK: test_vmulq_lane_s16:
				; CHECK: mul {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
				%mul = mul <8 x i16> %shuffle, %a
				ret <8 x i16> %mul
				}

				define <2 x i32> @test_vmul_lane_s32(<2 x i32> %a, <2 x i32> %v) {
				; CHECK: test_vmul_lane_s32:
				; CHECK: mul {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%mul = mul <2 x i32> %shuffle, %a
				ret <2 x i32> %mul
				}

				define <4 x i32> @test_vmulq_lane_s32(<4 x i32> %a, <2 x i32> %v) {
				; CHECK: test_vmulq_lane_s32:
				; CHECK: mul {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i32> %shuffle, %a
				ret <4 x i32> %mul
				}

				define <4 x i16> @test_vmul_lane_u16(<4 x i16> %a, <4 x i16> %v) {
				; CHECK: test_vmul_lane_u16:
				; CHECK: mul {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i16> %shuffle, %a
				ret <4 x i16> %mul
				}

				define <8 x i16> @test_vmulq_lane_u16(<8 x i16> %a, <4 x i16> %v) {
				; CHECK: test_vmulq_lane_u16:
				; CHECK: mul {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
				%mul = mul <8 x i16> %shuffle, %a
				ret <8 x i16> %mul
				}

				define <2 x i32> @test_vmul_lane_u32(<2 x i32> %a, <2 x i32> %v) {
				; CHECK: test_vmul_lane_u32:
				; CHECK: mul {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%mul = mul <2 x i32> %shuffle, %a
				ret <2 x i32> %mul
				}

				define <4 x i32> @test_vmulq_lane_u32(<4 x i32> %a, <2 x i32> %v) {
				; CHECK: test_vmulq_lane_u32:
				; CHECK: mul {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i32> %shuffle, %a
				ret <4 x i32> %mul
				}

				define <4 x i16> @test_vmul_laneq_s16(<4 x i16> %a, <8 x i16> %v) {
				; CHECK: test_vmul_laneq_s16:
				; CHECK: mul {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i16> %shuffle, %a
				ret <4 x i16> %mul
				}

				define <8 x i16> @test_vmulq_laneq_s16(<8 x i16> %a, <8 x i16> %v) {
				; CHECK: test_vmulq_laneq_s16:
				; CHECK: mul {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
				%mul = mul <8 x i16> %shuffle, %a
				ret <8 x i16> %mul
				}

				define <2 x i32> @test_vmul_laneq_s32(<2 x i32> %a, <4 x i32> %v) {
				; CHECK: test_vmul_laneq_s32:
				; CHECK: mul {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%mul = mul <2 x i32> %shuffle, %a
				ret <2 x i32> %mul
				}

				define <4 x i32> @test_vmulq_laneq_s32(<4 x i32> %a, <4 x i32> %v) {
				; CHECK: test_vmulq_laneq_s32:
				; CHECK: mul {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i32> %shuffle, %a
				ret <4 x i32> %mul
				}

				define <4 x i16> @test_vmul_laneq_u16(<4 x i16> %a, <8 x i16> %v) {
				; CHECK: test_vmul_laneq_u16:
				; CHECK: mul {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i16> %shuffle, %a
				ret <4 x i16> %mul
				}

				define <8 x i16> @test_vmulq_laneq_u16(<8 x i16> %a, <8 x i16> %v) {
				; CHECK: test_vmulq_laneq_u16:
				; CHECK: mul {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
				%mul = mul <8 x i16> %shuffle, %a
				ret <8 x i16> %mul
				}

				define <2 x i32> @test_vmul_laneq_u32(<2 x i32> %a, <4 x i32> %v) {
				; CHECK: test_vmul_laneq_u32:
				; CHECK: mul {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%mul = mul <2 x i32> %shuffle, %a
				ret <2 x i32> %mul
				}

				define <4 x i32> @test_vmulq_laneq_u32(<4 x i32> %a, <4 x i32> %v) {
				; CHECK: test_vmulq_laneq_u32:
				; CHECK: mul {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = mul <4 x i32> %shuffle, %a
				ret <4 x i32> %mul
				}

				define <2 x float> @test_vfma_lane_f32(<2 x float> %a, <2 x float> %b, <2 x float> %v) {
				; CHECK: test_vfma_lane_f32:
				; CHECK: fmla {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x float> %v, <2 x float> undef, <2 x i32> <i32 1, i32 1>
				%mul = fmul <2 x float> %shuffle, %b
				%add = fadd <2 x float> %mul, %a
				t.p.northover: None of these should be forming an fmla without fp-contract=fast, should they?
				Jiangning: Yes. I will remove the test without -ffp-contract=fast.
				ret <2 x float> %add
				}

				define <4 x float> @test_vfmaq_lane_f32(<4 x float> %a, <4 x float> %b, <2 x float> %v) {
				; CHECK: test_vfmaq_lane_f32:
				; CHECK: fmla {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x float> %v, <2 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = fmul <4 x float> %shuffle, %b
				%add = fadd <4 x float> %mul, %a
				ret <4 x float> %add
				}

				define <2 x float> @test_vfma_laneq_f32(<2 x float> %a, <2 x float> %b, <4 x float> %v) {
				; CHECK: test_vfma_laneq_f32:
				; CHECK: fmla {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x float> %v, <4 x float> undef, <2 x i32> <i32 1, i32 1>
				%mul = fmul <2 x float> %shuffle, %b
				%add = fadd <2 x float> %mul, %a
				ret <2 x float> %add
				}

				define <4 x float> @test_vfmaq_laneq_f32(<4 x float> %a, <4 x float> %b, <4 x float> %v) {
				; CHECK: test_vfmaq_laneq_f32:
				; CHECK: fmla {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = fmul <4 x float> %shuffle, %b
				%add = fadd <4 x float> %mul, %a
				ret <4 x float> %add
				}

				define <2 x float> @test_vfms_lane_f32(<2 x float> %a, <2 x float> %b, <2 x float> %v) {
				; CHECK: test_vfms_lane_f32:
				; CHECK: fmls {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x float> %v, <2 x float> undef, <2 x i32> <i32 1, i32 1>
				%mul = fmul <2 x float> %shuffle, %b
				%sub = fsub <2 x float> %a, %mul
				ret <2 x float> %sub
				}

				define <4 x float> @test_vfmsq_lane_f32(<4 x float> %a, <4 x float> %b, <2 x float> %v) {
				; CHECK: test_vfmsq_lane_f32:
				; CHECK: fmls {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x float> %v, <2 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = fmul <4 x float> %shuffle, %b
				%sub = fsub <4 x float> %a, %mul
				ret <4 x float> %sub
				}

				define <2 x float> @test_vfms_laneq_f32(<2 x float> %a, <2 x float> %b, <4 x float> %v) {
				; CHECK: test_vfms_laneq_f32:
				; CHECK: fmls {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x float> %v, <4 x float> undef, <2 x i32> <i32 1, i32 1>
				%mul = fmul <2 x float> %shuffle, %b
				%sub = fsub <2 x float> %a, %mul
				ret <2 x float> %sub
				}

				define <4 x float> @test_vfmsq_laneq_f32(<4 x float> %a, <4 x float> %b, <4 x float> %v) {
				; CHECK: test_vfmsq_laneq_f32:
				; CHECK: fmls {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = fmul <4 x float> %shuffle, %b
				%sub = fsub <4 x float> %a, %mul
				ret <4 x float> %sub
				}

				define <2 x double> @test_vfmaq_lane_f64(<2 x double> %a, <2 x double> %b, <1 x double> %v) {
				; CHECK: test_vfmaq_lane_f64:
				; CHECK: fmla {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <1 x double> %v, <1 x double> undef, <2 x i32> zeroinitializer
				%mul = fmul <2 x double> %shuffle, %b
				%add = fadd <2 x double> %mul, %a
				ret <2 x double> %add
				}

				define <2 x double> @test_vfmaq_laneq_f64_0(<2 x double> %a, <2 x double> %b, <2 x double> %v) {
				; CHECK: test_vfmaq_laneq_f64_0:
				; CHECK: fmla {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> zeroinitializer
				%mul = fmul <2 x double> %shuffle, %b
				%add = fadd <2 x double> %mul, %a
				ret <2 x double> %add
				}

				define <2 x double> @test_vfmaq_laneq_f64(<2 x double> %a, <2 x double> %b, <2 x double> %v) {
				; CHECK: test_vfmaq_laneq_f64:
				; CHECK: fmla {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> <i32 1, i32 1>
				%mul = fmul <2 x double> %shuffle, %b
				%add = fadd <2 x double> %mul, %a
				ret <2 x double> %add
				}

				define <2 x double> @test_vfmsq_lane_f64(<2 x double> %a, <2 x double> %b, <1 x double> %v) {
				; CHECK: test_vfmsq_lane_f64:
				; CHECK: fmls {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <1 x double> %v, <1 x double> undef, <2 x i32> zeroinitializer
				%mul = fmul <2 x double> %shuffle, %b
				%sub = fsub <2 x double> %a, %mul
				ret <2 x double> %sub
				}

				define <2 x double> @test_vfmsq_laneq_f64_0(<2 x double> %a, <2 x double> %b, <2 x double> %v) {
				; CHECK: test_vfmsq_laneq_f64_0:
				; CHECK: fmls {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> zeroinitializer
				%mul = fmul <2 x double> %shuffle, %b
				%sub = fsub <2 x double> %a, %mul
				ret <2 x double> %sub
				}

				define <2 x double> @test_vfmsq_laneq_f64(<2 x double> %a, <2 x double> %b, <2 x double> %v) {
				; CHECK: test_vfmsq_laneq_f64:
				; CHECK: fmls {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> <i32 1, i32 1>
				%mul = fmul <2 x double> %shuffle, %b
				%sub = fsub <2 x double> %a, %mul
				ret <2 x double> %sub
				}

				define <4 x i32> @test_vmlal_lane_s16(<4 x i32> %a, <4 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vmlal_lane_s16:
				; CHECK: mlal {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %b, <4 x i16> %shuffle) #2
				%add = add <4 x i32> %vmull2.i, %a
				ret <4 x i32> %add
				}

				define <2 x i64> @test_vmlal_lane_s32(<2 x i64> %a, <2 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vmlal_lane_s32:
				; CHECK: mlal {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmulls.v2i64(<2 x i32> %b, <2 x i32> %shuffle) #2
				%add = add <2 x i64> %vmull2.i, %a
				ret <2 x i64> %add
				}

				define <4 x i32> @test_vmlal_laneq_s16(<4 x i32> %a, <4 x i16> %b, <8 x i16> %v) {
				; CHECK: test_vmlal_laneq_s16:
				; CHECK: mlal {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %b, <4 x i16> %shuffle) #2
				%add = add <4 x i32> %vmull2.i, %a
				ret <4 x i32> %add
				}

				define <2 x i64> @test_vmlal_laneq_s32(<2 x i64> %a, <2 x i32> %b, <4 x i32> %v) {
				; CHECK: test_vmlal_laneq_s32:
				; CHECK: mlal {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmulls.v2i64(<2 x i32> %b, <2 x i32> %shuffle) #2
				%add = add <2 x i64> %vmull2.i, %a
				ret <2 x i64> %add
				}

				define <4 x i32> @test_vmlal_high_lane_s16(<4 x i32> %a, <8 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vmlal_high_lane_s16:
				; CHECK: mlal2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %b, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				%add = add <4 x i32> %vmull2.i, %a
				ret <4 x i32> %add
				}

				define <2 x i64> @test_vmlal_high_lane_s32(<2 x i64> %a, <4 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vmlal_high_lane_s32:
				; CHECK: mlal2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %b, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmulls.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				%add = add <2 x i64> %vmull2.i, %a
				ret <2 x i64> %add
				}

				define <4 x i32> @test_vmlal_high_laneq_s16(<4 x i32> %a, <8 x i16> %b, <8 x i16> %v) {
				; CHECK: test_vmlal_high_laneq_s16:
				; CHECK: mlal2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %b, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				%add = add <4 x i32> %vmull2.i, %a
				ret <4 x i32> %add
				}

				define <2 x i64> @test_vmlal_high_laneq_s32(<2 x i64> %a, <4 x i32> %b, <4 x i32> %v) {
				; CHECK: test_vmlal_high_laneq_s32:
				; CHECK: mlal2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %b, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmulls.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				%add = add <2 x i64> %vmull2.i, %a
				ret <2 x i64> %add
				}

				define <4 x i32> @test_vmlsl_lane_s16(<4 x i32> %a, <4 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vmlsl_lane_s16:
				; CHECK: mlsl {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %b, <4 x i16> %shuffle) #2
				%sub = sub <4 x i32> %a, %vmull2.i
				ret <4 x i32> %sub
				}

				define <2 x i64> @test_vmlsl_lane_s32(<2 x i64> %a, <2 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vmlsl_lane_s32:
				; CHECK: mlsl {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmulls.v2i64(<2 x i32> %b, <2 x i32> %shuffle) #2
				%sub = sub <2 x i64> %a, %vmull2.i
				ret <2 x i64> %sub
				}

				define <4 x i32> @test_vmlsl_laneq_s16(<4 x i32> %a, <4 x i16> %b, <8 x i16> %v) {
				; CHECK: test_vmlsl_laneq_s16:
				; CHECK: mlsl {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %b, <4 x i16> %shuffle) #2
				%sub = sub <4 x i32> %a, %vmull2.i
				ret <4 x i32> %sub
				}

				define <2 x i64> @test_vmlsl_laneq_s32(<2 x i64> %a, <2 x i32> %b, <4 x i32> %v) {
				; CHECK: test_vmlsl_laneq_s32:
				; CHECK: mlsl {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmulls.v2i64(<2 x i32> %b, <2 x i32> %shuffle) #2
				%sub = sub <2 x i64> %a, %vmull2.i
				ret <2 x i64> %sub
				}

				define <4 x i32> @test_vmlsl_high_lane_s16(<4 x i32> %a, <8 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vmlsl_high_lane_s16:
				; CHECK: mlsl2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %b, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				%sub = sub <4 x i32> %a, %vmull2.i
				ret <4 x i32> %sub
				}

				define <2 x i64> @test_vmlsl_high_lane_s32(<2 x i64> %a, <4 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vmlsl_high_lane_s32:
				; CHECK: mlsl2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %b, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmulls.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				%sub = sub <2 x i64> %a, %vmull2.i
				ret <2 x i64> %sub
				}

				define <4 x i32> @test_vmlsl_high_laneq_s16(<4 x i32> %a, <8 x i16> %b, <8 x i16> %v) {
				; CHECK: test_vmlsl_high_laneq_s16:
				; CHECK: mlsl2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %b, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				%sub = sub <4 x i32> %a, %vmull2.i
				ret <4 x i32> %sub
				}

				define <2 x i64> @test_vmlsl_high_laneq_s32(<2 x i64> %a, <4 x i32> %b, <4 x i32> %v) {
				; CHECK: test_vmlsl_high_laneq_s32:
				; CHECK: mlsl2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %b, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmulls.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				%sub = sub <2 x i64> %a, %vmull2.i
				ret <2 x i64> %sub
				}

				define <4 x i32> @test_vmlal_lane_u16(<4 x i32> %a, <4 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vmlal_lane_u16:
				; CHECK: mlal {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16> %b, <4 x i16> %shuffle) #2
				%add = add <4 x i32> %vmull2.i, %a
				ret <4 x i32> %add
				}

				define <2 x i64> @test_vmlal_lane_u32(<2 x i64> %a, <2 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vmlal_lane_u32:
				; CHECK: mlal {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmullu.v2i64(<2 x i32> %b, <2 x i32> %shuffle) #2
				%add = add <2 x i64> %vmull2.i, %a
				ret <2 x i64> %add
				}

				define <4 x i32> @test_vmlal_laneq_u16(<4 x i32> %a, <4 x i16> %b, <8 x i16> %v) {
				; CHECK: test_vmlal_laneq_u16:
				; CHECK: mlal {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16> %b, <4 x i16> %shuffle) #2
				%add = add <4 x i32> %vmull2.i, %a
				ret <4 x i32> %add
				}

				define <2 x i64> @test_vmlal_laneq_u32(<2 x i64> %a, <2 x i32> %b, <4 x i32> %v) {
				; CHECK: test_vmlal_laneq_u32:
				; CHECK: mlal {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmullu.v2i64(<2 x i32> %b, <2 x i32> %shuffle) #2
				%add = add <2 x i64> %vmull2.i, %a
				ret <2 x i64> %add
				}

				define <4 x i32> @test_vmlal_high_lane_u16(<4 x i32> %a, <8 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vmlal_high_lane_u16:
				; CHECK: mlal2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %b, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				%add = add <4 x i32> %vmull2.i, %a
				ret <4 x i32> %add
				}

				define <2 x i64> @test_vmlal_high_lane_u32(<2 x i64> %a, <4 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vmlal_high_lane_u32:
				; CHECK: mlal2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %b, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmullu.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				%add = add <2 x i64> %vmull2.i, %a
				ret <2 x i64> %add
				}

				define <4 x i32> @test_vmlal_high_laneq_u16(<4 x i32> %a, <8 x i16> %b, <8 x i16> %v) {
				; CHECK: test_vmlal_high_laneq_u16:
				; CHECK: mlal2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %b, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				%add = add <4 x i32> %vmull2.i, %a
				ret <4 x i32> %add
				}

				define <2 x i64> @test_vmlal_high_laneq_u32(<2 x i64> %a, <4 x i32> %b, <4 x i32> %v) {
				; CHECK: test_vmlal_high_laneq_u32:
				; CHECK: mlal2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %b, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmullu.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				%add = add <2 x i64> %vmull2.i, %a
				ret <2 x i64> %add
				}

				define <4 x i32> @test_vmlsl_lane_u16(<4 x i32> %a, <4 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vmlsl_lane_u16:
				; CHECK: mlsl {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16> %b, <4 x i16> %shuffle) #2
				%sub = sub <4 x i32> %a, %vmull2.i
				ret <4 x i32> %sub
				}

				define <2 x i64> @test_vmlsl_lane_u32(<2 x i64> %a, <2 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vmlsl_lane_u32:
				; CHECK: mlsl {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmullu.v2i64(<2 x i32> %b, <2 x i32> %shuffle) #2
				%sub = sub <2 x i64> %a, %vmull2.i
				ret <2 x i64> %sub
				}

				define <4 x i32> @test_vmlsl_laneq_u16(<4 x i32> %a, <4 x i16> %b, <8 x i16> %v) {
				; CHECK: test_vmlsl_laneq_u16:
				; CHECK: mlsl {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16> %b, <4 x i16> %shuffle) #2
				%sub = sub <4 x i32> %a, %vmull2.i
				ret <4 x i32> %sub
				}

				define <2 x i64> @test_vmlsl_laneq_u32(<2 x i64> %a, <2 x i32> %b, <4 x i32> %v) {
				; CHECK: test_vmlsl_laneq_u32:
				; CHECK: mlsl {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmullu.v2i64(<2 x i32> %b, <2 x i32> %shuffle) #2
				%sub = sub <2 x i64> %a, %vmull2.i
				ret <2 x i64> %sub
				}

				define <4 x i32> @test_vmlsl_high_lane_u16(<4 x i32> %a, <8 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vmlsl_high_lane_u16:
				; CHECK: mlsl2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %b, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				%sub = sub <4 x i32> %a, %vmull2.i
				ret <4 x i32> %sub
				}

				define <2 x i64> @test_vmlsl_high_lane_u32(<2 x i64> %a, <4 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vmlsl_high_lane_u32:
				; CHECK: mlsl2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %b, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmullu.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				%sub = sub <2 x i64> %a, %vmull2.i
				ret <2 x i64> %sub
				}

				define <4 x i32> @test_vmlsl_high_laneq_u16(<4 x i32> %a, <8 x i16> %b, <8 x i16> %v) {
				; CHECK: test_vmlsl_high_laneq_u16:
				; CHECK: mlsl2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %b, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				%sub = sub <4 x i32> %a, %vmull2.i
				ret <4 x i32> %sub
				}

				define <2 x i64> @test_vmlsl_high_laneq_u32(<2 x i64> %a, <4 x i32> %b, <4 x i32> %v) {
				; CHECK: test_vmlsl_high_laneq_u32:
				; CHECK: mlsl2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %b, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmullu.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				%sub = sub <2 x i64> %a, %vmull2.i
				ret <2 x i64> %sub
				}

				define <4 x i32> @test_vmull_lane_s16(<4 x i16> %a, <4 x i16> %v) {
				; CHECK: test_vmull_lane_s16:
				; CHECK: mull {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %a, <4 x i16> %shuffle) #2
				ret <4 x i32> %vmull2.i
				}

				define <2 x i64> @test_vmull_lane_s32(<2 x i32> %a, <2 x i32> %v) {
				; CHECK: test_vmull_lane_s32:
				; CHECK: mull {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmulls.v2i64(<2 x i32> %a, <2 x i32> %shuffle) #2
				ret <2 x i64> %vmull2.i
				}

				define <4 x i32> @test_vmull_lane_u16(<4 x i16> %a, <4 x i16> %v) {
				; CHECK: test_vmull_lane_u16:
				; CHECK: mull {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16> %a, <4 x i16> %shuffle) #2
				ret <4 x i32> %vmull2.i
				}

				define <2 x i64> @test_vmull_lane_u32(<2 x i32> %a, <2 x i32> %v) {
				; CHECK: test_vmull_lane_u32:
				; CHECK: mull {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmullu.v2i64(<2 x i32> %a, <2 x i32> %shuffle) #2
				ret <2 x i64> %vmull2.i
				}

				define <4 x i32> @test_vmull_high_lane_s16(<8 x i16> %a, <4 x i16> %v) {
				; CHECK: test_vmull_high_lane_s16:
				; CHECK: mull2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %a, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				ret <4 x i32> %vmull2.i
				}

				define <2 x i64> @test_vmull_high_lane_s32(<4 x i32> %a, <2 x i32> %v) {
				; CHECK: test_vmull_high_lane_s32:
				; CHECK: mull2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %a, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmulls.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				ret <2 x i64> %vmull2.i
				}

				define <4 x i32> @test_vmull_high_lane_u16(<8 x i16> %a, <4 x i16> %v) {
				; CHECK: test_vmull_high_lane_u16:
				; CHECK: mull2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %a, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				ret <4 x i32> %vmull2.i
				}

				define <2 x i64> @test_vmull_high_lane_u32(<4 x i32> %a, <2 x i32> %v) {
				; CHECK: test_vmull_high_lane_u32:
				; CHECK: mull2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %a, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmullu.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				ret <2 x i64> %vmull2.i
				}

				define <4 x i32> @test_vmull_laneq_s16(<4 x i16> %a, <8 x i16> %v) {
				; CHECK: test_vmull_laneq_s16:
				; CHECK: mull {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %a, <4 x i16> %shuffle) #2
				ret <4 x i32> %vmull2.i
				}

				define <2 x i64> @test_vmull_laneq_s32(<2 x i32> %a, <4 x i32> %v) {
				; CHECK: test_vmull_laneq_s32:
				; CHECK: mull {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmulls.v2i64(<2 x i32> %a, <2 x i32> %shuffle) #2
				ret <2 x i64> %vmull2.i
				}

				define <4 x i32> @test_vmull_laneq_u16(<4 x i16> %a, <8 x i16> %v) {
				; CHECK: test_vmull_laneq_u16:
				; CHECK: mull {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16> %a, <4 x i16> %shuffle) #2
				ret <4 x i32> %vmull2.i
				}

				define <2 x i64> @test_vmull_laneq_u32(<2 x i32> %a, <4 x i32> %v) {
				; CHECK: test_vmull_laneq_u32:
				; CHECK: mull {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmullu.v2i64(<2 x i32> %a, <2 x i32> %shuffle) #2
				ret <2 x i64> %vmull2.i
				}

				define <4 x i32> @test_vmull_high_laneq_s16(<8 x i16> %a, <8 x i16> %v) {
				; CHECK: test_vmull_high_laneq_s16:
				; CHECK: mull2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %a, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmulls.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				ret <4 x i32> %vmull2.i
				}

				define <2 x i64> @test_vmull_high_laneq_s32(<4 x i32> %a, <4 x i32> %v) {
				; CHECK: test_vmull_high_laneq_s32:
				; CHECK: mull2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %a, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmulls.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				ret <2 x i64> %vmull2.i
				}

				define <4 x i32> @test_vmull_high_laneq_u16(<8 x i16> %a, <8 x i16> %v) {
				; CHECK: test_vmull_high_laneq_u16:
				; CHECK: mull2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %a, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmull2.i = tail call <4 x i32> @llvm.arm.neon.vmullu.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				ret <4 x i32> %vmull2.i
				}

				define <2 x i64> @test_vmull_high_laneq_u32(<4 x i32> %a, <4 x i32> %v) {
				; CHECK: test_vmull_high_laneq_u32:
				; CHECK: mull2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %a, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vmull2.i = tail call <2 x i64> @llvm.arm.neon.vmullu.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				ret <2 x i64> %vmull2.i
				}

				define <4 x i32> @test_vqdmlal_lane_s16(<4 x i32> %a, <4 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vqdmlal_lane_s16:
				; CHECK: qdmlal {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vqdmlal2.i = tail call <4 x i32> @llvm.arm.neon.vqdmull.v4i32(<4 x i16> %b, <4 x i16> %shuffle) #2
				%vqdmlal4.i = tail call <4 x i32> @llvm.arm.neon.vqadds.v4i32(<4 x i32> %a, <4 x i32> %vqdmlal2.i) #2
				ret <4 x i32> %vqdmlal4.i
				}

				define <2 x i64> @test_vqdmlal_lane_s32(<2 x i64> %a, <2 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vqdmlal_lane_s32:
				; CHECK: qdmlal {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vqdmlal2.i = tail call <2 x i64> @llvm.arm.neon.vqdmull.v2i64(<2 x i32> %b, <2 x i32> %shuffle) #2
				%vqdmlal4.i = tail call <2 x i64> @llvm.arm.neon.vqadds.v2i64(<2 x i64> %a, <2 x i64> %vqdmlal2.i) #2
				ret <2 x i64> %vqdmlal4.i
				}

				define <4 x i32> @test_vqdmlal_high_lane_s16(<4 x i32> %a, <8 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vqdmlal_high_lane_s16:
				; CHECK: qdmlal2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %b, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vqdmlal2.i = tail call <4 x i32> @llvm.arm.neon.vqdmull.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				%vqdmlal4.i = tail call <4 x i32> @llvm.arm.neon.vqadds.v4i32(<4 x i32> %a, <4 x i32> %vqdmlal2.i) #2
				ret <4 x i32> %vqdmlal4.i
				}

				define <2 x i64> @test_vqdmlal_high_lane_s32(<2 x i64> %a, <4 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vqdmlal_high_lane_s32:
				; CHECK: qdmlal2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %b, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vqdmlal2.i = tail call <2 x i64> @llvm.arm.neon.vqdmull.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				%vqdmlal4.i = tail call <2 x i64> @llvm.arm.neon.vqadds.v2i64(<2 x i64> %a, <2 x i64> %vqdmlal2.i) #2
				ret <2 x i64> %vqdmlal4.i
				}

				define <4 x i32> @test_vqdmlsl_lane_s16(<4 x i32> %a, <4 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vqdmlsl_lane_s16:
				; CHECK: qdmlsl {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vqdmlsl2.i = tail call <4 x i32> @llvm.arm.neon.vqdmull.v4i32(<4 x i16> %b, <4 x i16> %shuffle) #2
				%vqdmlsl4.i = tail call <4 x i32> @llvm.arm.neon.vqsubs.v4i32(<4 x i32> %a, <4 x i32> %vqdmlsl2.i) #2
				ret <4 x i32> %vqdmlsl4.i
				}

				define <2 x i64> @test_vqdmlsl_lane_s32(<2 x i64> %a, <2 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vqdmlsl_lane_s32:
				; CHECK: qdmlsl {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vqdmlsl2.i = tail call <2 x i64> @llvm.arm.neon.vqdmull.v2i64(<2 x i32> %b, <2 x i32> %shuffle) #2
				%vqdmlsl4.i = tail call <2 x i64> @llvm.arm.neon.vqsubs.v2i64(<2 x i64> %a, <2 x i64> %vqdmlsl2.i) #2
				ret <2 x i64> %vqdmlsl4.i
				}

				define <4 x i32> @test_vqdmlsl_high_lane_s16(<4 x i32> %a, <8 x i16> %b, <4 x i16> %v) {
				; CHECK: test_vqdmlsl_high_lane_s16:
				; CHECK: qdmlsl2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %b, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vqdmlsl2.i = tail call <4 x i32> @llvm.arm.neon.vqdmull.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				%vqdmlsl4.i = tail call <4 x i32> @llvm.arm.neon.vqsubs.v4i32(<4 x i32> %a, <4 x i32> %vqdmlsl2.i) #2
				ret <4 x i32> %vqdmlsl4.i
				}

				define <2 x i64> @test_vqdmlsl_high_lane_s32(<2 x i64> %a, <4 x i32> %b, <2 x i32> %v) {
				; CHECK: test_vqdmlsl_high_lane_s32:
				; CHECK: qdmlsl2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %b, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vqdmlsl2.i = tail call <2 x i64> @llvm.arm.neon.vqdmull.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				%vqdmlsl4.i = tail call <2 x i64> @llvm.arm.neon.vqsubs.v2i64(<2 x i64> %a, <2 x i64> %vqdmlsl2.i) #2
				ret <2 x i64> %vqdmlsl4.i
				}

				define <4 x i32> @test_vqdmull_lane_s16(<4 x i16> %a, <4 x i16> %v) {
				; CHECK: test_vqdmull_lane_s16:
				; CHECK: qdmull {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vqdmull2.i = tail call <4 x i32> @llvm.arm.neon.vqdmull.v4i32(<4 x i16> %a, <4 x i16> %shuffle) #2
				ret <4 x i32> %vqdmull2.i
				}

				define <2 x i64> @test_vqdmull_lane_s32(<2 x i32> %a, <2 x i32> %v) {
				; CHECK: test_vqdmull_lane_s32:
				; CHECK: qdmull {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vqdmull2.i = tail call <2 x i64> @llvm.arm.neon.vqdmull.v2i64(<2 x i32> %a, <2 x i32> %shuffle) #2
				ret <2 x i64> %vqdmull2.i
				}

				define <4 x i32> @test_vqdmull_laneq_s16(<4 x i16> %a, <8 x i16> %v) {
				; CHECK: test_vqdmull_laneq_s16:
				; CHECK: qdmull {{v[0-9]+}}.4s, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vqdmull2.i = tail call <4 x i32> @llvm.arm.neon.vqdmull.v4i32(<4 x i16> %a, <4 x i16> %shuffle) #2
				ret <4 x i32> %vqdmull2.i
				}

				define <2 x i64> @test_vqdmull_laneq_s32(<2 x i32> %a, <4 x i32> %v) {
				; CHECK: test_vqdmull_laneq_s32:
				; CHECK: qdmull {{v[0-9]+}}.2d, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vqdmull2.i = tail call <2 x i64> @llvm.arm.neon.vqdmull.v2i64(<2 x i32> %a, <2 x i32> %shuffle) #2
				ret <2 x i64> %vqdmull2.i
				}

				define <4 x i32> @test_vqdmull_high_lane_s16(<8 x i16> %a, <4 x i16> %v) {
				; CHECK: test_vqdmull_high_lane_s16:
				; CHECK: qdmull2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %a, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vqdmull2.i = tail call <4 x i32> @llvm.arm.neon.vqdmull.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				ret <4 x i32> %vqdmull2.i
				}

				define <2 x i64> @test_vqdmull_high_lane_s32(<4 x i32> %a, <2 x i32> %v) {
				; CHECK: test_vqdmull_high_lane_s32:
				; CHECK: qdmull2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %a, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vqdmull2.i = tail call <2 x i64> @llvm.arm.neon.vqdmull.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				ret <2 x i64> %vqdmull2.i
				}

				define <4 x i32> @test_vqdmull_high_laneq_s16(<8 x i16> %a, <8 x i16> %v) {
				; CHECK: test_vqdmull_high_laneq_s16:
				; CHECK: qdmull2 {{v[0-9]+}}.4s, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <8 x i16> %a, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vqdmull2.i = tail call <4 x i32> @llvm.arm.neon.vqdmull.v4i32(<4 x i16> %shuffle.i, <4 x i16> %shuffle) #2
				ret <4 x i32> %vqdmull2.i
				}

				define <2 x i64> @test_vqdmull_high_laneq_s32(<4 x i32> %a, <4 x i32> %v) {
				; CHECK: test_vqdmull_high_laneq_s32:
				; CHECK: qdmull2 {{v[0-9]+}}.2d, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle.i = shufflevector <4 x i32> %a, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vqdmull2.i = tail call <2 x i64> @llvm.arm.neon.vqdmull.v2i64(<2 x i32> %shuffle.i, <2 x i32> %shuffle) #2
				ret <2 x i64> %vqdmull2.i
				}

				define <4 x i16> @test_vqdmulh_lane_s16(<4 x i16> %a, <4 x i16> %v) {
				; CHECK: test_vqdmulh_lane_s16:
				; CHECK: qdmulh {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vqdmulh2.i = tail call <4 x i16> @llvm.arm.neon.vqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle) #2
				ret <4 x i16> %vqdmulh2.i
				}

				define <8 x i16> @test_vqdmulhq_lane_s16(<8 x i16> %a, <4 x i16> %v) {
				; CHECK: test_vqdmulhq_lane_s16:
				; CHECK: qdmulh {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
				%vqdmulh2.i = tail call <8 x i16> @llvm.arm.neon.vqdmulh.v8i16(<8 x i16> %a, <8 x i16> %shuffle) #2
				ret <8 x i16> %vqdmulh2.i
				}

				define <2 x i32> @test_vqdmulh_lane_s32(<2 x i32> %a, <2 x i32> %v) {
				; CHECK: test_vqdmulh_lane_s32:
				; CHECK: qdmulh {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vqdmulh2.i = tail call <2 x i32> @llvm.arm.neon.vqdmulh.v2i32(<2 x i32> %a, <2 x i32> %shuffle) #2
				ret <2 x i32> %vqdmulh2.i
				}

				define <4 x i32> @test_vqdmulhq_lane_s32(<4 x i32> %a, <2 x i32> %v) {
				; CHECK: test_vqdmulhq_lane_s32:
				; CHECK: qdmulh {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vqdmulh2.i = tail call <4 x i32> @llvm.arm.neon.vqdmulh.v4i32(<4 x i32> %a, <4 x i32> %shuffle) #2
				ret <4 x i32> %vqdmulh2.i
				}

				define <4 x i16> @test_vqrdmulh_lane_s16(<4 x i16> %a, <4 x i16> %v) {
				; CHECK: test_vqrdmulh_lane_s16:
				; CHECK: qrdmulh {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vqrdmulh2.i = tail call <4 x i16> @llvm.arm.neon.vqrdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle) #2
				ret <4 x i16> %vqrdmulh2.i
				}

				define <8 x i16> @test_vqrdmulhq_lane_s16(<8 x i16> %a, <4 x i16> %v) {
				; CHECK: test_vqrdmulhq_lane_s16:
				; CHECK: qrdmulh {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
				%vqrdmulh2.i = tail call <8 x i16> @llvm.arm.neon.vqrdmulh.v8i16(<8 x i16> %a, <8 x i16> %shuffle) #2
				ret <8 x i16> %vqrdmulh2.i
				}

				define <2 x i32> @test_vqrdmulh_lane_s32(<2 x i32> %a, <2 x i32> %v) {
				; CHECK: test_vqrdmulh_lane_s32:
				; CHECK: qrdmulh {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vqrdmulh2.i = tail call <2 x i32> @llvm.arm.neon.vqrdmulh.v2i32(<2 x i32> %a, <2 x i32> %shuffle) #2
				ret <2 x i32> %vqrdmulh2.i
				}

				define <4 x i32> @test_vqrdmulhq_lane_s32(<4 x i32> %a, <2 x i32> %v) {
				; CHECK: test_vqrdmulhq_lane_s32:
				; CHECK: qrdmulh {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vqrdmulh2.i = tail call <4 x i32> @llvm.arm.neon.vqrdmulh.v4i32(<4 x i32> %a, <4 x i32> %shuffle) #2
				ret <4 x i32> %vqrdmulh2.i
				}

				define <2 x float> @test_vmul_lane_f32(<2 x float> %a, <2 x float> %v) {
				; CHECK: test_vmul_lane_f32:
				; CHECK: mul {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x float> %v, <2 x float> undef, <2 x i32> <i32 1, i32 1>
				%mul = fmul <2 x float> %shuffle, %a
				ret <2 x float> %mul
				}

				define <4 x float> @test_vmulq_lane_f32(<4 x float> %a, <2 x float> %v) {
				; CHECK: test_vmulq_lane_f32:
				; CHECK: mul {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x float> %v, <2 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = fmul <4 x float> %shuffle, %a
				ret <4 x float> %mul
				}

				define <2 x double> @test_vmulq_lane_f64(<2 x double> %a, <1 x double> %v) {
				; CHECK: test_vmulq_lane_f64:
				; CHECK: mul {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <1 x double> %v, <1 x double> undef, <2 x i32> zeroinitializer
				%mul = fmul <2 x double> %shuffle, %a
				ret <2 x double> %mul
				}

				define <2 x float> @test_vmul_laneq_f32(<2 x float> %a, <4 x float> %v) {
				; CHECK: test_vmul_laneq_f32:
				; CHECK: mul {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x float> %v, <4 x float> undef, <2 x i32> <i32 1, i32 1>
				%mul = fmul <2 x float> %shuffle, %a
				ret <2 x float> %mul
				}

				define <4 x float> @test_vmulq_laneq_f32(<4 x float> %a, <4 x float> %v) {
				; CHECK: test_vmulq_laneq_f32:
				; CHECK: mul {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%mul = fmul <4 x float> %shuffle, %a
				ret <4 x float> %mul
				}

				define <2 x double> @test_vmulq_laneq_f64_0(<2 x double> %a, <2 x double> %v) {
				; CHECK: test_vmulq_laneq_f64_0:
				; CHECK: mul {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> zeroinitializer
				%mul = fmul <2 x double> %shuffle, %a
				ret <2 x double> %mul
				}

				define <2 x double> @test_vmulq_laneq_f64(<2 x double> %a, <2 x double> %v) {
				; CHECK: test_vmulq_laneq_f64:
				; CHECK: mul {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> <i32 1, i32 1>
				%mul = fmul <2 x double> %shuffle, %a
				ret <2 x double> %mul
				}

				define <2 x float> @test_vmulx_lane_f32(<2 x float> %a, <2 x float> %v) {
				; CHECK: test_vmulx_lane_f32:
				; CHECK: mulx {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x float> %v, <2 x float> undef, <2 x i32> <i32 1, i32 1>
				%vmulx2.i = tail call <2 x float> @llvm.aarch64.neon.vmulx.v2f32(<2 x float> %a, <2 x float> %shuffle) #2
				ret <2 x float> %vmulx2.i
				}

				define <4 x float> @test_vmulxq_lane_f32(<4 x float> %a, <2 x float> %v) {
				; CHECK: test_vmulxq_lane_f32:
				; CHECK: mulx {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x float> %v, <2 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmulx2.i = tail call <4 x float> @llvm.aarch64.neon.vmulx.v4f32(<4 x float> %a, <4 x float> %shuffle) #2
				ret <4 x float> %vmulx2.i
				}

				define <2 x double> @test_vmulxq_lane_f64(<2 x double> %a, <1 x double> %v) {
				; CHECK: test_vmulxq_lane_f64:
				; CHECK: mulx {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <1 x double> %v, <1 x double> undef, <2 x i32> zeroinitializer
				%vmulx2.i = tail call <2 x double> @llvm.aarch64.neon.vmulx.v2f64(<2 x double> %a, <2 x double> %shuffle) #2
				ret <2 x double> %vmulx2.i
				}

				define <2 x float> @test_vmulx_laneq_f32(<2 x float> %a, <4 x float> %v) {
				; CHECK: test_vmulx_laneq_f32:
				; CHECK: mulx {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x float> %v, <4 x float> undef, <2 x i32> <i32 1, i32 1>
				%vmulx2.i = tail call <2 x float> @llvm.aarch64.neon.vmulx.v2f32(<2 x float> %a, <2 x float> %shuffle) #2
				ret <2 x float> %vmulx2.i
				}

				define <4 x float> @test_vmulxq_laneq_f32(<4 x float> %a, <4 x float> %v) {
				; CHECK: test_vmulxq_laneq_f32:
				; CHECK: mulx {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vmulx2.i = tail call <4 x float> @llvm.aarch64.neon.vmulx.v4f32(<4 x float> %a, <4 x float> %shuffle) #2
				ret <4 x float> %vmulx2.i
				}

				define <2 x double> @test_vmulxq_laneq_f64_0(<2 x double> %a, <2 x double> %v) {
				; CHECK: test_vmulxq_laneq_f64_0:
				; CHECK: mulx {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> zeroinitializer
				%vmulx2.i = tail call <2 x double> @llvm.aarch64.neon.vmulx.v2f64(<2 x double> %a, <2 x double> %shuffle) #2
				ret <2 x double> %vmulx2.i
				}

				define <2 x double> @test_vmulxq_laneq_f64(<2 x double> %a, <2 x double> %v) {
				; CHECK: test_vmulxq_laneq_f64:
				; CHECK: mulx {{v[0-9]+}}.2d, {{v[0-9]+}}.2d, {{v[0-9]+}}.d[{{[0-9]+}}]
				entry:
				%shuffle = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> <i32 1, i32 1>
				%vmulx2.i = tail call <2 x double> @llvm.aarch64.neon.vmulx.v2f64(<2 x double> %a, <2 x double> %shuffle) #2
				ret <2 x double> %vmulx2.i
				}

test/MC/AArch64/neon-2velem.s

This file was added.

				// RUN: llvm-mc -triple=aarch64 -mattr=+neon -show-encoding < %s | FileCheck %s

				// Check that the assembler can handle the documented syntax for AArch64

				//------------------------------------------------------------------------------
				// Instructions with 2 vectors and an element
				//------------------------------------------------------------------------------

				mla v0.2s, v1.2s, v2.s[2]
				mla v0.2s, v1.2s, v22.s[2]
				mla v3.4s, v8.4s, v2.s[1]
				mla v3.4s, v8.4s, v22.s[3]

				// CHECK: mla v0.2s, v1.2s, v2.s[2] // encoding: [0x20,0x08,0x82,0x2f]
				// CHECK: mla v0.2s, v1.2s, v22.s[2] // encoding: [0x20,0x08,0x96,0x2f]
				// CHECK: mla v3.4s, v8.4s, v2.s[1] // encoding: [0x03,0x01,0xa2,0x6f]
				// CHECK: mla v3.4s, v8.4s, v22.s[3] // encoding: [0x03,0x09,0xb6,0x6f]

				mla v0.4h, v1.4h, v2.h[2]
				mla v0.4h, v1.4h, v15.h[2]
				mla v0.8h, v1.8h, v2.h[7]
				mla v0.8h, v1.8h, v14.h[6]

				// CHECK: mla v0.4h, v1.4h, v2.h[2] // encoding: [0x20,0x00,0x62,0x2f]
				// CHECK: mla v0.4h, v1.4h, v15.h[2] // encoding: [0x20,0x00,0x6f,0x2f]
				// CHECK: mla v0.8h, v1.8h, v2.h[7] // encoding: [0x20,0x08,0x72,0x6f]
				// CHECK: mla v0.8h, v1.8h, v14.h[6] // encoding: [0x20,0x08,0x6e,0x6f]

				mls v0.2s, v1.2s, v2.s[2]
				mls v0.2s, v1.2s, v22.s[2]
				mls v3.4s, v8.4s, v2.s[1]
				mls v3.4s, v8.4s, v22.s[3]

				// CHECK: mls v0.2s, v1.2s, v2.s[2] // encoding: [0x20,0x48,0x82,0x2f]
				// CHECK: mls v0.2s, v1.2s, v22.s[2] // encoding: [0x20,0x48,0x96,0x2f]
				// CHECK: mls v3.4s, v8.4s, v2.s[1] // encoding: [0x03,0x41,0xa2,0x6f]
				// CHECK: mls v3.4s, v8.4s, v22.s[3] // encoding: [0x03,0x49,0xb6,0x6f]

				mls v0.4h, v1.4h, v2.h[2]
				mls v0.4h, v1.4h, v15.h[2]
				mls v0.8h, v1.8h, v2.h[7]
				mls v0.8h, v1.8h, v14.h[6]

				// CHECK: mls v0.4h, v1.4h, v2.h[2] // encoding: [0x20,0x40,0x62,0x2f]
				// CHECK: mls v0.4h, v1.4h, v15.h[2] // encoding: [0x20,0x40,0x6f,0x2f]
				// CHECK: mls v0.8h, v1.8h, v2.h[7] // encoding: [0x20,0x48,0x72,0x6f]
				// CHECK: mls v0.8h, v1.8h, v14.h[6] // encoding: [0x20,0x48,0x6e,0x6f]

				fmla v0.2s, v1.2s, v2.s[2]
				fmla v0.2s, v1.2s, v22.s[2]
				fmla v3.4s, v8.4s, v2.s[1]
				fmla v3.4s, v8.4s, v22.s[3]
				fmla v0.2d, v1.2d, v2.d[1]
				fmla v0.2d, v1.2d, v22.d[1]

				// CHECK: fmla v0.2s, v1.2s, v2.s[2] // encoding: [0x20,0x18,0x82,0x0f]
				// CHECK: fmla v0.2s, v1.2s, v22.s[2] // encoding: [0x20,0x18,0x96,0x0f]
				// CHECK: fmla v3.4s, v8.4s, v2.s[1] // encoding: [0x03,0x11,0xa2,0x4f]
				// CHECK: fmla v3.4s, v8.4s, v22.s[3] // encoding: [0x03,0x19,0xb6,0x4f]
				// CHECK: fmla v0.2d, v1.2d, v2.d[1] // encoding: [0x20,0x18,0xc2,0x4f]
				// CHECK: fmla v0.2d, v1.2d, v22.d[1] // encoding: [0x20,0x18,0xd6,0x4f]

				fmls v0.2s, v1.2s, v2.s[2]
				fmls v0.2s, v1.2s, v22.s[2]
				fmls v3.4s, v8.4s, v2.s[1]
				fmls v3.4s, v8.4s, v22.s[3]
				fmls v0.2d, v1.2d, v2.d[1]
				fmls v0.2d, v1.2d, v22.d[1]

				// CHECK: fmls v0.2s, v1.2s, v2.s[2] // encoding: [0x20,0x58,0x82,0x0f]
				// CHECK: fmls v0.2s, v1.2s, v22.s[2] // encoding: [0x20,0x58,0x96,0x0f]
				// CHECK: fmls v3.4s, v8.4s, v2.s[1] // encoding: [0x03,0x51,0xa2,0x4f]
				// CHECK: fmls v3.4s, v8.4s, v22.s[3] // encoding: [0x03,0x59,0xb6,0x4f]
				// CHECK: fmls v0.2d, v1.2d, v2.d[1] // encoding: [0x20,0x58,0xc2,0x4f]
				// CHECK: fmls v0.2d, v1.2d, v22.d[1] // encoding: [0x20,0x58,0xd6,0x4f]

				smlal v0.4s, v1.4h, v2.h[2]
				smlal v0.2d, v1.2s, v2.s[2]
				smlal v0.2d, v1.2s, v22.s[2]
				smlal2 v0.4s, v1.8h, v1.h[2]
				smlal2 v0.2d, v1.4s, v1.s[2]
				smlal2 v0.2d, v1.4s, v22.s[2]

				// CHECK: smlal v0.4s, v1.4h, v2.h[2] // encoding: [0x20,0x20,0x62,0x0f]
				// CHECK: smlal v0.2d, v1.2s, v2.s[2] // encoding: [0x20,0x28,0x82,0x0f]
				// CHECK: smlal v0.2d, v1.2s, v22.s[2] // encoding: [0x20,0x28,0x96,0x0f]
				// CHECK: smlal2 v0.4s, v1.8h, v1.h[2] // encoding: [0x20,0x20,0x61,0x4f]
				// CHECK: smlal2 v0.2d, v1.4s, v1.s[2] // encoding: [0x20,0x28,0x81,0x4f]
				// CHECK: smlal2 v0.2d, v1.4s, v22.s[2] // encoding: [0x20,0x28,0x96,0x4f]

				smlsl v0.4s, v1.4h, v2.h[2]
				smlsl v0.2d, v1.2s, v2.s[2]
				smlsl v0.2d, v1.2s, v22.s[2]
				smlsl2 v0.4s, v1.8h, v1.h[2]
				smlsl2 v0.2d, v1.4s, v1.s[2]
				smlsl2 v0.2d, v1.4s, v22.s[2]

				// CHECK: smlsl v0.4s, v1.4h, v2.h[2] // encoding: [0x20,0x60,0x62,0x0f]
				// CHECK: smlsl v0.2d, v1.2s, v2.s[2] // encoding: [0x20,0x68,0x82,0x0f]
				// CHECK: smlsl v0.2d, v1.2s, v22.s[2] // encoding: [0x20,0x68,0x96,0x0f]
				// CHECK: smlsl2 v0.4s, v1.8h, v1.h[2] // encoding: [0x20,0x60,0x61,0x4f]
				// CHECK: smlsl2 v0.2d, v1.4s, v1.s[2] // encoding: [0x20,0x68,0x81,0x4f]
				// CHECK: smlsl2 v0.2d, v1.4s, v22.s[2] // encoding: [0x20,0x68,0x96,0x4f]

				sqdmlal v0.4s, v1.4h, v2.h[2]
				sqdmlal v0.2d, v1.2s, v2.s[2]
				sqdmlal v0.2d, v1.2s, v22.s[2]
				sqdmlal2 v0.4s, v1.8h, v1.h[2]
				sqdmlal2 v0.2d, v1.4s, v1.s[2]
				sqdmlal2 v0.2d, v1.4s, v22.s[2]

				// CHECK: sqdmlal v0.4s, v1.4h, v2.h[2] // encoding: [0x20,0x30,0x62,0x0f]
				// CHECK: sqdmlal v0.2d, v1.2s, v2.s[2] // encoding: [0x20,0x38,0x82,0x0f]
				// CHECK: sqdmlal v0.2d, v1.2s, v22.s[2] // encoding: [0x20,0x38,0x96,0x0f]
				// CHECK: sqdmlal2 v0.4s, v1.8h, v1.h[2] // encoding: [0x20,0x30,0x61,0x4f]
				// CHECK: sqdmlal2 v0.2d, v1.4s, v1.s[2] // encoding: [0x20,0x38,0x81,0x4f]
				// CHECK: sqdmlal2 v0.2d, v1.4s, v22.s[2] // encoding: [0x20,0x38,0x96,0x4f]

				umlal v0.4s, v1.4h, v2.h[2]
				umlal v0.2d, v1.2s, v2.s[2]
				umlal v0.2d, v1.2s, v22.s[2]
				umlal2 v0.4s, v1.8h, v1.h[2]
				umlal2 v0.2d, v1.4s, v1.s[2]
				umlal2 v0.2d, v1.4s, v22.s[2]

				// CHECK: umlal v0.4s, v1.4h, v2.h[2] // encoding: [0x20,0x20,0x62,0x2f]
				// CHECK: umlal v0.2d, v1.2s, v2.s[2] // encoding: [0x20,0x28,0x82,0x2f]
				// CHECK: umlal v0.2d, v1.2s, v22.s[2] // encoding: [0x20,0x28,0x96,0x2f]
				// CHECK: umlal2 v0.4s, v1.8h, v1.h[2] // encoding: [0x20,0x20,0x61,0x6f]
				// CHECK: umlal2 v0.2d, v1.4s, v1.s[2] // encoding: [0x20,0x28,0x81,0x6f]
				// CHECK: umlal2 v0.2d, v1.4s, v22.s[2] // encoding: [0x20,0x28,0x96,0x6f]

				umlsl v0.4s, v1.4h, v2.h[2]
				umlsl v0.2d, v1.2s, v2.s[2]
				umlsl v0.2d, v1.2s, v22.s[2]
				umlsl2 v0.4s, v1.8h, v1.h[2]
				umlsl2 v0.2d, v1.4s, v1.s[2]
				umlsl2 v0.2d, v1.4s, v22.s[2]

				// CHECK: umlsl v0.4s, v1.4h, v2.h[2] // encoding: [0x20,0x60,0x62,0x2f]
				// CHECK: umlsl v0.2d, v1.2s, v2.s[2] // encoding: [0x20,0x68,0x82,0x2f]
				// CHECK: umlsl v0.2d, v1.2s, v22.s[2] // encoding: [0x20,0x68,0x96,0x2f]
				// CHECK: umlsl2 v0.4s, v1.8h, v1.h[2] // encoding: [0x20,0x60,0x61,0x6f]
				// CHECK: umlsl2 v0.2d, v1.4s, v1.s[2] // encoding: [0x20,0x68,0x81,0x6f]
				// CHECK: umlsl2 v0.2d, v1.4s, v22.s[2] // encoding: [0x20,0x68,0x96,0x6f]

				sqdmlsl v0.4s, v1.4h, v2.h[2]
				sqdmlsl v0.2d, v1.2s, v2.s[2]
				sqdmlsl v0.2d, v1.2s, v22.s[2]
				sqdmlsl2 v0.4s, v1.8h, v1.h[2]
				sqdmlsl2 v0.2d, v1.4s, v1.s[2]
				sqdmlsl2 v0.2d, v1.4s, v22.s[2]

				// CHECK: sqdmlsl v0.4s, v1.4h, v2.h[2] // encoding: [0x20,0x70,0x62,0x0f]
				// CHECK: sqdmlsl v0.2d, v1.2s, v2.s[2] // encoding: [0x20,0x78,0x82,0x0f]
				// CHECK: sqdmlsl v0.2d, v1.2s, v22.s[2] // encoding: [0x20,0x78,0x96,0x0f]
				// CHECK: sqdmlsl2 v0.4s, v1.8h, v1.h[2] // encoding: [0x20,0x70,0x61,0x4f]
				// CHECK: sqdmlsl2 v0.2d, v1.4s, v1.s[2] // encoding: [0x20,0x78,0x81,0x4f]
				// CHECK: sqdmlsl2 v0.2d, v1.4s, v22.s[2] // encoding: [0x20,0x78,0x96,0x4f]

				mul v0.4h, v1.4h, v2.h[2]
				mul v0.8h, v1.8h, v2.h[2]
				mul v0.2s, v1.2s, v2.s[2]
				mul v0.2s, v1.2s, v22.s[2]
				mul v0.4s, v1.4s, v2.s[2]
				mul v0.4s, v1.4s, v22.s[2]

				// CHECK: mul v0.4h, v1.4h, v2.h[2] // encoding: [0x20,0x80,0x62,0x0f]
				// CHECK: mul v0.8h, v1.8h, v2.h[2] // encoding: [0x20,0x80,0x62,0x4f]
				// CHECK: mul v0.2s, v1.2s, v2.s[2] // encoding: [0x20,0x88,0x82,0x0f]
				// CHECK: mul v0.2s, v1.2s, v22.s[2] // encoding: [0x20,0x88,0x96,0x0f]
				// CHECK: mul v0.4s, v1.4s, v2.s[2] // encoding: [0x20,0x88,0x82,0x4f]
				// CHECK: mul v0.4s, v1.4s, v22.s[2] // encoding: [0x20,0x88,0x96,0x4f]

				fmul v0.2s, v1.2s, v2.s[2]
				fmul v0.2s, v1.2s, v22.s[2]
				fmul v0.4s, v1.4s, v2.s[2]
				fmul v0.4s, v1.4s, v22.s[2]
				fmul v0.2d, v1.2d, v2.d[1]
				fmul v0.2d, v1.2d, v22.d[1]

				// CHECK: fmul v0.2s, v1.2s, v2.s[2] // encoding: [0x20,0x98,0x82,0x0f]
				// CHECK: fmul v0.2s, v1.2s, v22.s[2] // encoding: [0x20,0x98,0x96,0x0f]
				// CHECK: fmul v0.4s, v1.4s, v2.s[2] // encoding: [0x20,0x98,0x82,0x4f]
				// CHECK: fmul v0.4s, v1.4s, v22.s[2] // encoding: [0x20,0x98,0x96,0x4f]
				// CHECK: fmul v0.2d, v1.2d, v2.d[1] // encoding: [0x20,0x98,0xc2,0x4f]
				// CHECK: fmul v0.2d, v1.2d, v22.d[1] // encoding: [0x20,0x98,0xd6,0x4f]

				fmulx v0.2s, v1.2s, v2.s[2]
				fmulx v0.2s, v1.2s, v22.s[2]
				fmulx v0.4s, v1.4s, v2.s[2]
				fmulx v0.4s, v1.4s, v22.s[2]
				fmulx v0.2d, v1.2d, v2.d[1]
				fmulx v0.2d, v1.2d, v22.d[1]

				// CHECK: fmulx v0.2s, v1.2s, v2.s[2] // encoding: [0x20,0x98,0x82,0x2f]
				// CHECK: fmulx v0.2s, v1.2s, v22.s[2] // encoding: [0x20,0x98,0x96,0x2f]
				// CHECK: fmulx v0.4s, v1.4s, v2.s[2] // encoding: [0x20,0x98,0x82,0x6f]
				// CHECK: fmulx v0.4s, v1.4s, v22.s[2] // encoding: [0x20,0x98,0x96,0x6f]
				// CHECK: fmulx v0.2d, v1.2d, v2.d[1] // encoding: [0x20,0x98,0xc2,0x6f]
				// CHECK: fmulx v0.2d, v1.2d, v22.d[1] // encoding: [0x20,0x98,0xd6,0x6f]

				smull v0.4s, v1.4h, v2.h[2]
				smull v0.2d, v1.2s, v2.s[2]
				smull v0.2d, v1.2s, v22.s[2]
				smull2 v0.4s, v1.8h, v2.h[2]
				smull2 v0.2d, v1.4s, v2.s[2]
				smull2 v0.2d, v1.4s, v22.s[2]

				// CHECK: smull v0.4s, v1.4h, v2.h[2] // encoding: [0x20,0xa0,0x62,0x0f]
				// CHECK: smull v0.2d, v1.2s, v2.s[2] // encoding: [0x20,0xa8,0x82,0x0f]
				// CHECK: smull v0.2d, v1.2s, v22.s[2] // encoding: [0x20,0xa8,0x96,0x0f]
				// CHECK: smull2 v0.4s, v1.8h, v2.h[2] // encoding: [0x20,0xa0,0x62,0x4f]
				// CHECK: smull2 v0.2d, v1.4s, v2.s[2] // encoding: [0x20,0xa8,0x82,0x4f]
				// CHECK: smull2 v0.2d, v1.4s, v22.s[2] // encoding: [0x20,0xa8,0x96,0x4f]

				umull v0.4s, v1.4h, v2.h[2]
				umull v0.2d, v1.2s, v2.s[2]
				umull v0.2d, v1.2s, v22.s[2]
				umull2 v0.4s, v1.8h, v2.h[2]
				umull2 v0.2d, v1.4s, v2.s[2]
				umull2 v0.2d, v1.4s, v22.s[2]

				// CHECK: umull v0.4s, v1.4h, v2.h[2] // encoding: [0x20,0xa0,0x62,0x2f]
				// CHECK: umull v0.2d, v1.2s, v2.s[2] // encoding: [0x20,0xa8,0x82,0x2f]
				// CHECK: umull v0.2d, v1.2s, v22.s[2] // encoding: [0x20,0xa8,0x96,0x2f]
				// CHECK: umull2 v0.4s, v1.8h, v2.h[2] // encoding: [0x20,0xa0,0x62,0x6f]
				// CHECK: umull2 v0.2d, v1.4s, v2.s[2] // encoding: [0x20,0xa8,0x82,0x6f]
				// CHECK: umull2 v0.2d, v1.4s, v22.s[2] // encoding: [0x20,0xa8,0x96,0x6f]

				sqdmull v0.4s, v1.4h, v2.h[2]
				sqdmull v0.2d, v1.2s, v2.s[2]
				sqdmull v0.2d, v1.2s, v22.s[2]
				sqdmull2 v0.4s, v1.8h, v2.h[2]
				sqdmull2 v0.2d, v1.4s, v2.s[2]
				sqdmull2 v0.2d, v1.4s, v22.s[2]

				// CHECK: sqdmull v0.4s, v1.4h, v2.h[2] // encoding: [0x20,0xb0,0x62,0x0f]
				// CHECK: sqdmull v0.2d, v1.2s, v2.s[2] // encoding: [0x20,0xb8,0x82,0x0f]
				// CHECK: sqdmull v0.2d, v1.2s, v22.s[2] // encoding: [0x20,0xb8,0x96,0x0f]
				// CHECK: sqdmull2 v0.4s, v1.8h, v2.h[2] // encoding: [0x20,0xb0,0x62,0x4f]
				// CHECK: sqdmull2 v0.2d, v1.4s, v2.s[2] // encoding: [0x20,0xb8,0x82,0x4f]
				// CHECK: sqdmull2 v0.2d, v1.4s, v22.s[2] // encoding: [0x20,0xb8,0x96,0x4f]

				sqdmulh v0.4h, v1.4h, v2.h[2]
				sqdmulh v0.8h, v1.8h, v2.h[2]
				sqdmulh v0.2s, v1.2s, v2.s[2]
				sqdmulh v0.2s, v1.2s, v22.s[2]
				sqdmulh v0.4s, v1.4s, v2.s[2]
				sqdmulh v0.4s, v1.4s, v22.s[2]

				// CHECK: sqdmulh v0.4h, v1.4h, v2.h[2] // encoding: [0x20,0xc0,0x62,0x0f]
				// CHECK: sqdmulh v0.8h, v1.8h, v2.h[2] // encoding: [0x20,0xc0,0x62,0x4f]
				// CHECK: sqdmulh v0.2s, v1.2s, v2.s[2] // encoding: [0x20,0xc8,0x82,0x0f]
				// CHECK: sqdmulh v0.2s, v1.2s, v22.s[2] // encoding: [0x20,0xc8,0x96,0x0f]
				// CHECK: sqdmulh v0.4s, v1.4s, v2.s[2] // encoding: [0x20,0xc8,0x82,0x4f]
				// CHECK: sqdmulh v0.4s, v1.4s, v22.s[2] // encoding: [0x20,0xc8,0x96,0x4f]

				sqrdmulh v0.4h, v1.4h, v2.h[2]
				sqrdmulh v0.8h, v1.8h, v2.h[2]
				sqrdmulh v0.2s, v1.2s, v2.s[2]
				sqrdmulh v0.2s, v1.2s, v22.s[2]
				sqrdmulh v0.4s, v1.4s, v2.s[2]
				sqrdmulh v0.4s, v1.4s, v22.s[2]

				// CHECK: sqrdmulh v0.4h, v1.4h, v2.h[2] // encoding: [0x20,0xd0,0x62,0x0f]
				// CHECK: sqrdmulh v0.8h, v1.8h, v2.h[2] // encoding: [0x20,0xd0,0x62,0x4f]
				// CHECK: sqrdmulh v0.2s, v1.2s, v2.s[2] // encoding: [0x20,0xd8,0x82,0x0f]
				// CHECK: sqrdmulh v0.2s, v1.2s, v22.s[2] // encoding: [0x20,0xd8,0x96,0x0f]
				// CHECK: sqrdmulh v0.4s, v1.4s, v2.s[2] // encoding: [0x20,0xd8,0x82,0x4f]
				// CHECK: sqrdmulh v0.4s, v1.4s, v22.s[2] // encoding: [0x20,0xd8,0x96,0x4f]

test/MC/AArch64/neon-diagnostics.s

Context not available.
	// CHECK-ERROR: fminnmp v1.4s, v2.2d
	// CHECK-ERROR: ^

		mla v0.2d, v1.2d, v16.d[1]
		mla v0.2s, v1.2s, v2.s[4]
		mla v0.4s, v1.4s, v2.s[4]
		mla v0.2h, v1.2h, v2.h[1]
		mla v0.4h, v1.4h, v2.h[8]
		mla v0.8h, v1.8h, v2.h[8]
		mla v0.4h, v1.4h, v16.h[2]
		mla v0.8h, v1.8h, v16.h[2]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: mla v0.2d, v1.2d, v16.d[1]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mla v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mla v0.4s, v1.4s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: mla v0.2h, v1.2h, v2.h[1]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mla v0.4h, v1.4h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mla v0.8h, v1.8h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: mla v0.4h, v1.4h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: mla v0.8h, v1.8h, v16.h[2]
		// CHECK-ERROR: ^

		mls v0.2d, v1.2d, v16.d[1]
		mls v0.2s, v1.2s, v2.s[4]
		mls v0.4s, v1.4s, v2.s[4]
		mls v0.2h, v1.2h, v2.h[1]
		mls v0.4h, v1.4h, v2.h[8]
		mls v0.8h, v1.8h, v2.h[8]
		mls v0.4h, v1.4h, v16.h[2]
		mls v0.8h, v1.8h, v16.h[2]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: mls v0.2d, v1.2d, v16.d[1]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mls v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mls v0.4s, v1.4s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: mls v0.2h, v1.2h, v2.h[1]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mls v0.4h, v1.4h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mls v0.8h, v1.8h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: mls v0.4h, v1.4h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: mls v0.8h, v1.8h, v16.h[2]
		// CHECK-ERROR: ^

		fmla v0.4h, v1.4h, v2.h[2]
		fmla v0.8h, v1.8h, v2.h[2]
		fmla v0.2s, v1.2s, v2.s[4]
		fmla v0.2s, v1.2s, v22.s[4]
		fmla v3.4s, v8.4s, v2.s[4]
		fmla v3.4s, v8.4s, v22.s[4]
		fmla v0.2d, v1.2d, v2.d[2]
		fmla v0.2d, v1.2d, v22.d[2]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: fmla v0.4h, v1.4h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: fmla v0.8h, v1.8h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmla v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmla v0.2s, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmla v3.4s, v8.4s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmla v3.4s, v8.4s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmla v0.2d, v1.2d, v2.d[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmla v0.2d, v1.2d, v22.d[2]
		// CHECK-ERROR: ^

		fmls v0.4h, v1.4h, v2.h[2]
		fmls v0.8h, v1.8h, v2.h[2]
		fmls v0.2s, v1.2s, v2.s[4]
		fmls v0.2s, v1.2s, v22.s[4]
		fmls v3.4s, v8.4s, v2.s[4]
		fmls v3.4s, v8.4s, v22.s[4]
		fmls v0.2d, v1.2d, v2.d[2]
		fmls v0.2d, v1.2d, v22.d[2]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: fmls v0.4h, v1.4h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: fmls v0.8h, v1.8h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmls v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmls v0.2s, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmls v3.4s, v8.4s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmls v3.4s, v8.4s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmls v0.2d, v1.2d, v2.d[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmls v0.2d, v1.2d, v22.d[2]
		// CHECK-ERROR: ^

		smlal v0.4h, v1.4h, v2.h[2]
		smlal v0.4s, v1.4h, v2.h[8]
		smlal v0.4s, v1.4h, v16.h[2]
		smlal v0.2s, v1.2s, v2.s[4]
		smlal v0.2d, v1.2s, v2.s[4]
		smlal v0.2d, v1.2s, v22.s[4]
		smlal2 v0.4h, v1.8h, v1.h[2]
		smlal2 v0.4s, v1.8h, v1.h[8]
		smlal2 v0.4s, v1.8h, v16.h[2]
		smlal2 v0.2s, v1.4s, v1.s[2]
		smlal2 v0.2d, v1.4s, v1.s[4]
		smlal2 v0.2d, v1.4s, v22.s[4]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smlal v0.4h, v1.4h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smlal v0.4s, v1.4h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smlal v0.4s, v1.4h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smlal v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smlal v0.2d, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smlal v0.2d, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smlal2 v0.4h, v1.8h, v1.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smlal2 v0.4s, v1.8h, v1.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smlal2 v0.4s, v1.8h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smlal2 v0.2s, v1.4s, v1.s[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smlal2 v0.2d, v1.4s, v1.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smlal2 v0.2d, v1.4s, v22.s[4]
		// CHECK-ERROR: ^

		smlsl v0.4h, v1.4h, v2.h[2]
		smlsl v0.4s, v1.4h, v2.h[8]
		smlsl v0.4s, v1.4h, v16.h[2]
		smlsl v0.2s, v1.2s, v2.s[4]
		smlsl v0.2d, v1.2s, v2.s[4]
		smlsl v0.2d, v1.2s, v22.s[4]
		smlsl2 v0.4h, v1.8h, v1.h[2]
		smlsl2 v0.4s, v1.8h, v1.h[8]
		smlsl2 v0.4s, v1.8h, v16.h[2]
		smlsl2 v0.2s, v1.4s, v1.s[2]
		smlsl2 v0.2d, v1.4s, v1.s[4]
		smlsl2 v0.2d, v1.4s, v22.s[4]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smlsl v0.4h, v1.4h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smlsl v0.4s, v1.4h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smlsl v0.4s, v1.4h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smlsl v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smlsl v0.2d, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smlsl v0.2d, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smlsl2 v0.4h, v1.8h, v1.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smlsl2 v0.4s, v1.8h, v1.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smlsl2 v0.4s, v1.8h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smlsl2 v0.2s, v1.4s, v1.s[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smlsl2 v0.2d, v1.4s, v1.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smlsl2 v0.2d, v1.4s, v22.s[4]
		// CHECK-ERROR: ^

		umlal v0.4h, v1.4h, v2.h[2]
		umlal v0.4s, v1.4h, v2.h[8]
		umlal v0.4s, v1.4h, v16.h[2]
		umlal v0.2s, v1.2s, v2.s[4]
		umlal v0.2d, v1.2s, v2.s[4]
		umlal v0.2d, v1.2s, v22.s[4]
		umlal2 v0.4h, v1.8h, v1.h[2]
		umlal2 v0.4s, v1.8h, v1.h[8]
		umlal2 v0.4s, v1.8h, v16.h[2]
		umlal2 v0.2s, v1.4s, v1.s[2]
		umlal2 v0.2d, v1.4s, v1.s[4]
		umlal2 v0.2d, v1.4s, v22.s[4]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umlal v0.4h, v1.4h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umlal v0.4s, v1.4h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umlal v0.4s, v1.4h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umlal v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umlal v0.2d, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umlal v0.2d, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umlal2 v0.4h, v1.8h, v1.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umlal2 v0.4s, v1.8h, v1.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umlal2 v0.4s, v1.8h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umlal2 v0.2s, v1.4s, v1.s[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umlal2 v0.2d, v1.4s, v1.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umlal2 v0.2d, v1.4s, v22.s[4]
		// CHECK-ERROR: ^

		umlsl v0.4h, v1.4h, v2.h[2]
		umlsl v0.4s, v1.4h, v2.h[8]
		umlsl v0.4s, v1.4h, v16.h[2]
		umlsl v0.2s, v1.2s, v2.s[4]
		umlsl v0.2d, v1.2s, v2.s[4]
		umlsl v0.2d, v1.2s, v22.s[4]
		umlsl2 v0.4h, v1.8h, v1.h[2]
		umlsl2 v0.4s, v1.8h, v1.h[8]
		umlsl2 v0.4s, v1.8h, v16.h[2]
		umlsl2 v0.2s, v1.4s, v1.s[2]
		umlsl2 v0.2d, v1.4s, v1.s[4]
		umlsl2 v0.2d, v1.4s, v22.s[4]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umlsl v0.4h, v1.4h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umlsl v0.4s, v1.4h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umlsl v0.4s, v1.4h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umlsl v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umlsl v0.2d, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umlsl v0.2d, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umlsl2 v0.4h, v1.8h, v1.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umlsl2 v0.4s, v1.8h, v1.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umlsl2 v0.4s, v1.8h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umlsl2 v0.2s, v1.4s, v1.s[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umlsl2 v0.2d, v1.4s, v1.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umlsl2 v0.2d, v1.4s, v22.s[4]
		// CHECK-ERROR: ^

		sqdmlal v0.4h, v1.4h, v2.h[2]
		sqdmlal v0.4s, v1.4h, v2.h[8]
		sqdmlal v0.4s, v1.4h, v16.h[2]
		sqdmlal v0.2s, v1.2s, v2.s[4]
		sqdmlal v0.2d, v1.2s, v2.s[4]
		sqdmlal v0.2d, v1.2s, v22.s[4]
		sqdmlal2 v0.4h, v1.8h, v1.h[2]
		sqdmlal2 v0.4s, v1.8h, v1.h[8]
		sqdmlal2 v0.4s, v1.8h, v16.h[2]
		sqdmlal2 v0.2s, v1.4s, v1.s[2]
		sqdmlal2 v0.2d, v1.4s, v1.s[4]
		sqdmlal2 v0.2d, v1.4s, v22.s[4]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmlal v0.4h, v1.4h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmlal v0.4s, v1.4h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmlal v0.4s, v1.4h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmlal v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmlal v0.2d, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmlal v0.2d, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmlal2 v0.4h, v1.8h, v1.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmlal2 v0.4s, v1.8h, v1.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmlal2 v0.4s, v1.8h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmlal2 v0.2s, v1.4s, v1.s[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmlal2 v0.2d, v1.4s, v1.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmlal2 v0.2d, v1.4s, v22.s[4]
		// CHECK-ERROR: ^

		sqdmlsl v0.4h, v1.4h, v2.h[2]
		sqdmlsl v0.4s, v1.4h, v2.h[8]
		sqdmlsl v0.4s, v1.4h, v16.h[2]
		sqdmlsl v0.2s, v1.2s, v2.s[4]
		sqdmlsl v0.2d, v1.2s, v2.s[4]
		sqdmlsl v0.2d, v1.2s, v22.s[4]
		sqdmlsl2 v0.4h, v1.8h, v1.h[2]
		sqdmlsl2 v0.4s, v1.8h, v1.h[8]
		sqdmlsl2 v0.4s, v1.8h, v16.h[2]
		sqdmlsl2 v0.2s, v1.4s, v1.s[2]
		sqdmlsl2 v0.2d, v1.4s, v1.s[4]
		sqdmlsl2 v0.2d, v1.4s, v22.s[4]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmlsl v0.4h, v1.4h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmlsl v0.4s, v1.4h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmlsl v0.4s, v1.4h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmlsl v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmlsl v0.2d, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmlsl v0.2d, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmlsl2 v0.4h, v1.8h, v1.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmlsl2 v0.4s, v1.8h, v1.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmlsl2 v0.4s, v1.8h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmlsl2 v0.2s, v1.4s, v1.s[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmlsl2 v0.2d, v1.4s, v1.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmlsl2 v0.2d, v1.4s, v22.s[4]
		// CHECK-ERROR: ^

		mul v0.4h, v1.4h, v2.h[8]
		mul v0.4h, v1.4h, v16.h[8]
		mul v0.8h, v1.8h, v2.h[8]
		mul v0.8h, v1.8h, v16.h[8]
		mul v0.2s, v1.2s, v2.s[4]
		mul v0.2s, v1.2s, v22.s[4]
		mul v0.4s, v1.4s, v2.s[4]
		mul v0.4s, v1.4s, v22.s[4]
		mul v0.2d, v1.2d, v2.d[1]

		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mul v0.4h, v1.4h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mul v0.4h, v1.4h, v16.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mul v0.8h, v1.8h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mul v0.8h, v1.8h, v16.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mul v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mul v0.2s, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mul v0.4s, v1.4s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: mul v0.4s, v1.4s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: mul v0.2d, v1.2d, v2.d[1]
		// CHECK-ERROR: ^

		fmul v0.4h, v1.4h, v2.h[4]
		fmul v0.2s, v1.2s, v2.s[4]
		fmul v0.2s, v1.2s, v22.s[4]
		fmul v0.4s, v1.4s, v2.s[4]
		fmul v0.4s, v1.4s, v22.s[4]
		fmul v0.2d, v1.2d, v2.d[2]
		fmul v0.2d, v1.2d, v22.d[2]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: fmul v0.4h, v1.4h, v2.h[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmul v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmul v0.2s, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmul v0.4s, v1.4s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmul v0.4s, v1.4s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmul v0.2d, v1.2d, v2.d[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmul v0.2d, v1.2d, v22.d[2]
		// CHECK-ERROR: ^

		fmulx v0.4h, v1.4h, v2.h[4]
		fmulx v0.2s, v1.2s, v2.s[4]
		fmulx v0.2s, v1.2s, v22.s[4]
		fmulx v0.4s, v1.4s, v2.s[4]
		fmulx v0.4s, v1.4s, v22.s[4]
		fmulx v0.2d, v1.2d, v2.d[2]
		fmulx v0.2d, v1.2d, v22.d[2]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: fmulx v0.4h, v1.4h, v2.h[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmulx v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmulx v0.2s, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmulx v0.4s, v1.4s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmulx v0.4s, v1.4s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmulx v0.2d, v1.2d, v2.d[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: fmulx v0.2d, v1.2d, v22.d[2]
		// CHECK-ERROR: ^

		smull v0.4h, v1.4h, v2.h[2]
		smull v0.4s, v1.4h, v2.h[8]
		smull v0.4s, v1.4h, v16.h[4]
		smull v0.2s, v1.2s, v2.s[2]
		smull v0.2d, v1.2s, v2.s[4]
		smull v0.2d, v1.2s, v22.s[4]
		smull2 v0.4h, v1.8h, v2.h[2]
		smull2 v0.4s, v1.8h, v2.h[8]
		smull2 v0.4s, v1.8h, v16.h[4]
		smull2 v0.2s, v1.4s, v2.s[2]
		smull2 v0.2d, v1.4s, v2.s[4]
		smull2 v0.2d, v1.4s, v22.s[4]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smull v0.4h, v1.4h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smull v0.4s, v1.4h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smull v0.4s, v1.4h, v16.h[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smull v0.2s, v1.2s, v2.s[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smull v0.2d, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smull v0.2d, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smull2 v0.4h, v1.8h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smull2 v0.4s, v1.8h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smull2 v0.4s, v1.8h, v16.h[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: smull2 v0.2s, v1.4s, v2.s[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smull2 v0.2d, v1.4s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: smull2 v0.2d, v1.4s, v22.s[4]
		// CHECK-ERROR: ^

		umull v0.4h, v1.4h, v2.h[2]
		umull v0.4s, v1.4h, v2.h[8]
		umull v0.4s, v1.4h, v16.h[4]
		umull v0.2s, v1.2s, v2.s[2]
		umull v0.2d, v1.2s, v2.s[4]
		umull v0.2d, v1.2s, v22.s[4]
		umull2 v0.4h, v1.8h, v2.h[2]
		umull2 v0.4s, v1.8h, v2.h[8]
		umull2 v0.4s, v1.8h, v16.h[4]
		umull2 v0.2s, v1.4s, v2.s[2]
		umull2 v0.2d, v1.4s, v2.s[4]
		umull2 v0.2d, v1.4s, v22.s[4]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umull v0.4h, v1.4h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umull v0.4s, v1.4h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umull v0.4s, v1.4h, v16.h[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umull v0.2s, v1.2s, v2.s[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umull v0.2d, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umull v0.2d, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umull2 v0.4h, v1.8h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umull2 v0.4s, v1.8h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umull2 v0.4s, v1.8h, v16.h[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: umull2 v0.2s, v1.4s, v2.s[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umull2 v0.2d, v1.4s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: umull2 v0.2d, v1.4s, v22.s[4]
		// CHECK-ERROR: ^

		sqdmull v0.4h, v1.4h, v2.h[2]
		sqdmull v0.4s, v1.4h, v2.h[8]
		sqdmull v0.4s, v1.4h, v16.h[4]
		sqdmull v0.2s, v1.2s, v2.s[2]
		sqdmull v0.2d, v1.2s, v2.s[4]
		sqdmull v0.2d, v1.2s, v22.s[4]
		sqdmull2 v0.4h, v1.8h, v2.h[2]
		sqdmull2 v0.4s, v1.8h, v2.h[8]
		sqdmull2 v0.4s, v1.8h, v16.h[4]
		sqdmull2 v0.2s, v1.4s, v2.s[2]
		sqdmull2 v0.2d, v1.4s, v2.s[4]
		sqdmull2 v0.2d, v1.4s, v22.s[4]

		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmull v0.4h, v1.4h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmull v0.4s, v1.4h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmull v0.4s, v1.4h, v16.h[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmull v0.2s, v1.2s, v2.s[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmull v0.2d, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmull v0.2d, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmull2 v0.4h, v1.8h, v2.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmull2 v0.4s, v1.8h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmull2 v0.4s, v1.8h, v16.h[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmull2 v0.2s, v1.4s, v2.s[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmull2 v0.2d, v1.4s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmull2 v0.2d, v1.4s, v22.s[4]
		// CHECK-ERROR: ^

		sqdmulh v0.4h, v1.4h, v2.h[8]
		sqdmulh v0.4h, v1.4h, v16.h[2]
		sqdmulh v0.8h, v1.8h, v2.h[8]
		sqdmulh v0.8h, v1.8h, v16.h[2]
		sqdmulh v0.2s, v1.2s, v2.s[4]
		sqdmulh v0.2s, v1.2s, v22.s[4]
		sqdmulh v0.4s, v1.4s, v2.s[4]
		sqdmulh v0.4s, v1.4s, v22.s[4]
		sqdmulh v0.2d, v1.2d, v22.d[1]

		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmulh v0.4h, v1.4h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmulh v0.4h, v1.4h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmulh v0.8h, v1.8h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmulh v0.8h, v1.8h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmulh v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmulh v0.2s, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmulh v0.4s, v1.4s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqdmulh v0.4s, v1.4s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqdmulh v0.2d, v1.2d, v22.d[1]
		// CHECK-ERROR: ^

		sqrdmulh v0.4h, v1.4h, v2.h[8]
		sqrdmulh v0.4h, v1.4h, v16.h[2]
		sqrdmulh v0.8h, v1.8h, v2.h[8]
		sqrdmulh v0.8h, v1.8h, v16.h[2]
		sqrdmulh v0.2s, v1.2s, v2.s[4]
		sqrdmulh v0.2s, v1.2s, v22.s[4]
		sqrdmulh v0.4s, v1.4s, v2.s[4]
		sqrdmulh v0.4s, v1.4s, v22.s[4]
		sqrdmulh v0.2d, v1.2d, v22.d[1]

		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqrdmulh v0.4h, v1.4h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqrdmulh v0.4h, v1.4h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqrdmulh v0.8h, v1.8h, v2.h[8]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqrdmulh v0.8h, v1.8h, v16.h[2]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqrdmulh v0.2s, v1.2s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqrdmulh v0.2s, v1.2s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqrdmulh v0.4s, v1.4s, v2.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: lane number incompatible with layout
		// CHECK-ERROR: sqrdmulh v0.4s, v1.4s, v22.s[4]
		// CHECK-ERROR: ^
		// CHECK-ERROR: error: invalid operand for instruction
		// CHECK-ERROR: sqrdmulh v0.2d, v1.2d, v22.d[1]
		// CHECK-ERROR: ^
Context not available.
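
The diagnostics above exercise two separate checks on the indexed operand. The sketch below is a rough model of that behaviour as it can be read off the expected output, not the assembler's actual implementation: `classify_indexed_operand` and `MAX_LANES` are hypothetical names, and the ordering (lane range first, register restriction second) is an inference from cases such as `mul v0.4h, v1.4h, v16.h[8]`, which reports the lane error even though v16 would also be rejected.

```python
# Hypothetical model of the indexed-operand checks; not LLVM's parser.
# Lane counts are those of a full 128-bit register.
MAX_LANES = {"b": 16, "h": 8, "s": 4, "d": 2}


def classify_indexed_operand(reg, elem, lane):
    """Return the expected diagnostic for Vm.<elem>[lane], or None.

    None means this operand alone would be accepted; an instruction can
    still be rejected for other reasons (see the note after the block).
    """
    # Checked first: the lane index must exist for the element size.
    if lane >= MAX_LANES[elem]:
        return "lane number incompatible with layout"
    # Checked second: .h elements can only name v0-v15, because the M bit
    # is part of the lane index and Rm is left with four bits.
    if elem == "h" and reg > 15:
        return "invalid operand for instruction"
    return None


print(classify_indexed_operand(16, "h", 8))
print(classify_indexed_operand(16, "h", 2))
```

This deliberately leaves out the third source of "invalid operand for instruction" in the test: layouts that the instruction does not support at all, such as `mla v0.2d, ...`, `fmla v0.4h, ...`, or a widening multiply whose destination is not wider than its source.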

Revision Contents

Diff 4470

lib/Target/AArch64/AArch64ISelLowering.h

lib/Target/AArch64/AArch64ISelLowering.cpp

lib/Target/AArch64/AArch64InstrFormats.td

lib/Target/AArch64/AArch64InstrNEON.td

lib/Target/AArch64/AArch64RegisterInfo.td

lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp

test/CodeGen/AArch64/neon-2velem.ll

test/MC/AArch64/neon-2velem.s

test/MC/AArch64/neon-diagnostics.s
