This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add IR intrinsics for sq(r)dmulh_lane(q)
ClosedPublic

Authored by sanwou01 on Dec 13 2019, 8:02 AM.

Details

Reviewers

SjoerdMeijer
dmgreen
t.p.northover
rovka
rengolin
efriedma

Commits

rG2939fc13c8f6: [AArch64] Add IR intrinsics for sq(r)dmulh_lane(q)

Summary

Currently, sqdmulh_lane and friends from the ACLE (implemented in arm_neon.h)
are represented in LLVM IR as a (by vector) sqdmulh and a shufflevector with a
vector of (repeated) indices, like so:

%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)

When %v's values are known, the shufflevector is optimized away and we are no
longer able to select the lane variant of sqdmulh in the backend.

This defeats a (hand-coded) optimization that packs several constants into a
single vector and uses the lane intrinsics to reduce register pressure and
trade off materialising several constants for a single vector load from the
constant pool, like so:

int16x8_t v = {2,3,4,5,6,7,8,9};
a = vqdmulh_laneq_s16(a, v, 0);
b = vqdmulh_laneq_s16(b, v, 1);
c = vqdmulh_laneq_s16(c, v, 2);
d = vqdmulh_laneq_s16(d, v, 3);
[...]

In one microbenchmark from libjpeg-turbo this accounts for a 2.5% to 4%
performance difference.
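To make the arithmetic behind this pattern concrete, here is a minimal scalar sketch in portable C of what a by-lane sqdmulh computes on i16 elements: every element of a is multiplied by the single element v[lane], the product is doubled, the high 16 bits are kept, and the result saturates. The names sqdmulh_s16 and vqdmulh_laneq_s16_model are hypothetical helpers introduced for illustration, not part of the patch or of arm_neon.h. The code models the semantics only and says nothing about how LLVM selects the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scalar model of SQDMULH on one pair of i16 elements.
 * Computes saturate(floor(2 * a * b / 2^16)). */
static int16_t sqdmulh_s16(int16_t a, int16_t b) {
    int64_t doubled = 2 * (int64_t)a * (int64_t)b;
    int64_t high = doubled / 65536;
    if (doubled % 65536 < 0)
        high -= 1;          /* C division truncates toward zero; round down instead */
    if (high > INT16_MAX)
        high = INT16_MAX;   /* only reachable when a == b == INT16_MIN */
    return (int16_t)high;
}

/* Hypothetical model of vqdmulh_laneq_s16(a, v, lane): four i16 elements in a,
 * eight in v, and one element of v shared by every multiplication. */
static void vqdmulh_laneq_s16_model(int16_t dst[4], const int16_t a[4],
                                    const int16_t v[8], int lane) {
    assert(lane >= 0 && lane < 8);
    for (int i = 0; i < 4; ++i)
        dst[i] = sqdmulh_s16(a[i], v[lane]);
}
```

Because only v[lane] is read, several calls with different lane numbers can share one constant vector, which is the property the hand-coded optimization relies on.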

We could teach the compiler to recover the lane variants, but this would likely
require its own pass. (Alternatively, "volatile" could be used on the constants
vector, but this is a bit ugly.)

This patch instead implements the following LLVM IR intrinsics for AArch64 to
maintain the original structure through IR optimization and into instruction
selection:

sqdmulh_lane
sqdmulh_laneq
sqrdmulh_lane
sqrdmulh_laneq
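As a rough guide to how the four names differ: the "r" variants (sqrdmulh) add a rounding constant before taking the high half, and, following the ACLE naming, lane takes the indexed element from a 64-bit vector while laneq takes it from a 128-bit vector. The sketch below models only the rounding difference on a single pair of i16 values. qdmulh_model_s16 and its rounding flag are hypothetical names used for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar model covering both SQDMULH (rounding == false)
 * and SQRDMULH (rounding == true) on i16 elements. */
static int16_t qdmulh_model_s16(int16_t a, int16_t b, bool rounding) {
    int64_t doubled = 2 * (int64_t)a * (int64_t)b;
    if (rounding)
        doubled += 1 << 15;     /* round to nearest instead of rounding down */
    int64_t high = doubled / 65536;
    if (doubled % 65536 < 0)
        high -= 1;              /* floor division for negative values */
    if (high > INT16_MAX)
        high = INT16_MAX;       /* saturate; only the positive bound is reachable */
    return (int16_t)high;
}
```

For example, multiplying 3 by 16384 (0.5 in Q15) yields 1 without rounding and 2 with rounding.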

These 'lane' variants need an additional register class. The second argument
must be in the lower half of the 64-bit NEON register file, but only when
operating on i16 elements.
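One way to read this constraint, as an assumption about the A64 by-element encoding rather than something stated in the patch: 16-bit elements need three bits for the lane index, which leaves four bits for the register number of the indexed operand, so only V0 to V15 are encodable. 32-bit elements leave five bits, so V0 to V31 all work. The helper indexed_operand_reg_ok below is hypothetical and only illustrates that rule.

```c
#include <stdbool.h>

/* Hypothetical check: can register number vm be used as the indexed (lane)
 * operand for elements of elt_bits bits? Assumes 4 register bits are
 * available for 16-bit elements and 5 otherwise. */
static bool indexed_operand_reg_ok(unsigned elt_bits, unsigned vm) {
    unsigned reg_bits = (elt_bits == 16) ? 4u : 5u;
    return vm < (1u << reg_bits);
}
```

Under that assumption, a register class restricted to the low sixteen registers is needed only for the i16 forms, which matches the description above.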

Note that the existing patterns for shufflevector and sqdmulh into sqdmulh_lane
(etc.) remain, so code that does not rely on NEON intrinsics to generate these
instructions is not affected.
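Both IR shapes compute the same values, which is why the two sets of patterns can coexist. A small scalar sketch of that equivalence, assuming the usual saturating doubling multiply-high semantics on i16 elements: broadcast_lane, mul_by_vector and mul_by_lane are hypothetical names standing in for the shufflevector, the by-vector sqdmulh and the new lane intrinsic respectively.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of SQDMULH on one pair of i16 elements. */
static int16_t sqdmulh_s16(int16_t a, int16_t b) {
    int64_t doubled = 2 * (int64_t)a * (int64_t)b;
    int64_t high = doubled / 65536;
    if (doubled % 65536 < 0)
        high -= 1;                      /* floor division */
    return (int16_t)(high > INT16_MAX ? INT16_MAX : high);
}

/* Models the shufflevector with a repeated index: splat v[lane]. */
static void broadcast_lane(int16_t dst[4], const int16_t v[4], int lane) {
    for (int i = 0; i < 4; ++i)
        dst[i] = v[lane];
}

/* Models the by-vector sqdmulh intrinsic. */
static void mul_by_vector(int16_t dst[4], const int16_t a[4], const int16_t b[4]) {
    for (int i = 0; i < 4; ++i)
        dst[i] = sqdmulh_s16(a[i], b[i]);
}

/* Models the lane form, which reads v[lane] directly. */
static void mul_by_lane(int16_t dst[4], const int16_t a[4], const int16_t v[4], int lane) {
    for (int i = 0; i < 4; ++i)
        dst[i] = sqdmulh_s16(a[i], v[lane]);
}

/* Returns 1 when both forms produce identical results for these inputs. */
static int forms_agree(const int16_t a[4], const int16_t v[4], int lane) {
    int16_t splat[4], via_shuffle[4], via_lane[4];
    broadcast_lane(splat, v, lane);
    mul_by_vector(via_shuffle, a, splat);
    mul_by_lane(via_lane, a, v, lane);
    return memcmp(via_shuffle, via_lane, sizeof via_lane) == 0;
}
```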

This patch also changes clang to emit these IR intrinsics for the corresponding
NEON intrinsics (AArch64 only).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sanwou01 created this revision. Dec 13 2019, 8:02 AM

Herald added projects: Restricted Project, Restricted Project. Dec 13 2019, 8:02 AM

Herald added subscribers: llvm-commits, cfe-commits, jdoerfert and 2 others.

sanwou01 added reviewers: SjoerdMeijer, dmgreen, t.p.northover. Dec 13 2019, 8:06 AM

Harbormaster completed remote builds in B42465: Diff 233807. Dec 13 2019, 8:06 AM

ping?

This makes it impossible to do a neat trick when using NEON intrinsics: one can load a number of constants using a single vector load, which are then repeatedly used to multiply whole vectors by one of the constants. This trick is used for a nice performance upside (2.5% to 4% on one microbenchmark) in libjpeg-turbo.

I'm not completely sure I follow here. The "trick" is something like the following?

int16x8_t v = {2,3,4,5,6,7,8,9};
a = vqdmulh_laneq_s16(a, v, 0);
b = vqdmulh_laneq_s16(b, v, 1);
c = vqdmulh_laneq_s16(c, v, 2);
d = vqdmulh_laneq_s16(d, v, 3);
[...]

I can see how that could be helpful. The compiler could probably be taught to recover something like the original structure, but it would probably require a dedicated pass. Or I guess you could hack the source to use "volatile", but that would be ugly.

I'm a little unhappy we're forced to introduce more intrinsics here, but it might be the best solution to avoid breaking carefully tuned code like this.

llvm/lib/IR/Function.cpp
1374 ↗	(On Diff #233807)	Hardcoding "64" and "128" in target-independent code here seems like a bad idea. Can we just let both vector operands have any vector type, and reject in the backend if we see an unexpected type?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6054 ↗	(On Diff #233807)	Is this related somehow?

Thanks Eli.

The "trick" is something like the following?
[...]

Yeah, that's exactly right. Your assessment of the options (dedicated pass, "volatile") matches our thinking as well. I'll update the commit message to make this a bit clearer.

llvm/lib/IR/Function.cpp
1374 ↗	(On Diff #233807)	Makes sense. Allowing any vector type for both operands is certainly doable. Instruction selection will fail if you try to use a non-existent intrinsic, which is not the nicest failure mode, but probably good enough for intrinsics? Emitting the correct arm_neon.h for clang is a little less trivial, but not by too much.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6054 ↗	(On Diff #233807)	This popped up when I was looking for uses of FPR128_loRegClass; it made sense to do the same for FPR64_lo. Doesn't seem essential though, so I'm happy to leave this out.

Address Eli's feedback; clarified commit message.

Harbormaster completed remote builds in B45140: Diff 240902. Jan 28 2020, 9:21 AM

LGTM

This revision is now accepted and ready to land. Jan 28 2020, 12:38 PM

Closed by commit rG2939fc13c8f6: [AArch64] Add IR intrinsics for sq(r)dmulh_lane(q) (authored by sanwou01). Jan 29 2020, 5:40 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                     Size
clang/include/clang/Basic/arm_neon.td                    16 lines
clang/lib/CodeGen/CGBuiltin.cpp                          26 lines
clang/test/CodeGen/aarch64-neon-2velem.c                 320 lines
llvm/include/llvm/IR/IntrinsicsAArch64.td                8 lines
llvm/lib/Target/AArch64/AArch64InstrFormats.td           61 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td              5 lines
llvm/lib/Target/AArch64/AArch64RegisterBankInfo.cpp      1 line
llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp          1 line
llvm/lib/Target/AArch64/AArch64RegisterInfo.td           7 lines
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp   6 lines
llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll           264 lines

Diff 241125

clang/include/clang/Basic/arm_neon.td

[...]
def VMUL_N : IOpInst<"vmul_n", "..1", "sifUsUiQsQiQfQUsQUi", OP_MUL_N>;		def VMUL_N : IOpInst<"vmul_n", "..1", "sifUsUiQsQiQfQUsQUi", OP_MUL_N>;
def VMUL_LANE : IOpInst<"vmul_lane", "..qI",		def VMUL_LANE : IOpInst<"vmul_lane", "..qI",
"sifUsUiQsQiQfQUsQUi", OP_MUL_LN>;		"sifUsUiQsQiQfQUsQUi", OP_MUL_LN>;
def VMULL_N : SOpInst<"vmull_n", "(>Q).1", "siUsUi", OP_MULL_N>;		def VMULL_N : SOpInst<"vmull_n", "(>Q).1", "siUsUi", OP_MULL_N>;
def VMULL_LANE : SOpInst<"vmull_lane", "(>Q)..I", "siUsUi", OP_MULL_LN>;		def VMULL_LANE : SOpInst<"vmull_lane", "(>Q)..I", "siUsUi", OP_MULL_LN>;
def VQDMULL_N : SOpInst<"vqdmull_n", "(>Q).1", "si", OP_QDMULL_N>;		def VQDMULL_N : SOpInst<"vqdmull_n", "(>Q).1", "si", OP_QDMULL_N>;
def VQDMULL_LANE : SOpInst<"vqdmull_lane", "(>Q)..I", "si", OP_QDMULL_LN>;		def VQDMULL_LANE : SOpInst<"vqdmull_lane", "(>Q)..I", "si", OP_QDMULL_LN>;
def VQDMULH_N : SOpInst<"vqdmulh_n", "..1", "siQsQi", OP_QDMULH_N>;		def VQDMULH_N : SOpInst<"vqdmulh_n", "..1", "siQsQi", OP_QDMULH_N>;
def VQDMULH_LANE : SOpInst<"vqdmulh_lane", "..qI", "siQsQi", OP_QDMULH_LN>;
def VQRDMULH_N : SOpInst<"vqrdmulh_n", "..1", "siQsQi", OP_QRDMULH_N>;		def VQRDMULH_N : SOpInst<"vqrdmulh_n", "..1", "siQsQi", OP_QRDMULH_N>;

		let ArchGuard = "!defined(__aarch64__)" in {
		def VQDMULH_LANE : SOpInst<"vqdmulh_lane", "..qI", "siQsQi", OP_QDMULH_LN>;
def VQRDMULH_LANE : SOpInst<"vqrdmulh_lane", "..qI", "siQsQi", OP_QRDMULH_LN>;		def VQRDMULH_LANE : SOpInst<"vqrdmulh_lane", "..qI", "siQsQi", OP_QRDMULH_LN>;
		}
		let ArchGuard = "defined(__aarch64__)" in {
		def A64_VQDMULH_LANE : SInst<"vqdmulh_lane", "..qI", "siQsQi">;
		def A64_VQRDMULH_LANE : SInst<"vqrdmulh_lane", "..qI", "siQsQi">;
		}

let ArchGuard = "defined(__ARM_FEATURE_QRDMX)" in {		let ArchGuard = "defined(__ARM_FEATURE_QRDMX)" in {
def VQRDMLAH_LANE : SOpInst<"vqrdmlah_lane", "...qI", "siQsQi", OP_QRDMLAH_LN>;		def VQRDMLAH_LANE : SOpInst<"vqrdmlah_lane", "...qI", "siQsQi", OP_QRDMLAH_LN>;
def VQRDMLSH_LANE : SOpInst<"vqrdmlsh_lane", "...qI", "siQsQi", OP_QRDMLSH_LN>;		def VQRDMLSH_LANE : SOpInst<"vqrdmlsh_lane", "...qI", "siQsQi", OP_QRDMLSH_LN>;
}		}

def VMLA_N : IOpInst<"vmla_n", "...1", "siUsUifQsQiQUsQUiQf", OP_MLA_N>;		def VMLA_N : IOpInst<"vmla_n", "...1", "siUsUifQsQiQUsQUiQf", OP_MLA_N>;
def VMLAL_N : SOpInst<"vmlal_n", "(>Q)(>Q).1", "siUsUi", OP_MLAL_N>;		def VMLAL_N : SOpInst<"vmlal_n", "(>Q)(>Q).1", "siUsUi", OP_MLAL_N>;
[...] def VMULL_HIGH_LANEQ : SOpInst<"vmull_high_laneq", "(>Q)QQI", "siUsUi",
OP_MULLHi_LN>;		OP_MULLHi_LN>;

def VQDMULL_LANEQ : SOpInst<"vqdmull_laneq", "(>Q).QI", "si", OP_QDMULL_LN>;		def VQDMULL_LANEQ : SOpInst<"vqdmull_laneq", "(>Q).QI", "si", OP_QDMULL_LN>;
def VQDMULL_HIGH_LANE : SOpInst<"vqdmull_high_lane", "(>Q)Q.I", "si",		def VQDMULL_HIGH_LANE : SOpInst<"vqdmull_high_lane", "(>Q)Q.I", "si",
OP_QDMULLHi_LN>;		OP_QDMULLHi_LN>;
def VQDMULL_HIGH_LANEQ : SOpInst<"vqdmull_high_laneq", "(>Q)QQI", "si",		def VQDMULL_HIGH_LANEQ : SOpInst<"vqdmull_high_laneq", "(>Q)QQI", "si",
OP_QDMULLHi_LN>;		OP_QDMULLHi_LN>;

def VQDMULH_LANEQ : SOpInst<"vqdmulh_laneq", "..QI", "siQsQi", OP_QDMULH_LN>;		let isLaneQ = 1 in {
def VQRDMULH_LANEQ : SOpInst<"vqrdmulh_laneq", "..QI", "siQsQi", OP_QRDMULH_LN>;		def VQDMULH_LANEQ : SInst<"vqdmulh_laneq", "..QI", "siQsQi">;
		def VQRDMULH_LANEQ : SInst<"vqrdmulh_laneq", "..QI", "siQsQi">;
		}
let ArchGuard = "defined(__ARM_FEATURE_QRDMX) && defined(__aarch64__)" in {		let ArchGuard = "defined(__ARM_FEATURE_QRDMX) && defined(__aarch64__)" in {
def VQRDMLAH_LANEQ : SOpInst<"vqrdmlah_laneq", "...QI", "siQsQi", OP_QRDMLAH_LN>;		def VQRDMLAH_LANEQ : SOpInst<"vqrdmlah_laneq", "...QI", "siQsQi", OP_QRDMLAH_LN>;
def VQRDMLSH_LANEQ : SOpInst<"vqrdmlsh_laneq", "...QI", "siQsQi", OP_QRDMLSH_LN>;		def VQRDMLSH_LANEQ : SOpInst<"vqrdmlsh_laneq", "...QI", "siQsQi", OP_QRDMLSH_LN>;
}		}

// Note: d type implemented by SCALAR_VMULX_LANE		// Note: d type implemented by SCALAR_VMULX_LANE
def VMULX_LANE : IOpInst<"vmulx_lane", "..qI", "fQfQd", OP_MULX_LN>;		def VMULX_LANE : IOpInst<"vmulx_lane", "..qI", "fQfQd", OP_MULX_LN>;
// Note: d type is implemented by SCALAR_VMULX_LANEQ		// Note: d type is implemented by SCALAR_VMULX_LANEQ
[...] let ArchGuard = "defined(__ARM_FEATURE_COMPLEX)" in {
def VCADD_ROT270 : SInst<"vcadd_rot270", "...", "f">;		def VCADD_ROT270 : SInst<"vcadd_rot270", "...", "f">;
def VCADDQ_ROT90 : SInst<"vcaddq_rot90", "QQQ", "f">;		def VCADDQ_ROT90 : SInst<"vcaddq_rot90", "QQQ", "f">;
def VCADDQ_ROT270 : SInst<"vcaddq_rot270", "QQQ", "f">;		def VCADDQ_ROT270 : SInst<"vcaddq_rot270", "QQQ", "f">;
}		}
let ArchGuard = "defined(__ARM_FEATURE_COMPLEX) && defined(__aarch64__)" in {		let ArchGuard = "defined(__ARM_FEATURE_COMPLEX) && defined(__aarch64__)" in {
def VCADDQ_ROT90_FP64 : SInst<"vcaddq_rot90", "QQQ", "d">;		def VCADDQ_ROT90_FP64 : SInst<"vcaddq_rot90", "QQQ", "d">;
def VCADDQ_ROT270_FP64 : SInst<"vcaddq_rot270", "QQQ", "d">;		def VCADDQ_ROT270_FP64 : SInst<"vcaddq_rot270", "QQQ", "d">;
}		}
No newline at end of file		No newline at end of file

clang/lib/CodeGen/CGBuiltin.cpp

[...] static const NeonIntrinsicInfo AArch64SIMDIntrinsicMap[] = {
NEONMAP2(vpaddlq_v, aarch64_neon_uaddlp, aarch64_neon_saddlp, UnsignedAlts),		NEONMAP2(vpaddlq_v, aarch64_neon_uaddlp, aarch64_neon_saddlp, UnsignedAlts),
NEONMAP1(vpaddq_v, aarch64_neon_addp, Add1ArgType),		NEONMAP1(vpaddq_v, aarch64_neon_addp, Add1ArgType),
NEONMAP1(vqabs_v, aarch64_neon_sqabs, Add1ArgType),		NEONMAP1(vqabs_v, aarch64_neon_sqabs, Add1ArgType),
NEONMAP1(vqabsq_v, aarch64_neon_sqabs, Add1ArgType),		NEONMAP1(vqabsq_v, aarch64_neon_sqabs, Add1ArgType),
NEONMAP2(vqadd_v, aarch64_neon_uqadd, aarch64_neon_sqadd, Add1ArgType | UnsignedAlts),		NEONMAP2(vqadd_v, aarch64_neon_uqadd, aarch64_neon_sqadd, Add1ArgType | UnsignedAlts),
NEONMAP2(vqaddq_v, aarch64_neon_uqadd, aarch64_neon_sqadd, Add1ArgType | UnsignedAlts),		NEONMAP2(vqaddq_v, aarch64_neon_uqadd, aarch64_neon_sqadd, Add1ArgType | UnsignedAlts),
NEONMAP2(vqdmlal_v, aarch64_neon_sqdmull, aarch64_neon_sqadd, 0),		NEONMAP2(vqdmlal_v, aarch64_neon_sqdmull, aarch64_neon_sqadd, 0),
NEONMAP2(vqdmlsl_v, aarch64_neon_sqdmull, aarch64_neon_sqsub, 0),		NEONMAP2(vqdmlsl_v, aarch64_neon_sqdmull, aarch64_neon_sqsub, 0),
		NEONMAP1(vqdmulh_lane_v, aarch64_neon_sqdmulh_lane, 0),
		NEONMAP1(vqdmulh_laneq_v, aarch64_neon_sqdmulh_laneq, 0),
NEONMAP1(vqdmulh_v, aarch64_neon_sqdmulh, Add1ArgType),		NEONMAP1(vqdmulh_v, aarch64_neon_sqdmulh, Add1ArgType),
		NEONMAP1(vqdmulhq_lane_v, aarch64_neon_sqdmulh_lane, 0),
		NEONMAP1(vqdmulhq_laneq_v, aarch64_neon_sqdmulh_laneq, 0),
NEONMAP1(vqdmulhq_v, aarch64_neon_sqdmulh, Add1ArgType),		NEONMAP1(vqdmulhq_v, aarch64_neon_sqdmulh, Add1ArgType),
NEONMAP1(vqdmull_v, aarch64_neon_sqdmull, Add1ArgType),		NEONMAP1(vqdmull_v, aarch64_neon_sqdmull, Add1ArgType),
NEONMAP2(vqmovn_v, aarch64_neon_uqxtn, aarch64_neon_sqxtn, Add1ArgType | UnsignedAlts),		NEONMAP2(vqmovn_v, aarch64_neon_uqxtn, aarch64_neon_sqxtn, Add1ArgType | UnsignedAlts),
NEONMAP1(vqmovun_v, aarch64_neon_sqxtun, Add1ArgType),		NEONMAP1(vqmovun_v, aarch64_neon_sqxtun, Add1ArgType),
NEONMAP1(vqneg_v, aarch64_neon_sqneg, Add1ArgType),		NEONMAP1(vqneg_v, aarch64_neon_sqneg, Add1ArgType),
NEONMAP1(vqnegq_v, aarch64_neon_sqneg, Add1ArgType),		NEONMAP1(vqnegq_v, aarch64_neon_sqneg, Add1ArgType),
		NEONMAP1(vqrdmulh_lane_v, aarch64_neon_sqrdmulh_lane, 0),
		NEONMAP1(vqrdmulh_laneq_v, aarch64_neon_sqrdmulh_laneq, 0),
NEONMAP1(vqrdmulh_v, aarch64_neon_sqrdmulh, Add1ArgType),		NEONMAP1(vqrdmulh_v, aarch64_neon_sqrdmulh, Add1ArgType),
		NEONMAP1(vqrdmulhq_lane_v, aarch64_neon_sqrdmulh_lane, 0),
		NEONMAP1(vqrdmulhq_laneq_v, aarch64_neon_sqrdmulh_laneq, 0),
NEONMAP1(vqrdmulhq_v, aarch64_neon_sqrdmulh, Add1ArgType),		NEONMAP1(vqrdmulhq_v, aarch64_neon_sqrdmulh, Add1ArgType),
NEONMAP2(vqrshl_v, aarch64_neon_uqrshl, aarch64_neon_sqrshl, Add1ArgType | UnsignedAlts),		NEONMAP2(vqrshl_v, aarch64_neon_uqrshl, aarch64_neon_sqrshl, Add1ArgType | UnsignedAlts),
NEONMAP2(vqrshlq_v, aarch64_neon_uqrshl, aarch64_neon_sqrshl, Add1ArgType | UnsignedAlts),		NEONMAP2(vqrshlq_v, aarch64_neon_uqrshl, aarch64_neon_sqrshl, Add1ArgType | UnsignedAlts),
NEONMAP2(vqshl_n_v, aarch64_neon_uqshl, aarch64_neon_sqshl, UnsignedAlts),		NEONMAP2(vqshl_n_v, aarch64_neon_uqshl, aarch64_neon_sqshl, UnsignedAlts),
NEONMAP2(vqshl_v, aarch64_neon_uqshl, aarch64_neon_sqshl, Add1ArgType | UnsignedAlts),		NEONMAP2(vqshl_v, aarch64_neon_uqshl, aarch64_neon_sqshl, Add1ArgType | UnsignedAlts),
NEONMAP2(vqshlq_n_v, aarch64_neon_uqshl, aarch64_neon_sqshl,UnsignedAlts),		NEONMAP2(vqshlq_n_v, aarch64_neon_uqshl, aarch64_neon_sqshl,UnsignedAlts),
NEONMAP2(vqshlq_v, aarch64_neon_uqshl, aarch64_neon_sqshl, Add1ArgType | UnsignedAlts),		NEONMAP2(vqshlq_v, aarch64_neon_uqshl, aarch64_neon_sqshl, Add1ArgType | UnsignedAlts),
NEONMAP1(vqshlu_n_v, aarch64_neon_sqshlu, 0),		NEONMAP1(vqshlu_n_v, aarch64_neon_sqshlu, 0),
[...] Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
case NEON::BI__builtin_neon_vqdmlal_v:		case NEON::BI__builtin_neon_vqdmlal_v:
case NEON::BI__builtin_neon_vqdmlsl_v: {		case NEON::BI__builtin_neon_vqdmlsl_v: {
SmallVector<Value *, 2> MulOps(Ops.begin() + 1, Ops.end());		SmallVector<Value *, 2> MulOps(Ops.begin() + 1, Ops.end());
Ops[1] =		Ops[1] =
EmitNeonCall(CGM.getIntrinsic(LLVMIntrinsic, Ty), MulOps, "vqdmlal");		EmitNeonCall(CGM.getIntrinsic(LLVMIntrinsic, Ty), MulOps, "vqdmlal");
Ops.resize(2);		Ops.resize(2);
return EmitNeonCall(CGM.getIntrinsic(AltLLVMIntrinsic, Ty), Ops, NameHint);		return EmitNeonCall(CGM.getIntrinsic(AltLLVMIntrinsic, Ty), Ops, NameHint);
}		}
		case NEON::BI__builtin_neon_vqdmulhq_lane_v:
		case NEON::BI__builtin_neon_vqdmulh_lane_v:
		case NEON::BI__builtin_neon_vqrdmulhq_lane_v:
		case NEON::BI__builtin_neon_vqrdmulh_lane_v: {
		llvm::Type *Tys[2] = {
		Ty, GetNeonType(this, NeonTypeFlags(Type.getEltType(), false,
		/*isQuad*/ false))};
		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, NameHint);
		}
		case NEON::BI__builtin_neon_vqdmulhq_laneq_v:
		case NEON::BI__builtin_neon_vqdmulh_laneq_v:
		case NEON::BI__builtin_neon_vqrdmulhq_laneq_v:
		case NEON::BI__builtin_neon_vqrdmulh_laneq_v: {
		llvm::Type *Tys[2] = {
		Ty, GetNeonType(this, NeonTypeFlags(Type.getEltType(), false,
		/*isQuad*/ true))};
		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, NameHint);
		}
case NEON::BI__builtin_neon_vqshl_n_v:		case NEON::BI__builtin_neon_vqshl_n_v:
case NEON::BI__builtin_neon_vqshlq_n_v:		case NEON::BI__builtin_neon_vqshlq_n_v:
return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vqshl_n",		return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vqshl_n",
1, false);		1, false);
case NEON::BI__builtin_neon_vqshlu_n_v:		case NEON::BI__builtin_neon_vqshlu_n_v:
case NEON::BI__builtin_neon_vqshluq_n_v:		case NEON::BI__builtin_neon_vqshluq_n_v:
return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vqshlu_n",		return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vqshlu_n",
1, false);		1, false);
[...]

clang/test/CodeGen/aarch64-neon-2velem.c

[...]
	// CHECK-NEXT: ret <2 x i64> [[VQDMULL_V2_I]]			// CHECK-NEXT: ret <2 x i64> [[VQDMULL_V2_I]]
	//			//
	int64x2_t test_vqdmull_high_laneq_s32(int32x4_t a, int32x4_t v) {			int64x2_t test_vqdmull_high_laneq_s32(int32x4_t a, int32x4_t v) {
	return vqdmull_high_laneq_s32(a, v, 3);			return vqdmull_high_laneq_s32(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqdmulh_lane_s16(			// CHECK-LABEL: @test_vqdmulh_lane_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANE_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.lane.v4i16.v4i16(<4 x i16> [[VQDMULH_LANE_V]], <4 x i16> [[VQDMULH_LANE_V1]], i32 3)
				// CHECK-NEXT: ret <4 x i16> [[VQDMULH_LANE_V2]]
	//			//
	int16x4_t test_vqdmulh_lane_s16(int16x4_t a, int16x4_t v) {			int16x4_t test_vqdmulh_lane_s16(int16x4_t a, int16x4_t v) {
	return vqdmulh_lane_s16(a, v, 3);			return vqdmulh_lane_s16(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_lane_s16(			// CHECK-LABEL: @test_vqdmulhq_lane_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANE_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.lane.v8i16.v4i16(<8 x i16> [[VQDMULHQ_LANE_V]], <4 x i16> [[VQDMULHQ_LANE_V1]], i32 3)
				// CHECK-NEXT: ret <8 x i16> [[VQDMULHQ_LANE_V2]]
	//			//
	int16x8_t test_vqdmulhq_lane_s16(int16x8_t a, int16x4_t v) {			int16x8_t test_vqdmulhq_lane_s16(int16x8_t a, int16x4_t v) {
	return vqdmulhq_lane_s16(a, v, 3);			return vqdmulhq_lane_s16(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqdmulh_lane_s32(			// CHECK-LABEL: @test_vqdmulh_lane_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <2 x i32> <i32 1, i32 1>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANE_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.lane.v2i32.v2i32(<2 x i32> [[VQDMULH_LANE_V]], <2 x i32> [[VQDMULH_LANE_V1]], i32 1)
				// CHECK-NEXT: ret <2 x i32> [[VQDMULH_LANE_V2]]
	//			//
	int32x2_t test_vqdmulh_lane_s32(int32x2_t a, int32x2_t v) {			int32x2_t test_vqdmulh_lane_s32(int32x2_t a, int32x2_t v) {
	return vqdmulh_lane_s32(a, v, 1);			return vqdmulh_lane_s32(a, v, 1);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_lane_s32(			// CHECK-LABEL: @test_vqdmulhq_lane_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <4 x i32> <i32 1, i32 1, i32 1, i32 1>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANE_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.lane.v4i32.v2i32(<4 x i32> [[VQDMULHQ_LANE_V]], <2 x i32> [[VQDMULHQ_LANE_V1]], i32 1)
				// CHECK-NEXT: ret <4 x i32> [[VQDMULHQ_LANE_V2]]
	//			//
	int32x4_t test_vqdmulhq_lane_s32(int32x4_t a, int32x2_t v) {			int32x4_t test_vqdmulhq_lane_s32(int32x4_t a, int32x2_t v) {
	return vqdmulhq_lane_s32(a, v, 1);			return vqdmulhq_lane_s32(a, v, 1);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_lane_s16(			// CHECK-LABEL: @test_vqrdmulh_lane_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANE_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v4i16.v4i16(<4 x i16> [[VQRDMULH_LANE_V]], <4 x i16> [[VQRDMULH_LANE_V1]], i32 3)
				// CHECK-NEXT: ret <4 x i16> [[VQRDMULH_LANE_V2]]
	//			//
	int16x4_t test_vqrdmulh_lane_s16(int16x4_t a, int16x4_t v) {			int16x4_t test_vqrdmulh_lane_s16(int16x4_t a, int16x4_t v) {
	return vqrdmulh_lane_s16(a, v, 3);			return vqrdmulh_lane_s16(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_lane_s16(			// CHECK-LABEL: @test_vqrdmulhq_lane_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANE_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v8i16.v4i16(<8 x i16> [[VQRDMULHQ_LANE_V]], <4 x i16> [[VQRDMULHQ_LANE_V1]], i32 3)
				// CHECK-NEXT: ret <8 x i16> [[VQRDMULHQ_LANE_V2]]
	//			//
	int16x8_t test_vqrdmulhq_lane_s16(int16x8_t a, int16x4_t v) {			int16x8_t test_vqrdmulhq_lane_s16(int16x8_t a, int16x4_t v) {
	return vqrdmulhq_lane_s16(a, v, 3);			return vqrdmulhq_lane_s16(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_lane_s32(			// CHECK-LABEL: @test_vqrdmulh_lane_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <2 x i32> <i32 1, i32 1>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANE_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v2i32.v2i32(<2 x i32> [[VQRDMULH_LANE_V]], <2 x i32> [[VQRDMULH_LANE_V1]], i32 1)
				// CHECK-NEXT: ret <2 x i32> [[VQRDMULH_LANE_V2]]
	//			//
	int32x2_t test_vqrdmulh_lane_s32(int32x2_t a, int32x2_t v) {			int32x2_t test_vqrdmulh_lane_s32(int32x2_t a, int32x2_t v) {
	return vqrdmulh_lane_s32(a, v, 1);			return vqrdmulh_lane_s32(a, v, 1);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_lane_s32(			// CHECK-LABEL: @test_vqrdmulhq_lane_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <4 x i32> <i32 1, i32 1, i32 1, i32 1>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANE_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v4i32.v2i32(<4 x i32> [[VQRDMULHQ_LANE_V]], <2 x i32> [[VQRDMULHQ_LANE_V1]], i32 1)
				// CHECK-NEXT: ret <4 x i32> [[VQRDMULHQ_LANE_V2]]
	//			//
	int32x4_t test_vqrdmulhq_lane_s32(int32x4_t a, int32x2_t v) {			int32x4_t test_vqrdmulhq_lane_s32(int32x4_t a, int32x2_t v) {
	return vqrdmulhq_lane_s32(a, v, 1);			return vqrdmulhq_lane_s32(a, v, 1);
	}			}

	// CHECK-LABEL: @test_vmul_lane_f32(			// CHECK-LABEL: @test_vmul_lane_f32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[V:%.*]], <2 x float> [[V]], <2 x i32> <i32 1, i32 1>			// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[V:%.*]], <2 x float> [[V]], <2 x i32> <i32 1, i32 1>
[...]
	// CHECK-NEXT: ret <2 x i64> [[VQDMULL_V2_I]]			// CHECK-NEXT: ret <2 x i64> [[VQDMULL_V2_I]]
	//			//
	int64x2_t test_vqdmull_high_laneq_s32_0(int32x4_t a, int32x4_t v) {			int64x2_t test_vqdmull_high_laneq_s32_0(int32x4_t a, int32x4_t v) {
	return vqdmull_high_laneq_s32(a, v, 0);			return vqdmull_high_laneq_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulh_lane_s16_0(			// CHECK-LABEL: @test_vqdmulh_lane_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANE_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.lane.v4i16.v4i16(<4 x i16> [[VQDMULH_LANE_V]], <4 x i16> [[VQDMULH_LANE_V1]], i32 0)
				// CHECK-NEXT: ret <4 x i16> [[VQDMULH_LANE_V2]]
	//			//
	int16x4_t test_vqdmulh_lane_s16_0(int16x4_t a, int16x4_t v) {			int16x4_t test_vqdmulh_lane_s16_0(int16x4_t a, int16x4_t v) {
	return vqdmulh_lane_s16(a, v, 0);			return vqdmulh_lane_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_lane_s16_0(			// CHECK-LABEL: @test_vqdmulhq_lane_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <8 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANE_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.lane.v8i16.v4i16(<8 x i16> [[VQDMULHQ_LANE_V]], <4 x i16> [[VQDMULHQ_LANE_V1]], i32 0)
				// CHECK-NEXT: ret <8 x i16> [[VQDMULHQ_LANE_V2]]
	//			//
	int16x8_t test_vqdmulhq_lane_s16_0(int16x8_t a, int16x4_t v) {			int16x8_t test_vqdmulhq_lane_s16_0(int16x8_t a, int16x4_t v) {
	return vqdmulhq_lane_s16(a, v, 0);			return vqdmulhq_lane_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulh_lane_s32_0(			// CHECK-LABEL: @test_vqdmulh_lane_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <2 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANE_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.lane.v2i32.v2i32(<2 x i32> [[VQDMULH_LANE_V]], <2 x i32> [[VQDMULH_LANE_V1]], i32 0)
				// CHECK-NEXT: ret <2 x i32> [[VQDMULH_LANE_V2]]
	//			//
	int32x2_t test_vqdmulh_lane_s32_0(int32x2_t a, int32x2_t v) {			int32x2_t test_vqdmulh_lane_s32_0(int32x2_t a, int32x2_t v) {
	return vqdmulh_lane_s32(a, v, 0);			return vqdmulh_lane_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_lane_s32_0(			// CHECK-LABEL: @test_vqdmulhq_lane_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANE_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.lane.v4i32.v2i32(<4 x i32> [[VQDMULHQ_LANE_V]], <2 x i32> [[VQDMULHQ_LANE_V1]], i32 0)
				// CHECK-NEXT: ret <4 x i32> [[VQDMULHQ_LANE_V2]]
	//			//
	int32x4_t test_vqdmulhq_lane_s32_0(int32x4_t a, int32x2_t v) {			int32x4_t test_vqdmulhq_lane_s32_0(int32x4_t a, int32x2_t v) {
	return vqdmulhq_lane_s32(a, v, 0);			return vqdmulhq_lane_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_lane_s16_0(			// CHECK-LABEL: @test_vqrdmulh_lane_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANE_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v4i16.v4i16(<4 x i16> [[VQRDMULH_LANE_V]], <4 x i16> [[VQRDMULH_LANE_V1]], i32 0)
				// CHECK-NEXT: ret <4 x i16> [[VQRDMULH_LANE_V2]]
	//			//
	int16x4_t test_vqrdmulh_lane_s16_0(int16x4_t a, int16x4_t v) {			int16x4_t test_vqrdmulh_lane_s16_0(int16x4_t a, int16x4_t v) {
	return vqrdmulh_lane_s16(a, v, 0);			return vqrdmulh_lane_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_lane_s16_0(			// CHECK-LABEL: @test_vqrdmulhq_lane_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <8 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANE_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v8i16.v4i16(<8 x i16> [[VQRDMULHQ_LANE_V]], <4 x i16> [[VQRDMULHQ_LANE_V1]], i32 0)
				// CHECK-NEXT: ret <8 x i16> [[VQRDMULHQ_LANE_V2]]
	//			//
	int16x8_t test_vqrdmulhq_lane_s16_0(int16x8_t a, int16x4_t v) {			int16x8_t test_vqrdmulhq_lane_s16_0(int16x8_t a, int16x4_t v) {
	return vqrdmulhq_lane_s16(a, v, 0);			return vqrdmulhq_lane_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_lane_s32_0(			// CHECK-LABEL: @test_vqrdmulh_lane_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <2 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANE_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v2i32.v2i32(<2 x i32> [[VQRDMULH_LANE_V]], <2 x i32> [[VQRDMULH_LANE_V1]], i32 0)
				// CHECK-NEXT: ret <2 x i32> [[VQRDMULH_LANE_V2]]
	//			//
	int32x2_t test_vqrdmulh_lane_s32_0(int32x2_t a, int32x2_t v) {			int32x2_t test_vqrdmulh_lane_s32_0(int32x2_t a, int32x2_t v) {
	return vqrdmulh_lane_s32(a, v, 0);			return vqrdmulh_lane_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_lane_s32_0(			// CHECK-LABEL: @test_vqrdmulhq_lane_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANE_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v4i32.v2i32(<4 x i32> [[VQRDMULHQ_LANE_V]], <2 x i32> [[VQRDMULHQ_LANE_V1]], i32 0)
				// CHECK-NEXT: ret <4 x i32> [[VQRDMULHQ_LANE_V2]]
	//			//
	int32x4_t test_vqrdmulhq_lane_s32_0(int32x4_t a, int32x2_t v) {			int32x4_t test_vqrdmulhq_lane_s32_0(int32x4_t a, int32x2_t v) {
	return vqrdmulhq_lane_s32(a, v, 0);			return vqrdmulhq_lane_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vmul_lane_f32_0(			// CHECK-LABEL: @test_vmul_lane_f32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[V:%.*]], <2 x float> [[V]], <2 x i32> zeroinitializer			// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[V:%.*]], <2 x float> [[V]], <2 x i32> zeroinitializer
	[... 1,574 lines not shown ...]
	// CHECK-NEXT: ret <2 x i64> [[VQDMLSL_V3_I]]			// CHECK-NEXT: ret <2 x i64> [[VQDMLSL_V3_I]]
	//			//
	int64x2_t test_vqdmlsl_high_laneq_s32_0(int64x2_t a, int32x4_t b, int32x4_t v) {			int64x2_t test_vqdmlsl_high_laneq_s32_0(int64x2_t a, int32x4_t b, int32x4_t v) {
	return vqdmlsl_high_laneq_s32(a, b, v, 0);			return vqdmlsl_high_laneq_s32(a, b, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulh_laneq_s16_0(			// CHECK-LABEL: @test_vqdmulh_laneq_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANEQ_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v4i16.v8i16(<4 x i16> [[VQDMULH_LANEQ_V]], <8 x i16> [[VQDMULH_LANEQ_V1]], i32 0)
				// CHECK-NEXT: ret <4 x i16> [[VQDMULH_LANEQ_V2]]
	//			//
	int16x4_t test_vqdmulh_laneq_s16_0(int16x4_t a, int16x8_t v) {			int16x4_t test_vqdmulh_laneq_s16_0(int16x4_t a, int16x8_t v) {
	return vqdmulh_laneq_s16(a, v, 0);			return vqdmulh_laneq_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_laneq_s16_0(			// CHECK-LABEL: @test_vqdmulhq_laneq_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <8 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v8i16.v8i16(<8 x i16> [[VQDMULHQ_LANEQ_V]], <8 x i16> [[VQDMULHQ_LANEQ_V1]], i32 0)
				// CHECK-NEXT: ret <8 x i16> [[VQDMULHQ_LANEQ_V2]]
	//			//
	int16x8_t test_vqdmulhq_laneq_s16_0(int16x8_t a, int16x8_t v) {			int16x8_t test_vqdmulhq_laneq_s16_0(int16x8_t a, int16x8_t v) {
	return vqdmulhq_laneq_s16(a, v, 0);			return vqdmulhq_laneq_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulh_laneq_s32_0(			// CHECK-LABEL: @test_vqdmulh_laneq_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <2 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANEQ_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v2i32.v4i32(<2 x i32> [[VQDMULH_LANEQ_V]], <4 x i32> [[VQDMULH_LANEQ_V1]], i32 0)
				// CHECK-NEXT: ret <2 x i32> [[VQDMULH_LANEQ_V2]]
	//			//
	int32x2_t test_vqdmulh_laneq_s32_0(int32x2_t a, int32x4_t v) {			int32x2_t test_vqdmulh_laneq_s32_0(int32x2_t a, int32x4_t v) {
	return vqdmulh_laneq_s32(a, v, 0);			return vqdmulh_laneq_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_laneq_s32_0(			// CHECK-LABEL: @test_vqdmulhq_laneq_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v4i32.v4i32(<4 x i32> [[VQDMULHQ_LANEQ_V]], <4 x i32> [[VQDMULHQ_LANEQ_V1]], i32 0)
				// CHECK-NEXT: ret <4 x i32> [[VQDMULHQ_LANEQ_V2]]
	//			//
	int32x4_t test_vqdmulhq_laneq_s32_0(int32x4_t a, int32x4_t v) {			int32x4_t test_vqdmulhq_laneq_s32_0(int32x4_t a, int32x4_t v) {
	return vqdmulhq_laneq_s32(a, v, 0);			return vqdmulhq_laneq_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_laneq_s16_0(			// CHECK-LABEL: @test_vqrdmulh_laneq_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANEQ_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v4i16.v8i16(<4 x i16> [[VQRDMULH_LANEQ_V]], <8 x i16> [[VQRDMULH_LANEQ_V1]], i32 0)
				// CHECK-NEXT: ret <4 x i16> [[VQRDMULH_LANEQ_V2]]
	//			//
	int16x4_t test_vqrdmulh_laneq_s16_0(int16x4_t a, int16x8_t v) {			int16x4_t test_vqrdmulh_laneq_s16_0(int16x4_t a, int16x8_t v) {
	return vqrdmulh_laneq_s16(a, v, 0);			return vqrdmulh_laneq_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_laneq_s16_0(			// CHECK-LABEL: @test_vqrdmulhq_laneq_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <8 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v8i16.v8i16(<8 x i16> [[VQRDMULHQ_LANEQ_V]], <8 x i16> [[VQRDMULHQ_LANEQ_V1]], i32 0)
				// CHECK-NEXT: ret <8 x i16> [[VQRDMULHQ_LANEQ_V2]]
	//			//
	int16x8_t test_vqrdmulhq_laneq_s16_0(int16x8_t a, int16x8_t v) {			int16x8_t test_vqrdmulhq_laneq_s16_0(int16x8_t a, int16x8_t v) {
	return vqrdmulhq_laneq_s16(a, v, 0);			return vqrdmulhq_laneq_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_laneq_s32_0(			// CHECK-LABEL: @test_vqrdmulh_laneq_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <2 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANEQ_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v2i32.v4i32(<2 x i32> [[VQRDMULH_LANEQ_V]], <4 x i32> [[VQRDMULH_LANEQ_V1]], i32 0)
				// CHECK-NEXT: ret <2 x i32> [[VQRDMULH_LANEQ_V2]]
	//			//
	int32x2_t test_vqrdmulh_laneq_s32_0(int32x2_t a, int32x4_t v) {			int32x2_t test_vqrdmulh_laneq_s32_0(int32x2_t a, int32x4_t v) {
	return vqrdmulh_laneq_s32(a, v, 0);			return vqrdmulh_laneq_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_laneq_s32_0(			// CHECK-LABEL: @test_vqrdmulhq_laneq_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v4i32.v4i32(<4 x i32> [[VQRDMULHQ_LANEQ_V]], <4 x i32> [[VQRDMULHQ_LANEQ_V1]], i32 0)
				// CHECK-NEXT: ret <4 x i32> [[VQRDMULHQ_LANEQ_V2]]
	//			//
	int32x4_t test_vqrdmulhq_laneq_s32_0(int32x4_t a, int32x4_t v) {			int32x4_t test_vqrdmulhq_laneq_s32_0(int32x4_t a, int32x4_t v) {
	return vqrdmulhq_laneq_s32(a, v, 0);			return vqrdmulhq_laneq_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vmla_lane_u16(			// CHECK-LABEL: @test_vmla_lane_u16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>			// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	[... 283 lines not shown ...]
	// CHECK-NEXT: ret <2 x i64> [[VQDMLSL_V3_I]]			// CHECK-NEXT: ret <2 x i64> [[VQDMLSL_V3_I]]
	//			//
	int64x2_t test_vqdmlsl_high_laneq_s32(int64x2_t a, int32x4_t b, int32x4_t v) {			int64x2_t test_vqdmlsl_high_laneq_s32(int64x2_t a, int32x4_t b, int32x4_t v) {
	return vqdmlsl_high_laneq_s32(a, b, v, 3);			return vqdmlsl_high_laneq_s32(a, b, v, 3);
	}			}

	// CHECK-LABEL: @test_vqdmulh_laneq_s16(			// CHECK-LABEL: @test_vqdmulh_laneq_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <4 x i32> <i32 7, i32 7, i32 7, i32 7>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANEQ_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v4i16.v8i16(<4 x i16> [[VQDMULH_LANEQ_V]], <8 x i16> [[VQDMULH_LANEQ_V1]], i32 7)
				// CHECK-NEXT: ret <4 x i16> [[VQDMULH_LANEQ_V2]]
	//			//
	int16x4_t test_vqdmulh_laneq_s16(int16x4_t a, int16x8_t v) {			int16x4_t test_vqdmulh_laneq_s16(int16x4_t a, int16x8_t v) {
	return vqdmulh_laneq_s16(a, v, 7);			return vqdmulh_laneq_s16(a, v, 7);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_laneq_s16(			// CHECK-LABEL: @test_vqdmulhq_laneq_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v8i16.v8i16(<8 x i16> [[VQDMULHQ_LANEQ_V]], <8 x i16> [[VQDMULHQ_LANEQ_V1]], i32 7)
				// CHECK-NEXT: ret <8 x i16> [[VQDMULHQ_LANEQ_V2]]
	//			//
	int16x8_t test_vqdmulhq_laneq_s16(int16x8_t a, int16x8_t v) {			int16x8_t test_vqdmulhq_laneq_s16(int16x8_t a, int16x8_t v) {
	return vqdmulhq_laneq_s16(a, v, 7);			return vqdmulhq_laneq_s16(a, v, 7);
	}			}

	// CHECK-LABEL: @test_vqdmulh_laneq_s32(			// CHECK-LABEL: @test_vqdmulh_laneq_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <2 x i32> <i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANEQ_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v2i32.v4i32(<2 x i32> [[VQDMULH_LANEQ_V]], <4 x i32> [[VQDMULH_LANEQ_V1]], i32 3)
				// CHECK-NEXT: ret <2 x i32> [[VQDMULH_LANEQ_V2]]
	//			//
	int32x2_t test_vqdmulh_laneq_s32(int32x2_t a, int32x4_t v) {			int32x2_t test_vqdmulh_laneq_s32(int32x2_t a, int32x4_t v) {
	return vqdmulh_laneq_s32(a, v, 3);			return vqdmulh_laneq_s32(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_laneq_s32(			// CHECK-LABEL: @test_vqdmulhq_laneq_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v4i32.v4i32(<4 x i32> [[VQDMULHQ_LANEQ_V]], <4 x i32> [[VQDMULHQ_LANEQ_V1]], i32 3)
				// CHECK-NEXT: ret <4 x i32> [[VQDMULHQ_LANEQ_V2]]
	//			//
	int32x4_t test_vqdmulhq_laneq_s32(int32x4_t a, int32x4_t v) {			int32x4_t test_vqdmulhq_laneq_s32(int32x4_t a, int32x4_t v) {
	return vqdmulhq_laneq_s32(a, v, 3);			return vqdmulhq_laneq_s32(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_laneq_s16(			// CHECK-LABEL: @test_vqrdmulh_laneq_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <4 x i32> <i32 7, i32 7, i32 7, i32 7>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANEQ_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v4i16.v8i16(<4 x i16> [[VQRDMULH_LANEQ_V]], <8 x i16> [[VQRDMULH_LANEQ_V1]], i32 7)
				// CHECK-NEXT: ret <4 x i16> [[VQRDMULH_LANEQ_V2]]
	//			//
	int16x4_t test_vqrdmulh_laneq_s16(int16x4_t a, int16x8_t v) {			int16x4_t test_vqrdmulh_laneq_s16(int16x4_t a, int16x8_t v) {
	return vqrdmulh_laneq_s16(a, v, 7);			return vqrdmulh_laneq_s16(a, v, 7);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_laneq_s16(			// CHECK-LABEL: @test_vqrdmulhq_laneq_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v8i16.v8i16(<8 x i16> [[VQRDMULHQ_LANEQ_V]], <8 x i16> [[VQRDMULHQ_LANEQ_V1]], i32 7)
				// CHECK-NEXT: ret <8 x i16> [[VQRDMULHQ_LANEQ_V2]]
	//			//
	int16x8_t test_vqrdmulhq_laneq_s16(int16x8_t a, int16x8_t v) {			int16x8_t test_vqrdmulhq_laneq_s16(int16x8_t a, int16x8_t v) {
	return vqrdmulhq_laneq_s16(a, v, 7);			return vqrdmulhq_laneq_s16(a, v, 7);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_laneq_s32(			// CHECK-LABEL: @test_vqrdmulh_laneq_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <2 x i32> <i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANEQ_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v2i32.v4i32(<2 x i32> [[VQRDMULH_LANEQ_V]], <4 x i32> [[VQRDMULH_LANEQ_V1]], i32 3)
				// CHECK-NEXT: ret <2 x i32> [[VQRDMULH_LANEQ_V2]]
	//			//
	int32x2_t test_vqrdmulh_laneq_s32(int32x2_t a, int32x4_t v) {			int32x2_t test_vqrdmulh_laneq_s32(int32x2_t a, int32x4_t v) {
	return vqrdmulh_laneq_s32(a, v, 3);			return vqrdmulh_laneq_s32(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_laneq_s32(			// CHECK-LABEL: @test_vqrdmulhq_laneq_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v4i32.v4i32(<4 x i32> [[VQRDMULHQ_LANEQ_V]], <4 x i32> [[VQRDMULHQ_LANEQ_V1]], i32 3)
				// CHECK-NEXT: ret <4 x i32> [[VQRDMULHQ_LANEQ_V2]]
	//			//
	int32x4_t test_vqrdmulhq_laneq_s32(int32x4_t a, int32x4_t v) {			int32x4_t test_vqrdmulhq_laneq_s32(int32x4_t a, int32x4_t v) {
	return vqrdmulhq_laneq_s32(a, v, 3);			return vqrdmulhq_laneq_s32(a, v, 3);
	}			}
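
For readers checking the expected results of these tests by hand, the following is a minimal scalar sketch of what the lane forms compute on i16 elements, assuming the usual definition of saturating (rounding) doubling multiply high. The names `sqdmulh_model` and `sqdmulh_lane_model` are hypothetical helpers written for illustration; they are not part of this patch or of LLVM.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of one i16 element of SQDMULH (round == false)
// or SQRDMULH (round == true): saturate((2*a*b [+ 2^15]) >> 16).
inline std::int16_t sqdmulh_model(std::int16_t a, std::int16_t b, bool round) {
  // Widen to 64 bits so the doubled product cannot overflow.
  std::int64_t product =
      2 * static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
  if (round)
    product += std::int64_t{1} << 15;  // rounding constant for 16-bit elements
  // Floor division by 2^16, written out so negative values do not rely on
  // the behaviour of right-shifting a signed integer.
  const std::int64_t scale = std::int64_t{1} << 16;
  std::int64_t high = product >= 0 ? product / scale
                                   : -((-product + scale - 1) / scale);
  // Saturate to the i16 range (only INT16_MIN * INT16_MIN reaches the top).
  high = std::min<std::int64_t>(std::max<std::int64_t>(high, INT16_MIN),
                                INT16_MAX);
  return static_cast<std::int16_t>(high);
}

// Hypothetical model of the lane form: every element of `a` is multiplied
// by the single element v[lane]. N and M may differ, as in the
// lane (64-bit v) and laneq (128-bit v) variants.
template <std::size_t N, std::size_t M>
std::array<std::int16_t, N> sqdmulh_lane_model(
    const std::array<std::int16_t, N> &a,
    const std::array<std::int16_t, M> &v, std::size_t lane,
    bool round = false) {
  assert(lane < M && "lane index must select an element of v");
  std::array<std::int16_t, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = sqdmulh_model(a[i], v[lane], round);
  return result;
}
```

The model keeps the constants vector whole and passes the lane index separately, which mirrors the structure the new intrinsics preserve in IR.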

llvm/include/llvm/IR/IntrinsicsAArch64.td

[...]	let TargetPrefix = "aarch64" in { // All intrinsics start with "llvm.aarch64.".
class AdvSIMD_2VectorArg_Scalar_Wide_Intrinsic		class AdvSIMD_2VectorArg_Scalar_Wide_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMTruncatedType<0>, llvm_i32_ty],		[LLVMTruncatedType<0>, llvm_i32_ty],
[IntrNoMem]>;		[IntrNoMem]>;
class AdvSIMD_2VectorArg_Tied_Narrow_Intrinsic		class AdvSIMD_2VectorArg_Tied_Narrow_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMHalfElementsVectorType<0>, llvm_anyvector_ty],		[LLVMHalfElementsVectorType<0>, llvm_anyvector_ty],
[IntrNoMem]>;		[IntrNoMem]>;
		class AdvSIMD_2VectorArg_Lane_Intrinsic
		: Intrinsic<[llvm_anyint_ty],
		[LLVMMatchType<0>, llvm_anyint_ty, llvm_i32_ty],
		[IntrNoMem]>;

class AdvSIMD_3VectorArg_Intrinsic		class AdvSIMD_3VectorArg_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],		[LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],
[IntrNoMem]>;		[IntrNoMem]>;
class AdvSIMD_3VectorArg_Scalar_Intrinsic		class AdvSIMD_3VectorArg_Scalar_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],		[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
[...]	let TargetPrefix = "aarch64", IntrProperties = [IntrNoMem] in {
// header is no longer supported.		// header is no longer supported.
def int_aarch64_neon_addhn : AdvSIMD_2VectorArg_Narrow_Intrinsic;		def int_aarch64_neon_addhn : AdvSIMD_2VectorArg_Narrow_Intrinsic;

// Vector Rounding Add High-Half		// Vector Rounding Add High-Half
def int_aarch64_neon_raddhn : AdvSIMD_2VectorArg_Narrow_Intrinsic;		def int_aarch64_neon_raddhn : AdvSIMD_2VectorArg_Narrow_Intrinsic;

// Vector Saturating Doubling Multiply High		// Vector Saturating Doubling Multiply High
def int_aarch64_neon_sqdmulh : AdvSIMD_2IntArg_Intrinsic;		def int_aarch64_neon_sqdmulh : AdvSIMD_2IntArg_Intrinsic;
		def int_aarch64_neon_sqdmulh_lane : AdvSIMD_2VectorArg_Lane_Intrinsic;
		def int_aarch64_neon_sqdmulh_laneq : AdvSIMD_2VectorArg_Lane_Intrinsic;

// Vector Saturating Rounding Doubling Multiply High		// Vector Saturating Rounding Doubling Multiply High
def int_aarch64_neon_sqrdmulh : AdvSIMD_2IntArg_Intrinsic;		def int_aarch64_neon_sqrdmulh : AdvSIMD_2IntArg_Intrinsic;
		def int_aarch64_neon_sqrdmulh_lane : AdvSIMD_2VectorArg_Lane_Intrinsic;
		def int_aarch64_neon_sqrdmulh_laneq : AdvSIMD_2VectorArg_Lane_Intrinsic;

// Vector Polynominal Multiply		// Vector Polynominal Multiply
def int_aarch64_neon_pmul : AdvSIMD_2VectorArg_Intrinsic;		def int_aarch64_neon_pmul : AdvSIMD_2VectorArg_Intrinsic;

// Vector Long Multiply		// Vector Long Multiply
def int_aarch64_neon_smull : AdvSIMD_2VectorArg_Long_Intrinsic;		def int_aarch64_neon_smull : AdvSIMD_2VectorArg_Long_Intrinsic;
def int_aarch64_neon_umull : AdvSIMD_2VectorArg_Long_Intrinsic;		def int_aarch64_neon_umull : AdvSIMD_2VectorArg_Long_Intrinsic;
def int_aarch64_neon_pmull : AdvSIMD_2VectorArg_Long_Intrinsic;		def int_aarch64_neon_pmull : AdvSIMD_2VectorArg_Long_Intrinsic;
[...]
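
Because AdvSIMD_2VectorArg_Lane_Intrinsic is overloaded on two types (the result/first operand and the indexed vector), each concrete intrinsic carries two type suffixes, as the declarations in the tests show. The sketch below only reproduces that visible naming scheme for the integer vector cases in this patch; `lane_intrinsic_name` and `vec_suffix` are hypothetical helpers for illustration, not LLVM APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: spell an integer vector type suffix such as "v4i16".
inline std::string vec_suffix(unsigned lanes, unsigned elemBits) {
  return "v" + std::to_string(lanes) + "i" + std::to_string(elemBits);
}

// Hypothetical helper: build the name of one concrete lane intrinsic.
//   op          - "sqdmulh" or "sqrdmulh"
//   isLaneQ     - true for the laneq form (128-bit indexed vector),
//                 false for the lane form (64-bit indexed vector)
//   resultLanes - number of elements in the result and first operand
//   elemBits    - element width in bits (16 or 32 in this patch)
inline std::string lane_intrinsic_name(const std::string &op, bool isLaneQ,
                                       unsigned resultLanes,
                                       unsigned elemBits) {
  assert(elemBits == 16 || elemBits == 32);
  const unsigned lanesIn64 = 64 / elemBits;
  const unsigned indexedLanes = isLaneQ ? 2 * lanesIn64 : lanesIn64;
  return "llvm.aarch64.neon." + op + (isLaneQ ? ".laneq." : ".lane.") +
         vec_suffix(resultLanes, elemBits) + "." +
         vec_suffix(indexedLanes, elemBits);
}
```

For example, the 64-bit result with a 128-bit indexed vector of i32 yields the `.v2i32.v4i32` suffix seen in test_vqrdmulh_laneq_s32.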

llvm/lib/Target/AArch64/AArch64InstrFormats.td

[...]
def am_indexed7s16 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S16", []>;		def am_indexed7s16 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S16", []>;
def am_indexed7s32 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S32", []>;		def am_indexed7s32 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S32", []>;
def am_indexed7s64 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S64", []>;		def am_indexed7s64 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S64", []>;
def am_indexed7s128 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S128", []>;		def am_indexed7s128 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S128", []>;

def am_indexedu6s128 : ComplexPattern<i64, 2, "SelectAddrModeIndexedU6S128", []>;		def am_indexedu6s128 : ComplexPattern<i64, 2, "SelectAddrModeIndexedU6S128", []>;
def am_indexeds9s128 : ComplexPattern<i64, 2, "SelectAddrModeIndexedS9S128", []>;		def am_indexeds9s128 : ComplexPattern<i64, 2, "SelectAddrModeIndexedS9S128", []>;

		def UImmS1XForm : SDNodeXForm<imm, [{
		return CurDAG->getTargetConstant(N->getZExtValue(), SDLoc(N), MVT::i64);
		}]>;
def UImmS2XForm : SDNodeXForm<imm, [{		def UImmS2XForm : SDNodeXForm<imm, [{
return CurDAG->getTargetConstant(N->getZExtValue() / 2, SDLoc(N), MVT::i64);		return CurDAG->getTargetConstant(N->getZExtValue() / 2, SDLoc(N), MVT::i64);
}]>;		}]>;
def UImmS4XForm : SDNodeXForm<imm, [{		def UImmS4XForm : SDNodeXForm<imm, [{
return CurDAG->getTargetConstant(N->getZExtValue() / 4, SDLoc(N), MVT::i64);		return CurDAG->getTargetConstant(N->getZExtValue() / 4, SDLoc(N), MVT::i64);
}]>;		}]>;
def UImmS8XForm : SDNodeXForm<imm, [{		def UImmS8XForm : SDNodeXForm<imm, [{
return CurDAG->getTargetConstant(N->getZExtValue() / 8, SDLoc(N), MVT::i64);		return CurDAG->getTargetConstant(N->getZExtValue() / 8, SDLoc(N), MVT::i64);
[...]	def v1i64_indexed : BaseSIMDIndexedTied<1, U, 1, 0b11, opc,
FPR64Op, FPR64Op, V128, VectorIndexD,		FPR64Op, FPR64Op, V128, VectorIndexD,
asm, ".d", "", "", ".d", []> {		asm, ".d", "", "", ".d", []> {
bits<1> idx;		bits<1> idx;
let Inst{11} = idx{0};		let Inst{11} = idx{0};
let Inst{21} = 0;		let Inst{21} = 0;
}		}
}		}

		multiclass SIMDIndexedHSPatterns<SDPatternOperator OpNodeLane,
		SDPatternOperator OpNodeLaneQ> {

		def : Pat<(v4i16 (OpNodeLane
		(v4i16 V64:$Rn), (v4i16 V64_lo:$Rm),
		VectorIndexS32b:$idx)),
		(!cast<Instruction>(NAME # v4i16_indexed) $Rn,
		(SUBREG_TO_REG (i32 0), (v4i16 V64_lo:$Rm), dsub),
		(UImmS1XForm $idx))>;

		def : Pat<(v4i16 (OpNodeLaneQ
		(v4i16 V64:$Rn), (v8i16 V128_lo:$Rm),
		VectorIndexH32b:$idx)),
		(!cast<Instruction>(NAME # v4i16_indexed) $Rn, $Rm,
		(UImmS1XForm $idx))>;

		def : Pat<(v8i16 (OpNodeLane
		(v8i16 V128:$Rn), (v4i16 V64_lo:$Rm),
		VectorIndexS32b:$idx)),
		(!cast<Instruction>(NAME # v8i16_indexed) $Rn,
		(SUBREG_TO_REG (i32 0), $Rm, dsub),
		(UImmS1XForm $idx))>;

		def : Pat<(v8i16 (OpNodeLaneQ
		(v8i16 V128:$Rn), (v8i16 V128_lo:$Rm),
		VectorIndexH32b:$idx)),
		(!cast<Instruction>(NAME # v8i16_indexed) $Rn, $Rm,
		(UImmS1XForm $idx))>;

		def : Pat<(v2i32 (OpNodeLane
		(v2i32 V64:$Rn), (v2i32 V64:$Rm),
		VectorIndexD32b:$idx)),
		(!cast<Instruction>(NAME # v2i32_indexed) $Rn,
		(SUBREG_TO_REG (i32 0), (v2i32 V64_lo:$Rm), dsub),
		(UImmS1XForm $idx))>;

		def : Pat<(v2i32 (OpNodeLaneQ
		(v2i32 V64:$Rn), (v4i32 V128:$Rm),
		VectorIndexS32b:$idx)),
		(!cast<Instruction>(NAME # v2i32_indexed) $Rn, $Rm,
		(UImmS1XForm $idx))>;

		def : Pat<(v4i32 (OpNodeLane
		(v4i32 V128:$Rn), (v2i32 V64:$Rm),
		VectorIndexD32b:$idx)),
		(!cast<Instruction>(NAME # v4i32_indexed) $Rn,
		(SUBREG_TO_REG (i32 0), $Rm, dsub),
		(UImmS1XForm $idx))>;

		def : Pat<(v4i32 (OpNodeLaneQ
		(v4i32 V128:$Rn),
		(v4i32 V128:$Rm),
		VectorIndexS32b:$idx)),
		(!cast<Instruction>(NAME # v4i32_indexed) $Rn, $Rm,
		(UImmS1XForm $idx))>;

		}

multiclass SIMDIndexedHS<bit U, bits<4> opc, string asm,		multiclass SIMDIndexedHS<bit U, bits<4> opc, string asm,
SDPatternOperator OpNode> {		SDPatternOperator OpNode> {
def v4i16_indexed : BaseSIMDIndexed<0, U, 0, 0b01, opc, V64, V64,		def v4i16_indexed : BaseSIMDIndexed<0, U, 0, 0b01, opc, V64, V64,
V128_lo, VectorIndexH,		V128_lo, VectorIndexH,
asm, ".4h", ".4h", ".4h", ".h",		asm, ".4h", ".4h", ".4h", ".h",
[(set (v4i16 V64:$Rd),		[(set (v4i16 V64:$Rd),
(OpNode (v4i16 V64:$Rn),		(OpNode (v4i16 V64:$Rn),
(v4i16 (AArch64duplane16 (v8i16 V128_lo:$Rm), VectorIndexH:$idx))))]> {		(v4i16 (AArch64duplane16 (v8i16 V128_lo:$Rm), VectorIndexH:$idx))))]> {
[...]

llvm/lib/Target/AArch64/AArch64InstrInfo.td

[...]
	def : Pat<(v2f64 (fmul V128:$Rn, (AArch64dup (f64 FPR64:$Rm)))),			def : Pat<(v2f64 (fmul V128:$Rn, (AArch64dup (f64 FPR64:$Rm)))),
	(FMULv2i64_indexed V128:$Rn,			(FMULv2i64_indexed V128:$Rn,
	(INSERT_SUBREG (v4i32 (IMPLICIT_DEF)), FPR64:$Rm, dsub),			(INSERT_SUBREG (v4i32 (IMPLICIT_DEF)), FPR64:$Rm, dsub),
	(i64 0))>;			(i64 0))>;

	defm SQDMULH : SIMDIndexedHS<0, 0b1100, "sqdmulh", int_aarch64_neon_sqdmulh>;			defm SQDMULH : SIMDIndexedHS<0, 0b1100, "sqdmulh", int_aarch64_neon_sqdmulh>;
	defm SQRDMULH : SIMDIndexedHS<0, 0b1101, "sqrdmulh", int_aarch64_neon_sqrdmulh>;			defm SQRDMULH : SIMDIndexedHS<0, 0b1101, "sqrdmulh", int_aarch64_neon_sqrdmulh>;

				defm SQDMULH : SIMDIndexedHSPatterns<int_aarch64_neon_sqdmulh_lane,
				int_aarch64_neon_sqdmulh_laneq>;
				defm SQRDMULH : SIMDIndexedHSPatterns<int_aarch64_neon_sqrdmulh_lane,
				int_aarch64_neon_sqrdmulh_laneq>;

	// Generated by MachineCombine			// Generated by MachineCombine
	defm MLA : SIMDVectorIndexedHSTied<1, 0b0000, "mla", null_frag>;			defm MLA : SIMDVectorIndexedHSTied<1, 0b0000, "mla", null_frag>;
	defm MLS : SIMDVectorIndexedHSTied<1, 0b0100, "mls", null_frag>;			defm MLS : SIMDVectorIndexedHSTied<1, 0b0100, "mls", null_frag>;

	defm MUL : SIMDVectorIndexedHS<0, 0b1000, "mul", mul>;			defm MUL : SIMDVectorIndexedHS<0, 0b1000, "mul", mul>;
	defm SMLAL : SIMDVectorIndexedLongSDTied<0, 0b0010, "smlal",			defm SMLAL : SIMDVectorIndexedLongSDTied<0, 0b0010, "smlal",
	TriOpFrag<(add node:$LHS, (int_aarch64_neon_smull node:$MHS, node:$RHS))>>;			TriOpFrag<(add node:$LHS, (int_aarch64_neon_smull node:$MHS, node:$RHS))>>;
	defm SMLSL : SIMDVectorIndexedLongSDTied<0, 0b0110, "smlsl",			defm SMLSL : SIMDVectorIndexedLongSDTied<0, 0b0110, "smlsl",
[...]

llvm/lib/Target/AArch64/AArch64RegisterBankInfo.cpp

[...]
	const RegisterBank &			const RegisterBank &
	AArch64RegisterBankInfo::getRegBankFromRegClass(const TargetRegisterClass &RC,			AArch64RegisterBankInfo::getRegBankFromRegClass(const TargetRegisterClass &RC,
	LLT) const {			LLT) const {
	switch (RC.getID()) {			switch (RC.getID()) {
	case AArch64::FPR8RegClassID:			case AArch64::FPR8RegClassID:
	case AArch64::FPR16RegClassID:			case AArch64::FPR16RegClassID:
	case AArch64::FPR32RegClassID:			case AArch64::FPR32RegClassID:
	case AArch64::FPR64RegClassID:			case AArch64::FPR64RegClassID:
				case AArch64::FPR64_loRegClassID:
	case AArch64::FPR128RegClassID:			case AArch64::FPR128RegClassID:
	case AArch64::FPR128_loRegClassID:			case AArch64::FPR128_loRegClassID:
	case AArch64::DDRegClassID:			case AArch64::DDRegClassID:
	case AArch64::DDDRegClassID:			case AArch64::DDDRegClassID:
	case AArch64::DDDDRegClassID:			case AArch64::DDDDRegClassID:
	case AArch64::QQRegClassID:			case AArch64::QQRegClassID:
	case AArch64::QQQRegClassID:			case AArch64::QQQRegClassID:
	case AArch64::QQQQRegClassID:			case AArch64::QQQQRegClassID:
[...]

llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp

[...]	unsigned AArch64RegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC,
case AArch64::DDDRegClassID:		case AArch64::DDDRegClassID:
case AArch64::DDDDRegClassID:		case AArch64::DDDDRegClassID:
case AArch64::QQRegClassID:		case AArch64::QQRegClassID:
case AArch64::QQQRegClassID:		case AArch64::QQQRegClassID:
case AArch64::QQQQRegClassID:		case AArch64::QQQQRegClassID:
return 32;		return 32;

case AArch64::FPR128_loRegClassID:		case AArch64::FPR128_loRegClassID:
		case AArch64::FPR64_loRegClassID:
return 16;		return 16;
}		}
}		}

unsigned AArch64RegisterInfo::getLocalAddressRegister(		unsigned AArch64RegisterInfo::getLocalAddressRegister(
const MachineFunction &MF) const {		const MachineFunction &MF) const {
const auto &MFI = MF.getFrameInfo();		const auto &MFI = MF.getFrameInfo();
if (!MF.hasEHFunclets() && !MFI.hasVarSizedObjects())		if (!MF.hasEHFunclets() && !MFI.hasVarSizedObjects())
return AArch64::SP;		return AArch64::SP;
else if (needsStackRealignment(MF))		else if (needsStackRealignment(MF))
return getBaseRegister();		return getBaseRegister();
return getFrameRegister(MF);		return getFrameRegister(MF);
}		}

llvm/lib/Target/AArch64/AArch64RegisterInfo.td

[...]
	}			}
	def FPR16 : RegisterClass<"AArch64", [f16], 16, (sequence "H%u", 0, 31)> {			def FPR16 : RegisterClass<"AArch64", [f16], 16, (sequence "H%u", 0, 31)> {
	let Size = 16;			let Size = 16;
	}			}
	def FPR32 : RegisterClass<"AArch64", [f32, i32], 32,(sequence "S%u", 0, 31)>;			def FPR32 : RegisterClass<"AArch64", [f32, i32], 32,(sequence "S%u", 0, 31)>;
	def FPR64 : RegisterClass<"AArch64", [f64, i64, v2f32, v1f64, v8i8, v4i16, v2i32,			def FPR64 : RegisterClass<"AArch64", [f64, i64, v2f32, v1f64, v8i8, v4i16, v2i32,
	v1i64, v4f16],			v1i64, v4f16],
	64, (sequence "D%u", 0, 31)>;			64, (sequence "D%u", 0, 31)>;
				def FPR64_lo : RegisterClass<"AArch64",
				[v8i8, v4i16, v2i32, v1i64, v4f16, v2f32, v1f64],
				64, (trunc FPR64, 16)>;

	// We don't (yet) have an f128 legal type, so don't use that here. We			// We don't (yet) have an f128 legal type, so don't use that here. We
	// normalize 128-bit vectors to v2f64 for arg passing and such, so use			// normalize 128-bit vectors to v2f64 for arg passing and such, so use
	// that here.			// that here.
	def FPR128 : RegisterClass<"AArch64",			def FPR128 : RegisterClass<"AArch64",
	[v16i8, v8i16, v4i32, v2i64, v4f32, v2f64, f128,			[v16i8, v8i16, v4i32, v2i64, v4f32, v2f64, f128,
	v8f16],			v8f16],
	128, (sequence "Q%u", 0, 31)>;			128, (sequence "Q%u", 0, 31)>;

[...]
	def V128 : RegisterOperand<FPR128, "printVRegOperand"> {			def V128 : RegisterOperand<FPR128, "printVRegOperand"> {
	let ParserMatchClass = VectorReg128AsmOperand;			let ParserMatchClass = VectorReg128AsmOperand;
	}			}

	def VectorRegLoAsmOperand : AsmOperandClass {			def VectorRegLoAsmOperand : AsmOperandClass {
	let Name = "VectorRegLo";			let Name = "VectorRegLo";
	let PredicateMethod = "isNeonVectorRegLo";			let PredicateMethod = "isNeonVectorRegLo";
	}			}
				def V64_lo : RegisterOperand<FPR64_lo, "printVRegOperand"> {
				let ParserMatchClass = VectorRegLoAsmOperand;
				}
	def V128_lo : RegisterOperand<FPR128_lo, "printVRegOperand"> {			def V128_lo : RegisterOperand<FPR128_lo, "printVRegOperand"> {
	let ParserMatchClass = VectorRegLoAsmOperand;			let ParserMatchClass = VectorRegLoAsmOperand;
	}			}

	class TypedVecListAsmOperand<int count, string vecty, int lanes, int eltsize>			class TypedVecListAsmOperand<int count, string vecty, int lanes, int eltsize>
	: AsmOperandClass {			: AsmOperandClass {
	let Name = "TypedVectorList" # count # "_" # lanes # eltsize;			let Name = "TypedVectorList" # count # "_" # lanes # eltsize;

[...]

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp

[...]	public:
}		}

bool isNeonVectorReg() const {		bool isNeonVectorReg() const {
return Kind == k_Register && Reg.Kind == RegKind::NeonVector;		return Kind == k_Register && Reg.Kind == RegKind::NeonVector;
}		}

bool isNeonVectorRegLo() const {		bool isNeonVectorRegLo() const {
return Kind == k_Register && Reg.Kind == RegKind::NeonVector &&		return Kind == k_Register && Reg.Kind == RegKind::NeonVector &&
AArch64MCRegisterClasses[AArch64::FPR128_loRegClassID].contains(		(AArch64MCRegisterClasses[AArch64::FPR128_loRegClassID].contains(
Reg.RegNum);		Reg.RegNum) ||
		AArch64MCRegisterClasses[AArch64::FPR64_loRegClassID].contains(
		Reg.RegNum));
}		}

template <unsigned Class> bool isSVEVectorReg() const {		template <unsigned Class> bool isSVEVectorReg() const {
RegKind RK;		RegKind RK;
switch (Class) {		switch (Class) {
case AArch64::ZPRRegClassID:		case AArch64::ZPRRegClassID:
case AArch64::ZPR_3bRegClassID:		case AArch64::ZPR_3bRegClassID:
case AArch64::ZPR_4bRegClassID:		case AArch64::ZPR_4bRegClassID:
[...]
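
The register-class restriction that motivates FPR64_lo and V64_lo can be summarised as a small predicate. This is a sketch under the assumption stated in the summary (the indexed operand must come from the lower half of the register file only for i16 elements); `is_valid_indexed_operand` is a hypothetical name, not a function in the AArch64 backend.

```cpp
#include <cassert>

// Hypothetical predicate: may vector register V<regIndex> be used as the
// indexed (second) operand of sqdmulh/sqrdmulh by element?
//   regIndex - NEON register number, 0..31
//   elemBits - element width in bits (16 or 32)
inline bool is_valid_indexed_operand(unsigned regIndex, unsigned elemBits) {
  assert(elemBits == 16 || elemBits == 32);
  if (regIndex > 31)
    return false;           // not a NEON vector register at all
  if (elemBits == 16)
    return regIndex < 16;   // i16 forms: lower half only (the *_lo classes)
  return true;              // i32 forms: any of V0..V31
}
```

This corresponds to the patterns above using V64_lo/V128_lo for the v4i16 and v8i16 cases and plain V64/V128 for the i32 cases.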

llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -fp-contract=fast | FileCheck %s --check-prefixes=CHECK,GENERIC			; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -fp-contract=fast | FileCheck %s --check-prefixes=CHECK,GENERIC
	; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -fp-contract=fast -mcpu=exynos-m3 | FileCheck %s --check-prefixes=CHECK,EXYNOSM3			; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -fp-contract=fast -mcpu=exynos-m3 | FileCheck %s --check-prefixes=CHECK,EXYNOSM3

	declare <2 x double> @llvm.aarch64.neon.fmulx.v2f64(<2 x double>, <2 x double>)			declare <2 x double> @llvm.aarch64.neon.fmulx.v2f64(<2 x double>, <2 x double>)

	declare <4 x float> @llvm.aarch64.neon.fmulx.v4f32(<4 x float>, <4 x float>)			declare <4 x float> @llvm.aarch64.neon.fmulx.v4f32(<4 x float>, <4 x float>)

	declare <2 x float> @llvm.aarch64.neon.fmulx.v2f32(<2 x float>, <2 x float>)			declare <2 x float> @llvm.aarch64.neon.fmulx.v2f32(<2 x float>, <2 x float>)

	declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32>, <4 x i32>)			declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32>, <4 x i32>)
				declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v4i32.v2i32(<4 x i32>, <2 x i32>, i32)
				declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v4i32.v4i32(<4 x i32>, <4 x i32>, i32)

	declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32>, <2 x i32>)			declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32>, <2 x i32>)
				declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v2i32.v2i32(<2 x i32>, <2 x i32>, i32)
				declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v2i32.v4i32(<2 x i32>, <4 x i32>, i32)

	declare <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16>, <8 x i16>)			declare <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16>, <8 x i16>)
				declare <8 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v8i16.v4i16(<8 x i16>, <4 x i16>, i32)
				declare <8 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v8i16.v8i16(<8 x i16>, <8 x i16>, i32)

	declare <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16>, <4 x i16>)			declare <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16>, <4 x i16>)
				declare <4 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v4i16.v4i16(<4 x i16>, <4 x i16>, i32)
				declare <4 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v4i16.v8i16(<4 x i16>, <8 x i16>, i32)

	declare <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32>, <4 x i32>)			declare <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32>, <4 x i32>)
				declare <4 x i32> @llvm.aarch64.neon.sqdmulh.lane.v4i32.v2i32(<4 x i32>, <2 x i32>, i32)
				declare <4 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v4i32.v4i32(<4 x i32>, <4 x i32>, i32)

	declare <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32>, <2 x i32>)			declare <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32>, <2 x i32>)
				declare <2 x i32> @llvm.aarch64.neon.sqdmulh.lane.v2i32.v2i32(<2 x i32>, <2 x i32>, i32)
				declare <2 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v2i32.v4i32(<2 x i32>, <4 x i32>, i32)

	declare <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16>, <8 x i16>)			declare <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16>, <8 x i16>)
				declare <8 x i16> @llvm.aarch64.neon.sqdmulh.lane.v8i16.v4i16(<8 x i16>, <4 x i16>, i32)
				declare <8 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v8i16.v8i16(<8 x i16>, <8 x i16>, i32)

	declare <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16>, <4 x i16>)			declare <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16>, <4 x i16>)
				declare <4 x i16> @llvm.aarch64.neon.sqdmulh.lane.v4i16.v4i16(<4 x i16>, <4 x i16>, i32)
				declare <4 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v4i16.v8i16(<4 x i16>, <8 x i16>, i32)

	declare <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32>, <2 x i32>)			declare <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32>, <2 x i32>)

	declare <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16>, <4 x i16>)			declare <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16>, <4 x i16>)

	declare <2 x i64> @llvm.aarch64.neon.sqsub.v2i64(<2 x i64>, <2 x i64>)			declare <2 x i64> @llvm.aarch64.neon.sqsub.v2i64(<2 x i64>, <2 x i64>)

	declare <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32>, <4 x i32>)			declare <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32>, <4 x i32>)
[...]
	; CHECK-NEXT: sqdmulh v0.4h, v0.4h, v1.h[3]			; CHECK-NEXT: sqdmulh v0.4h, v0.4h, v1.h[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>			%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)			%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)
	ret <4 x i16> %vqdmulh2.i			ret <4 x i16> %vqdmulh2.i
	}			}

				define <4 x i16> @test_vqdmulh_lane_s16_intrinsic(<4 x i16> %a, <4 x i16> %v) {
				; CHECK-LABEL: test_vqdmulh_lane_s16_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqdmulh v0.4h, v0.4h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.lane.v4i16.v4i16(<4 x i16> %a, <4 x i16> %v, i32 3)
				ret <4 x i16> %vqdmulh2.i
				}

				define <4 x i16> @test_vqdmulh_laneq_s16_intrinsic_lo(<4 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqdmulh_laneq_s16_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.4h, v0.4h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v4i16.v8i16(<4 x i16> %a, <8 x i16> %v, i32 3)
				ret <4 x i16> %vqdmulh2.i
				}

				define <4 x i16> @test_vqdmulh_laneq_s16_intrinsic_hi(<4 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqdmulh_laneq_s16_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.4h, v0.4h, v1.h[7]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v4i16.v8i16(<4 x i16> %a, <8 x i16> %v, i32 7)
				ret <4 x i16> %vqdmulh2.i
				}

	define <8 x i16> @test_vqdmulhq_lane_s16(<8 x i16> %a, <4 x i16> %v) {			define <8 x i16> @test_vqdmulhq_lane_s16(<8 x i16> %a, <4 x i16> %v) {
	; CHECK-LABEL: test_vqdmulhq_lane_s16:			; CHECK-LABEL: test_vqdmulhq_lane_s16:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: sqdmulh v0.8h, v0.8h, v1.h[3]			; CHECK-NEXT: sqdmulh v0.8h, v0.8h, v1.h[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>			%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	%vqdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16> %a, <8 x i16> %shuffle)			%vqdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16> %a, <8 x i16> %shuffle)
	ret <8 x i16> %vqdmulh2.i			ret <8 x i16> %vqdmulh2.i
	}			}

				define <8 x i16> @test_vqdmulhq_lane_s16_intrinsic(<8 x i16> %a, <4 x i16> %v) {
				; CHECK-LABEL: test_vqdmulhq_lane_s16_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqdmulh v0.8h, v0.8h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqdmulh.lane.v8i16.v4i16(<8 x i16> %a, <4 x i16> %v, i32 3)
				ret <8 x i16> %vqdmulh2.i
				}

				define <8 x i16> @test_vqdmulhq_laneq_s16_intrinsic_lo(<8 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqdmulhq_laneq_s16_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.8h, v0.8h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v8i16.v8i16(<8 x i16> %a, <8 x i16> %v, i32 3)
				ret <8 x i16> %vqdmulh2.i
				}

				define <8 x i16> @test_vqdmulhq_laneq_s16_intrinsic_hi(<8 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqdmulhq_laneq_s16_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.8h, v0.8h, v1.h[7]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v8i16.v8i16(<8 x i16> %a, <8 x i16> %v, i32 7)
				ret <8 x i16> %vqdmulh2.i
				}

	define <2 x i32> @test_vqdmulh_lane_s32(<2 x i32> %a, <2 x i32> %v) {			define <2 x i32> @test_vqdmulh_lane_s32(<2 x i32> %a, <2 x i32> %v) {
	; CHECK-LABEL: test_vqdmulh_lane_s32:			; CHECK-LABEL: test_vqdmulh_lane_s32:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: sqdmulh v0.2s, v0.2s, v1.s[1]			; CHECK-NEXT: sqdmulh v0.2s, v0.2s, v1.s[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>			%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
	%vqdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32> %a, <2 x i32> %shuffle)			%vqdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32> %a, <2 x i32> %shuffle)
	ret <2 x i32> %vqdmulh2.i			ret <2 x i32> %vqdmulh2.i
	}			}

				define <2 x i32> @test_vqdmulh_lane_s32_intrinsic(<2 x i32> %a, <2 x i32> %v) {
				; CHECK-LABEL: test_vqdmulh_lane_s32_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqdmulh v0.2s, v0.2s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqdmulh.lane.v2i32.v2i32(<2 x i32> %a, <2 x i32> %v, i32 1)
				ret <2 x i32> %vqdmulh2.i
				}

				define <2 x i32> @test_vqdmulh_laneq_s32_intrinsic_lo(<2 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqdmulh_laneq_s32_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.2s, v0.2s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v2i32.v4i32(<2 x i32> %a, <4 x i32> %v, i32 1)
				ret <2 x i32> %vqdmulh2.i
				}

				define <2 x i32> @test_vqdmulh_laneq_s32_intrinsic_hi(<2 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqdmulh_laneq_s32_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.2s, v0.2s, v1.s[3]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v2i32.v4i32(<2 x i32> %a, <4 x i32> %v, i32 3)
				ret <2 x i32> %vqdmulh2.i
				}

	define <4 x i32> @test_vqdmulhq_lane_s32(<4 x i32> %a, <2 x i32> %v) {			define <4 x i32> @test_vqdmulhq_lane_s32(<4 x i32> %a, <2 x i32> %v) {
	; CHECK-LABEL: test_vqdmulhq_lane_s32:			; CHECK-LABEL: test_vqdmulhq_lane_s32:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: sqdmulh v0.4s, v0.4s, v1.s[1]			; CHECK-NEXT: sqdmulh v0.4s, v0.4s, v1.s[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>			%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
	%vqdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32> %a, <4 x i32> %shuffle)			%vqdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32> %a, <4 x i32> %shuffle)
	ret <4 x i32> %vqdmulh2.i			ret <4 x i32> %vqdmulh2.i
	}			}

				define <4 x i32> @test_vqdmulhq_lane_s32_intrinsic(<4 x i32> %a, <2 x i32> %v) {
				; CHECK-LABEL: test_vqdmulhq_lane_s32_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqdmulh v0.4s, v0.4s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqdmulh.lane.v4i32.v2i32(<4 x i32> %a, <2 x i32> %v, i32 1)
				ret <4 x i32> %vqdmulh2.i
				}

				define <4 x i32> @test_vqdmulhq_laneq_s32_intrinsic_lo(<4 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqdmulhq_laneq_s32_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.4s, v0.4s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v4i32.v4i32(<4 x i32> %a, <4 x i32> %v, i32 1)
				ret <4 x i32> %vqdmulh2.i
				}

				define <4 x i32> @test_vqdmulhq_laneq_s32_intrinsic_hi(<4 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqdmulhq_laneq_s32_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.4s, v0.4s, v1.s[3]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v4i32.v4i32(<4 x i32> %a, <4 x i32> %v, i32 3)
				ret <4 x i32> %vqdmulh2.i
				}

	define <4 x i16> @test_vqrdmulh_lane_s16(<4 x i16> %a, <4 x i16> %v) {			define <4 x i16> @test_vqrdmulh_lane_s16(<4 x i16> %a, <4 x i16> %v) {
	; CHECK-LABEL: test_vqrdmulh_lane_s16:			; CHECK-LABEL: test_vqrdmulh_lane_s16:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: sqrdmulh v0.4h, v0.4h, v1.h[3]			; CHECK-NEXT: sqrdmulh v0.4h, v0.4h, v1.h[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>			%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	%vqrdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)			%vqrdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)
	ret <4 x i16> %vqrdmulh2.i			ret <4 x i16> %vqrdmulh2.i
	}			}

				define <4 x i16> @test_vqrdmulh_lane_s16_intrinsic(<4 x i16> %a, <4 x i16> %v) {
				; CHECK-LABEL: test_vqrdmulh_lane_s16_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqrdmulh v0.4h, v0.4h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v4i16.v4i16(<4 x i16> %a, <4 x i16> %v, i32 3)
				ret <4 x i16> %vqrdmulh2.i
				}

				define <4 x i16> @test_vqrdmulh_laneq_s16_intrinsic_lo(<4 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqrdmulh_laneq_s16_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.4h, v0.4h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v4i16.v8i16(<4 x i16> %a, <8 x i16> %v, i32 3)
				ret <4 x i16> %vqrdmulh2.i
				}

				define <4 x i16> @test_vqrdmulh_laneq_s16_intrinsic_hi(<4 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqrdmulh_laneq_s16_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.4h, v0.4h, v1.h[7]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v4i16.v8i16(<4 x i16> %a, <8 x i16> %v, i32 7)
				ret <4 x i16> %vqrdmulh2.i
				}

	define <8 x i16> @test_vqrdmulhq_lane_s16(<8 x i16> %a, <4 x i16> %v) {			define <8 x i16> @test_vqrdmulhq_lane_s16(<8 x i16> %a, <4 x i16> %v) {
	; CHECK-LABEL: test_vqrdmulhq_lane_s16:			; CHECK-LABEL: test_vqrdmulhq_lane_s16:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: sqrdmulh v0.8h, v0.8h, v1.h[3]			; CHECK-NEXT: sqrdmulh v0.8h, v0.8h, v1.h[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>			%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	%vqrdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %a, <8 x i16> %shuffle)			%vqrdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %a, <8 x i16> %shuffle)
	ret <8 x i16> %vqrdmulh2.i			ret <8 x i16> %vqrdmulh2.i
	}			}

				define <8 x i16> @test_vqrdmulhq_lane_s16_intrinsic(<8 x i16> %a, <4 x i16> %v) {
				; CHECK-LABEL: test_vqrdmulhq_lane_s16_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqrdmulh v0.8h, v0.8h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v8i16.v4i16(<8 x i16> %a, <4 x i16> %v, i32 3)
				ret <8 x i16> %vqrdmulh2.i
				}

				define <8 x i16> @test_vqrdmulhq_laneq_s16_intrinsic_lo(<8 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqrdmulhq_laneq_s16_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.8h, v0.8h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v8i16.v8i16(<8 x i16> %a, <8 x i16> %v, i32 3)
				ret <8 x i16> %vqrdmulh2.i
				}

				define <8 x i16> @test_vqrdmulhq_laneq_s16_intrinsic_hi(<8 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqrdmulhq_laneq_s16_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.8h, v0.8h, v1.h[7]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v8i16.v8i16(<8 x i16> %a, <8 x i16> %v, i32 7)
				ret <8 x i16> %vqrdmulh2.i
				}

	define <2 x i32> @test_vqrdmulh_lane_s32(<2 x i32> %a, <2 x i32> %v) {			define <2 x i32> @test_vqrdmulh_lane_s32(<2 x i32> %a, <2 x i32> %v) {
	; CHECK-LABEL: test_vqrdmulh_lane_s32:			; CHECK-LABEL: test_vqrdmulh_lane_s32:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: sqrdmulh v0.2s, v0.2s, v1.s[1]			; CHECK-NEXT: sqrdmulh v0.2s, v0.2s, v1.s[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>			%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
	%vqrdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %a, <2 x i32> %shuffle)			%vqrdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %a, <2 x i32> %shuffle)
	ret <2 x i32> %vqrdmulh2.i			ret <2 x i32> %vqrdmulh2.i
	}			}

				define <2 x i32> @test_vqrdmulh_lane_s32_intrinsic(<2 x i32> %a, <2 x i32> %v) {
				; CHECK-LABEL: test_vqrdmulh_lane_s32_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqrdmulh v0.2s, v0.2s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v2i32.v2i32(<2 x i32> %a, <2 x i32> %v, i32 1)
				ret <2 x i32> %vqrdmulh2.i
				}

				define <2 x i32> @test_vqrdmulh_laneq_s32_intrinsic_lo(<2 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqrdmulh_laneq_s32_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.2s, v0.2s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v2i32.v4i32(<2 x i32> %a, <4 x i32> %v, i32 1)
				ret <2 x i32> %vqrdmulh2.i
				}

				define <2 x i32> @test_vqrdmulh_laneq_s32_intrinsic_hi(<2 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqrdmulh_laneq_s32_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.2s, v0.2s, v1.s[3]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v2i32.v4i32(<2 x i32> %a, <4 x i32> %v, i32 3)
				ret <2 x i32> %vqrdmulh2.i
				}

	define <4 x i32> @test_vqrdmulhq_lane_s32(<4 x i32> %a, <2 x i32> %v) {			define <4 x i32> @test_vqrdmulhq_lane_s32(<4 x i32> %a, <2 x i32> %v) {
	; CHECK-LABEL: test_vqrdmulhq_lane_s32:			; CHECK-LABEL: test_vqrdmulhq_lane_s32:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: sqrdmulh v0.4s, v0.4s, v1.s[1]			; CHECK-NEXT: sqrdmulh v0.4s, v0.4s, v1.s[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>			%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
	%vqrdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %a, <4 x i32> %shuffle)			%vqrdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %a, <4 x i32> %shuffle)
	ret <4 x i32> %vqrdmulh2.i			ret <4 x i32> %vqrdmulh2.i
	}			}

				define <4 x i32> @test_vqrdmulhq_lane_s32_intrinsic(<4 x i32> %a, <2 x i32> %v) {
				; CHECK-LABEL: test_vqrdmulhq_lane_s32_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqrdmulh v0.4s, v0.4s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v4i32.v2i32(<4 x i32> %a, <2 x i32> %v, i32 1)
				ret <4 x i32> %vqrdmulh2.i
				}

				define <4 x i32> @test_vqrdmulhq_laneq_s32_intrinsic_lo(<4 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqrdmulhq_laneq_s32_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.4s, v0.4s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v4i32.v4i32(<4 x i32> %a, <4 x i32> %v, i32 1)
				ret <4 x i32> %vqrdmulh2.i
				}

				define <4 x i32> @test_vqrdmulhq_laneq_s32_intrinsic_hi(<4 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqrdmulhq_laneq_s32_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.4s, v0.4s, v1.s[3]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v4i32.v4i32(<4 x i32> %a, <4 x i32> %v, i32 3)
				ret <4 x i32> %vqrdmulh2.i
				}
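
For readers unfamiliar with the instructions these tests select, the following is a minimal scalar sketch in C of what `sqdmulh` and `sqrdmulh` (by element) compute for i32 elements. It is not code from this patch. The function names `sqdmulh_s32`, `sqrdmulh_s32` and `sqdmulh_lane_s32` are hypothetical, and the model assumes that `>>` on a negative signed value is an arithmetic shift (implementation-defined in C, though this is what mainstream compilers do).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of SQDMULH on one i32 element:
 * saturate((2 * a * b) >> 32). Assumes >> on a negative int64_t is an
 * arithmetic shift, which is implementation-defined in C. */
int32_t sqdmulh_s32(int32_t a, int32_t b) {
    /* Doubling overflows only for INT32_MIN * INT32_MIN, which saturates. */
    if (a == INT32_MIN && b == INT32_MIN)
        return INT32_MAX;
    int64_t p = (int64_t)a * (int64_t)b;
    return (int32_t)(p >> 31); /* same value as (2 * p) >> 32 */
}

/* Rounding variant (SQRDMULH): adds 2^31 to the doubled product before
 * taking the high half, i.e. adds 2^30 to the undoubled product. */
int32_t sqrdmulh_s32(int32_t a, int32_t b) {
    if (a == INT32_MIN && b == INT32_MIN)
        return INT32_MAX;
    int64_t p = (int64_t)a * (int64_t)b;
    return (int32_t)((p + ((int64_t)1 << 30)) >> 31);
}

/* By-element ("lane") form: every element of a is multiplied by the
 * single element v[lane]; n is the number of elements in a. */
void sqdmulh_lane_s32(int32_t *out, const int32_t *a, size_t n,
                      const int32_t *v, size_t lane) {
    for (size_t i = 0; i < n; ++i)
        out[i] = sqdmulh_s32(a[i], v[lane]);
}
```

Under this model, calling `sqdmulh_lane_s32(out, a, 4, v, 3)` mirrors the `_hi` tests above, where lane 3 of a four-element `%v` is expected to appear as `v1.s[3]` in the selected instruction.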

	define <2 x float> @test_vmul_lane_f32(<2 x float> %a, <2 x float> %v) {			define <2 x float> @test_vmul_lane_f32(<2 x float> %a, <2 x float> %v) {
	; CHECK-LABEL: test_vmul_lane_f32:			; CHECK-LABEL: test_vmul_lane_f32:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: fmul v0.2s, v0.2s, v1.s[1]			; CHECK-NEXT: fmul v0.2s, v0.2s, v1.s[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <2 x float> %v, <2 x float> undef, <2 x i32> <i32 1, i32 1>			%shuffle = shufflevector <2 x float> %v, <2 x float> undef, <2 x i32> <i32 1, i32 1>
	[...]

Revision Contents

Diff 241125

clang/include/clang/Basic/arm_neon.td

clang/lib/CodeGen/CGBuiltin.cpp

clang/test/CodeGen/aarch64-neon-2velem.c

llvm/include/llvm/IR/IntrinsicsAArch64.td

llvm/lib/Target/AArch64/AArch64InstrFormats.td

llvm/lib/Target/AArch64/AArch64InstrInfo.td

llvm/lib/Target/AArch64/AArch64RegisterBankInfo.cpp

llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp

llvm/lib/Target/AArch64/AArch64RegisterInfo.td

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp

llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll
