This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add IR intrinsics for sq(r)dmulh_lane(q)
Closed · Public

Authored by sanwou01 on Dec 13 2019, 8:02 AM.

Details

Reviewers

SjoerdMeijer
dmgreen
t.p.northover
rovka
rengolin
efriedma

Commits

rG2939fc13c8f6: [AArch64] Add IR intrinsics for sq(r)dmulh_lane(q)

Summary

Currently, sqdmulh_lane and friends from the ACLE (implemented in arm_neon.h)
are represented in LLVM IR as a (by vector) sqdmulh whose second operand is a
shufflevector with (repeated) indices, like so:

%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)

When %v's values are known, the shufflevector is optimized away and we are no
longer able to select the lane variant of sqdmulh in the backend.
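
To make the two-step form concrete, here is a minimal scalar sketch in portable C of what the shufflevector plus by-vector sqdmulh pair computes. The helper names (model_sqdmulh_s16, model_splat_lane4, model_sqdmulh_v4i16) are hypothetical and exist only for illustration. The arithmetic follows the architectural definition of SQDMULH (double the product, keep the high half, saturate); none of this is code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of SQDMULH on 16-bit elements: the high half of 2*a*b,
 * saturated. INT16_MIN * INT16_MIN is the only input pair that saturates. */
static int16_t model_sqdmulh_s16(int16_t a, int16_t b) {
    if (a == INT16_MIN && b == INT16_MIN)
        return INT16_MAX;
    int32_t p = (int32_t)a * (int32_t)b;
    /* (2*p) >> 16 equals floor(p / 2^15); floor by hand so that no
     * negative value is ever shifted. */
    int32_t q = (p >= 0) ? p / 32768 : -((-p + 32767) / 32768);
    return (int16_t)q;
}

/* Models a shufflevector whose mask is <lane, lane, lane, lane>. */
static void model_splat_lane4(const int16_t v[4], int lane, int16_t out[4]) {
    assert(lane >= 0 && lane < 4);
    for (int i = 0; i < 4; ++i)
        out[i] = v[lane];
}

/* Models the by-vector call @llvm.aarch64.neon.sqdmulh.v4i16. */
static void model_sqdmulh_v4i16(const int16_t a[4], const int16_t b[4],
                                int16_t out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = model_sqdmulh_s16(a[i], b[i]);
}
```

In this model the multiply only ever sees the already-splatted vector, so once the splat is folded to a constant nothing records that it came from a single lane of %v.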

This defeats a (hand-coded) optimization that packs several constants into a
single vector and uses the lane intrinsics to reduce register pressure and
trade off materialising several constants for a single vector load from the
constant pool, like so:

int16x8_t v = {2,3,4,5,6,7,8,9};
a = vqdmulh_laneq_s16(a, v, 0);
b = vqdmulh_laneq_s16(b, v, 1);
c = vqdmulh_laneq_s16(c, v, 2);
d = vqdmulh_laneq_s16(d, v, 3);
[...]
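
As a rough scalar sketch of what each of those calls computes, the portable C below models vqdmulh_laneq_s16 on plain arrays. The names model_sqdmulh_s16 and model_vqdmulh_laneq_s16 are hypothetical, and the model assumes the architectural SQDMULH semantics (double the product, keep the high half, saturate); it is not the arm_neon.h implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of SQDMULH on 16-bit elements: the high half of 2*a*b,
 * saturated. INT16_MIN * INT16_MIN is the only input pair that saturates. */
static int16_t model_sqdmulh_s16(int16_t a, int16_t b) {
    if (a == INT16_MIN && b == INT16_MIN)
        return INT16_MAX;
    int32_t p = (int32_t)a * (int32_t)b;
    /* floor(p / 2^15), computed without shifting a negative value. */
    int32_t q = (p >= 0) ? p / 32768 : -((-p + 32767) / 32768);
    return (int16_t)q;
}

/* Hypothetical model of vqdmulh_laneq_s16(a, v, lane): every element of the
 * 4-lane vector a is multiplied by the single element v[lane] of the 8-lane
 * constants vector v. */
static void model_vqdmulh_laneq_s16(const int16_t a[4], const int16_t v[8],
                                    int lane, int16_t out[4]) {
    assert(lane >= 0 && lane < 8);
    for (int i = 0; i < 4; ++i)
        out[i] = model_sqdmulh_s16(a[i], v[lane]);
}
```

The same array v serves every call and only the lane index changes, which is the property the hand-coded optimization depends on.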

In one microbenchmark from libjpeg-turbo this accounts for a 2.5% to 4%
performance difference.

We could teach the compiler to recover the lane variants, but this would likely
require its own pass. (Alternatively, "volatile" could be used on the constants
vector, but this is a bit ugly.)

This patch instead implements the following LLVM IR intrinsics for AArch64 to
maintain the original structure through IR optimization and into instruction
selection:

sqdmulh_lane
sqdmulh_laneq
sqrdmulh_lane
sqrdmulh_laneq.

These 'lane' variants need an additional register class. The second argument
must be in the lower half of the 64-bit NEON register file, but only when
operating on i16 elements.
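
As a hedged illustration of that constraint: in the A64 by-element encodings, a 16-bit element index takes over one bit of the field that names the indexed register, which leaves four bits and so only V0 to V15 are encodable. The helper below is a hypothetical model of that rule for the 16-bit and 32-bit element sizes these instructions support; it is not code from the patch.

```c
/* Hypothetical check: returns 1 when vector register number reg can be used
 * as the indexed (lane) operand for the given element width in bits, and 0
 * otherwise. Only 16-bit and 32-bit elements are modeled. */
static int model_lane_operand_reg_ok(int reg, int elem_bits) {
    if (reg < 0 || reg > 31)
        return 0;
    if (elem_bits == 16)
        return reg <= 15;   /* lower half of the register file only */
    if (elem_bits == 32)
        return 1;           /* any of V0..V31 */
    return 0;               /* other element sizes are outside this model */
}
```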

Note that the existing patterns for shufflevector and sqdmulh into sqdmulh_lane
(etc.) remain, so code that does not rely on NEON intrinsics to generate these
instructions is not affected.

This patch also changes clang to emit these IR intrinsics for the corresponding
NEON intrinsics (AArch64 only).

Diff Detail

Repository

rG LLVM Github Monorepo

Event Timeline

sanwou01 created this revision. Dec 13 2019, 8:02 AM

sanwou01 added reviewers: SjoerdMeijer, dmgreen, t.p.northover. Dec 13 2019, 8:06 AM

ping?

This makes it impossible to do a neat trick when using NEON intrinsics: one can load a number of constants using a single vector load, which are then repeatedly used to multiply whole vectors by one of the constants. This trick is used for a nice performance upside (2.5% to 4% on one microbenchmark) in libjpeg-turbo.

I'm not completely sure I follow here. The "trick" is something like the following?

int16x8_t v = {2,3,4,5,6,7,8,9};
a = vqdmulh_laneq_s16(a, v, 0);
b = vqdmulh_laneq_s16(b, v, 1);
c = vqdmulh_laneq_s16(c, v, 2);
d = vqdmulh_laneq_s16(d, v, 3);
[...]

I can see how that could be helpful. The compiler could probably be taught to recover something like the original structure, but it would probably require a dedicated pass. Or I guess you could hack the source to use "volatile", but that would be ugly.

I'm a little unhappy we're forced to introduce more intrinsics here, but it might be the best solution to avoid breaking carefully tuned code like this.

llvm/lib/IR/Function.cpp
1374	Hardcoding "64" and "128" in target-independent code here seems like a bad idea. Can we just let both vector operands have any vector type, and reject in the backend if we see an unexpected type?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6054	Is this related somehow?

Thanks Eli.

The "trick" is something like the following?
[...]

Yeah, that's exactly right. Your assessment of the options (dedicated pass, "volatile") matches our thinking as well. I'll update the commit message to make this a bit clearer.

llvm/lib/IR/Function.cpp
1374	Makes sense. Any type vector for both operands is certainly doable. Instruction selection will fail if you try to use a non-existent intrinsic, which is not the nicest failure mode, but probably good enough for intrinsics? Emitting the correct arm_neon.h for clang is a little less trivial, but not by too much.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
6054	This popped up when I was looking for uses of FPR128_loRegClass; it made sense to do the same for FPR64_lo. Doesn't seem essential though, so I'm happy to leave this out.

Address Eli's feedback; clarified commit message.

LGTM

This revision is now accepted and ready to land. Jan 28 2020, 12:38 PM

Closed by commit rG2939fc13c8f6: [AArch64] Add IR intrinsics for sq(r)dmulh_lane(q) (authored by sanwou01). Jan 29 2020, 5:40 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
clang/include/clang/Basic/arm_neon.td	16 lines
clang/lib/CodeGen/CGBuiltin.cpp	8 lines
clang/test/CodeGen/aarch64-neon-2velem.c	384 lines
llvm/include/llvm/IR/Intrinsics.h	5 lines
llvm/include/llvm/IR/Intrinsics.td	3 lines
llvm/include/llvm/IR/IntrinsicsAArch64.td	12 lines
llvm/lib/IR/Function.cpp	53 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	2 lines
llvm/lib/Target/AArch64/AArch64InstrFormats.td	61 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td	5 lines
llvm/lib/Target/AArch64/AArch64RegisterBankInfo.cpp	1 line
llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp	1 line
llvm/lib/Target/AArch64/AArch64RegisterInfo.td	7 lines
llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp	6 lines
llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll	264 lines
llvm/utils/TableGen/IntrinsicEmitter.cpp	8 lines

Diff 233807

clang/include/clang/Basic/arm_neon.td

def VMUL_N : IOpInst<"vmul_n", "..1", "sifUsUiQsQiQfQUsQUi", OP_MUL_N>;		def VMUL_N : IOpInst<"vmul_n", "..1", "sifUsUiQsQiQfQUsQUi", OP_MUL_N>;
def VMUL_LANE : IOpInst<"vmul_lane", "..qI",		def VMUL_LANE : IOpInst<"vmul_lane", "..qI",
"sifUsUiQsQiQfQUsQUi", OP_MUL_LN>;		"sifUsUiQsQiQfQUsQUi", OP_MUL_LN>;
def VMULL_N : SOpInst<"vmull_n", "(>Q).1", "siUsUi", OP_MULL_N>;		def VMULL_N : SOpInst<"vmull_n", "(>Q).1", "siUsUi", OP_MULL_N>;
def VMULL_LANE : SOpInst<"vmull_lane", "(>Q)..I", "siUsUi", OP_MULL_LN>;		def VMULL_LANE : SOpInst<"vmull_lane", "(>Q)..I", "siUsUi", OP_MULL_LN>;
def VQDMULL_N : SOpInst<"vqdmull_n", "(>Q).1", "si", OP_QDMULL_N>;		def VQDMULL_N : SOpInst<"vqdmull_n", "(>Q).1", "si", OP_QDMULL_N>;
def VQDMULL_LANE : SOpInst<"vqdmull_lane", "(>Q)..I", "si", OP_QDMULL_LN>;		def VQDMULL_LANE : SOpInst<"vqdmull_lane", "(>Q)..I", "si", OP_QDMULL_LN>;
def VQDMULH_N : SOpInst<"vqdmulh_n", "..1", "siQsQi", OP_QDMULH_N>;		def VQDMULH_N : SOpInst<"vqdmulh_n", "..1", "siQsQi", OP_QDMULH_N>;
def VQDMULH_LANE : SOpInst<"vqdmulh_lane", "..qI", "siQsQi", OP_QDMULH_LN>;
def VQRDMULH_N : SOpInst<"vqrdmulh_n", "..1", "siQsQi", OP_QRDMULH_N>;		def VQRDMULH_N : SOpInst<"vqrdmulh_n", "..1", "siQsQi", OP_QRDMULH_N>;

		let ArchGuard = "!defined(__aarch64__)" in {
		def VQDMULH_LANE : SOpInst<"vqdmulh_lane", "..qI", "siQsQi", OP_QDMULH_LN>;
def VQRDMULH_LANE : SOpInst<"vqrdmulh_lane", "..qI", "siQsQi", OP_QRDMULH_LN>;		def VQRDMULH_LANE : SOpInst<"vqrdmulh_lane", "..qI", "siQsQi", OP_QRDMULH_LN>;
		}
		let ArchGuard = "defined(__aarch64__)" in {
		def A64_VQDMULH_LANE : SInst<"vqdmulh_lane", "..qI", "siQsQi">;
		def A64_VQRDMULH_LANE : SInst<"vqrdmulh_lane", "..qI", "siQsQi">;
		}

let ArchGuard = "defined(__ARM_FEATURE_QRDMX)" in {		let ArchGuard = "defined(__ARM_FEATURE_QRDMX)" in {
def VQRDMLAH_LANE : SOpInst<"vqrdmlah_lane", "...qI", "siQsQi", OP_QRDMLAH_LN>;		def VQRDMLAH_LANE : SOpInst<"vqrdmlah_lane", "...qI", "siQsQi", OP_QRDMLAH_LN>;
def VQRDMLSH_LANE : SOpInst<"vqrdmlsh_lane", "...qI", "siQsQi", OP_QRDMLSH_LN>;		def VQRDMLSH_LANE : SOpInst<"vqrdmlsh_lane", "...qI", "siQsQi", OP_QRDMLSH_LN>;
}		}

def VMLA_N : IOpInst<"vmla_n", "...1", "siUsUifQsQiQUsQUiQf", OP_MLA_N>;		def VMLA_N : IOpInst<"vmla_n", "...1", "siUsUifQsQiQUsQUiQf", OP_MLA_N>;
def VMLAL_N : SOpInst<"vmlal_n", "(>Q)(>Q).1", "siUsUi", OP_MLAL_N>;		def VMLAL_N : SOpInst<"vmlal_n", "(>Q)(>Q).1", "siUsUi", OP_MLAL_N>;
[...]
def VMULL_HIGH_LANEQ : SOpInst<"vmull_high_laneq", "(>Q)QQI", "siUsUi",
OP_MULLHi_LN>;		OP_MULLHi_LN>;

def VQDMULL_LANEQ : SOpInst<"vqdmull_laneq", "(>Q).QI", "si", OP_QDMULL_LN>;		def VQDMULL_LANEQ : SOpInst<"vqdmull_laneq", "(>Q).QI", "si", OP_QDMULL_LN>;
def VQDMULL_HIGH_LANE : SOpInst<"vqdmull_high_lane", "(>Q)Q.I", "si",		def VQDMULL_HIGH_LANE : SOpInst<"vqdmull_high_lane", "(>Q)Q.I", "si",
OP_QDMULLHi_LN>;		OP_QDMULLHi_LN>;
def VQDMULL_HIGH_LANEQ : SOpInst<"vqdmull_high_laneq", "(>Q)QQI", "si",		def VQDMULL_HIGH_LANEQ : SOpInst<"vqdmull_high_laneq", "(>Q)QQI", "si",
OP_QDMULLHi_LN>;		OP_QDMULLHi_LN>;

def VQDMULH_LANEQ : SOpInst<"vqdmulh_laneq", "..QI", "siQsQi", OP_QDMULH_LN>;		let isLaneQ = 1 in {
def VQRDMULH_LANEQ : SOpInst<"vqrdmulh_laneq", "..QI", "siQsQi", OP_QRDMULH_LN>;		def VQDMULH_LANEQ : SInst<"vqdmulh_laneq", "..QI", "siQsQi">;
		def VQRDMULH_LANEQ : SInst<"vqrdmulh_laneq", "..QI", "siQsQi">;
		}
let ArchGuard = "defined(__ARM_FEATURE_QRDMX) && defined(__aarch64__)" in {		let ArchGuard = "defined(__ARM_FEATURE_QRDMX) && defined(__aarch64__)" in {
def VQRDMLAH_LANEQ : SOpInst<"vqrdmlah_laneq", "...QI", "siQsQi", OP_QRDMLAH_LN>;		def VQRDMLAH_LANEQ : SOpInst<"vqrdmlah_laneq", "...QI", "siQsQi", OP_QRDMLAH_LN>;
def VQRDMLSH_LANEQ : SOpInst<"vqrdmlsh_laneq", "...QI", "siQsQi", OP_QRDMLSH_LN>;		def VQRDMLSH_LANEQ : SOpInst<"vqrdmlsh_laneq", "...QI", "siQsQi", OP_QRDMLSH_LN>;
}		}

// Note: d type implemented by SCALAR_VMULX_LANE		// Note: d type implemented by SCALAR_VMULX_LANE
def VMULX_LANE : IOpInst<"vmulx_lane", "..qI", "fQfQd", OP_MULX_LN>;		def VMULX_LANE : IOpInst<"vmulx_lane", "..qI", "fQfQd", OP_MULX_LN>;
// Note: d type is implemented by SCALAR_VMULX_LANEQ		// Note: d type is implemented by SCALAR_VMULX_LANEQ
[...]
let ArchGuard = "defined(__ARM_FEATURE_COMPLEX)" in {
def VCADD_ROT270 : SInst<"vcadd_rot270", "...", "f">;		def VCADD_ROT270 : SInst<"vcadd_rot270", "...", "f">;
def VCADDQ_ROT90 : SInst<"vcaddq_rot90", "QQQ", "f">;		def VCADDQ_ROT90 : SInst<"vcaddq_rot90", "QQQ", "f">;
def VCADDQ_ROT270 : SInst<"vcaddq_rot270", "QQQ", "f">;		def VCADDQ_ROT270 : SInst<"vcaddq_rot270", "QQQ", "f">;
}		}
let ArchGuard = "defined(__ARM_FEATURE_COMPLEX) && defined(__aarch64__)" in {		let ArchGuard = "defined(__ARM_FEATURE_COMPLEX) && defined(__aarch64__)" in {
def VCADDQ_ROT90_FP64 : SInst<"vcaddq_rot90", "QQQ", "d">;		def VCADDQ_ROT90_FP64 : SInst<"vcaddq_rot90", "QQQ", "d">;
def VCADDQ_ROT270_FP64 : SInst<"vcaddq_rot270", "QQQ", "d">;		def VCADDQ_ROT270_FP64 : SInst<"vcaddq_rot270", "QQQ", "d">;
}		}
No newline at end of file		No newline at end of file

clang/lib/CodeGen/CGBuiltin.cpp

static const NeonIntrinsicInfo AArch64SIMDIntrinsicMap[] = {
NEONMAP2(vpaddlq_v, aarch64_neon_uaddlp, aarch64_neon_saddlp, UnsignedAlts),		NEONMAP2(vpaddlq_v, aarch64_neon_uaddlp, aarch64_neon_saddlp, UnsignedAlts),
NEONMAP1(vpaddq_v, aarch64_neon_addp, Add1ArgType),		NEONMAP1(vpaddq_v, aarch64_neon_addp, Add1ArgType),
NEONMAP1(vqabs_v, aarch64_neon_sqabs, Add1ArgType),		NEONMAP1(vqabs_v, aarch64_neon_sqabs, Add1ArgType),
NEONMAP1(vqabsq_v, aarch64_neon_sqabs, Add1ArgType),		NEONMAP1(vqabsq_v, aarch64_neon_sqabs, Add1ArgType),
NEONMAP2(vqadd_v, aarch64_neon_uqadd, aarch64_neon_sqadd, Add1ArgType | UnsignedAlts),		NEONMAP2(vqadd_v, aarch64_neon_uqadd, aarch64_neon_sqadd, Add1ArgType | UnsignedAlts),
NEONMAP2(vqaddq_v, aarch64_neon_uqadd, aarch64_neon_sqadd, Add1ArgType | UnsignedAlts),		NEONMAP2(vqaddq_v, aarch64_neon_uqadd, aarch64_neon_sqadd, Add1ArgType | UnsignedAlts),
NEONMAP2(vqdmlal_v, aarch64_neon_sqdmull, aarch64_neon_sqadd, 0),		NEONMAP2(vqdmlal_v, aarch64_neon_sqdmull, aarch64_neon_sqadd, 0),
NEONMAP2(vqdmlsl_v, aarch64_neon_sqdmull, aarch64_neon_sqsub, 0),		NEONMAP2(vqdmlsl_v, aarch64_neon_sqdmull, aarch64_neon_sqsub, 0),
		NEONMAP1(vqdmulh_lane_v, aarch64_neon_sqdmulh_lane, Add1ArgType),
		NEONMAP1(vqdmulh_laneq_v, aarch64_neon_sqdmulh_laneq, Add1ArgType),
NEONMAP1(vqdmulh_v, aarch64_neon_sqdmulh, Add1ArgType),		NEONMAP1(vqdmulh_v, aarch64_neon_sqdmulh, Add1ArgType),
		NEONMAP1(vqdmulhq_lane_v, aarch64_neon_sqdmulh_lane, Add1ArgType),
		NEONMAP1(vqdmulhq_laneq_v, aarch64_neon_sqdmulh_laneq, Add1ArgType),
NEONMAP1(vqdmulhq_v, aarch64_neon_sqdmulh, Add1ArgType),		NEONMAP1(vqdmulhq_v, aarch64_neon_sqdmulh, Add1ArgType),
NEONMAP1(vqdmull_v, aarch64_neon_sqdmull, Add1ArgType),		NEONMAP1(vqdmull_v, aarch64_neon_sqdmull, Add1ArgType),
NEONMAP2(vqmovn_v, aarch64_neon_uqxtn, aarch64_neon_sqxtn, Add1ArgType | UnsignedAlts),		NEONMAP2(vqmovn_v, aarch64_neon_uqxtn, aarch64_neon_sqxtn, Add1ArgType | UnsignedAlts),
NEONMAP1(vqmovun_v, aarch64_neon_sqxtun, Add1ArgType),		NEONMAP1(vqmovun_v, aarch64_neon_sqxtun, Add1ArgType),
NEONMAP1(vqneg_v, aarch64_neon_sqneg, Add1ArgType),		NEONMAP1(vqneg_v, aarch64_neon_sqneg, Add1ArgType),
NEONMAP1(vqnegq_v, aarch64_neon_sqneg, Add1ArgType),		NEONMAP1(vqnegq_v, aarch64_neon_sqneg, Add1ArgType),
		NEONMAP1(vqrdmulh_lane_v, aarch64_neon_sqrdmulh_lane, Add1ArgType),
		NEONMAP1(vqrdmulh_laneq_v, aarch64_neon_sqrdmulh_laneq, Add1ArgType),
NEONMAP1(vqrdmulh_v, aarch64_neon_sqrdmulh, Add1ArgType),		NEONMAP1(vqrdmulh_v, aarch64_neon_sqrdmulh, Add1ArgType),
		NEONMAP1(vqrdmulhq_lane_v, aarch64_neon_sqrdmulh_lane, Add1ArgType),
		NEONMAP1(vqrdmulhq_laneq_v, aarch64_neon_sqrdmulh_laneq, Add1ArgType),
NEONMAP1(vqrdmulhq_v, aarch64_neon_sqrdmulh, Add1ArgType),		NEONMAP1(vqrdmulhq_v, aarch64_neon_sqrdmulh, Add1ArgType),
NEONMAP2(vqrshl_v, aarch64_neon_uqrshl, aarch64_neon_sqrshl, Add1ArgType | UnsignedAlts),		NEONMAP2(vqrshl_v, aarch64_neon_uqrshl, aarch64_neon_sqrshl, Add1ArgType | UnsignedAlts),
NEONMAP2(vqrshlq_v, aarch64_neon_uqrshl, aarch64_neon_sqrshl, Add1ArgType | UnsignedAlts),		NEONMAP2(vqrshlq_v, aarch64_neon_uqrshl, aarch64_neon_sqrshl, Add1ArgType | UnsignedAlts),
NEONMAP2(vqshl_n_v, aarch64_neon_uqshl, aarch64_neon_sqshl, UnsignedAlts),		NEONMAP2(vqshl_n_v, aarch64_neon_uqshl, aarch64_neon_sqshl, UnsignedAlts),
NEONMAP2(vqshl_v, aarch64_neon_uqshl, aarch64_neon_sqshl, Add1ArgType | UnsignedAlts),		NEONMAP2(vqshl_v, aarch64_neon_uqshl, aarch64_neon_sqshl, Add1ArgType | UnsignedAlts),
NEONMAP2(vqshlq_n_v, aarch64_neon_uqshl, aarch64_neon_sqshl,UnsignedAlts),		NEONMAP2(vqshlq_n_v, aarch64_neon_uqshl, aarch64_neon_sqshl,UnsignedAlts),
NEONMAP2(vqshlq_v, aarch64_neon_uqshl, aarch64_neon_sqshl, Add1ArgType | UnsignedAlts),		NEONMAP2(vqshlq_v, aarch64_neon_uqshl, aarch64_neon_sqshl, Add1ArgType | UnsignedAlts),
NEONMAP1(vqshlu_n_v, aarch64_neon_sqshlu, 0),		NEONMAP1(vqshlu_n_v, aarch64_neon_sqshlu, 0),

clang/test/CodeGen/aarch64-neon-2velem.c

	// CHECK-NEXT: ret <2 x i64> [[VQDMULL_V2_I]]			// CHECK-NEXT: ret <2 x i64> [[VQDMULL_V2_I]]
	//			//
	int64x2_t test_vqdmull_high_laneq_s32(int32x4_t a, int32x4_t v) {			int64x2_t test_vqdmull_high_laneq_s32(int32x4_t a, int32x4_t v) {
	return vqdmull_high_laneq_s32(a, v, 3);			return vqdmull_high_laneq_s32(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqdmulh_lane_s16(			// CHECK-LABEL: @test_vqdmulh_lane_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANE_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.lane.v4i16(<4 x i16> [[VQDMULH_LANE_V]], <4 x i16> [[VQDMULH_LANE_V1]], i32 3)
				// CHECK-NEXT: [[VQDMULH_LANE_V3:%.*]] = bitcast <4 x i16> [[VQDMULH_LANE_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQDMULH_LANE_V3]] to <4 x i16>
				// CHECK-NEXT: ret <4 x i16> [[TMP2]]
	//			//
	int16x4_t test_vqdmulh_lane_s16(int16x4_t a, int16x4_t v) {			int16x4_t test_vqdmulh_lane_s16(int16x4_t a, int16x4_t v) {
	return vqdmulh_lane_s16(a, v, 3);			return vqdmulh_lane_s16(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_lane_s16(			// CHECK-LABEL: @test_vqdmulhq_lane_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANE_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.lane.v8i16(<8 x i16> [[VQDMULHQ_LANE_V]], <4 x i16> [[VQDMULHQ_LANE_V1]], i32 3)
				// CHECK-NEXT: [[VQDMULHQ_LANE_V3:%.*]] = bitcast <8 x i16> [[VQDMULHQ_LANE_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQDMULHQ_LANE_V3]] to <8 x i16>
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
	//			//
	int16x8_t test_vqdmulhq_lane_s16(int16x8_t a, int16x4_t v) {			int16x8_t test_vqdmulhq_lane_s16(int16x8_t a, int16x4_t v) {
	return vqdmulhq_lane_s16(a, v, 3);			return vqdmulhq_lane_s16(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqdmulh_lane_s32(			// CHECK-LABEL: @test_vqdmulh_lane_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <2 x i32> <i32 1, i32 1>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANE_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.lane.v2i32(<2 x i32> [[VQDMULH_LANE_V]], <2 x i32> [[VQDMULH_LANE_V1]], i32 1)
				// CHECK-NEXT: [[VQDMULH_LANE_V3:%.*]] = bitcast <2 x i32> [[VQDMULH_LANE_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQDMULH_LANE_V3]] to <2 x i32>
				// CHECK-NEXT: ret <2 x i32> [[TMP2]]
	//			//
	int32x2_t test_vqdmulh_lane_s32(int32x2_t a, int32x2_t v) {			int32x2_t test_vqdmulh_lane_s32(int32x2_t a, int32x2_t v) {
	return vqdmulh_lane_s32(a, v, 1);			return vqdmulh_lane_s32(a, v, 1);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_lane_s32(			// CHECK-LABEL: @test_vqdmulhq_lane_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <4 x i32> <i32 1, i32 1, i32 1, i32 1>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANE_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.lane.v4i32(<4 x i32> [[VQDMULHQ_LANE_V]], <2 x i32> [[VQDMULHQ_LANE_V1]], i32 1)
				// CHECK-NEXT: [[VQDMULHQ_LANE_V3:%.*]] = bitcast <4 x i32> [[VQDMULHQ_LANE_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQDMULHQ_LANE_V3]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
	//			//
	int32x4_t test_vqdmulhq_lane_s32(int32x4_t a, int32x2_t v) {			int32x4_t test_vqdmulhq_lane_s32(int32x4_t a, int32x2_t v) {
	return vqdmulhq_lane_s32(a, v, 1);			return vqdmulhq_lane_s32(a, v, 1);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_lane_s16(			// CHECK-LABEL: @test_vqrdmulh_lane_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANE_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v4i16(<4 x i16> [[VQRDMULH_LANE_V]], <4 x i16> [[VQRDMULH_LANE_V1]], i32 3)
				// CHECK-NEXT: [[VQRDMULH_LANE_V3:%.*]] = bitcast <4 x i16> [[VQRDMULH_LANE_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQRDMULH_LANE_V3]] to <4 x i16>
				// CHECK-NEXT: ret <4 x i16> [[TMP2]]
	//			//
	int16x4_t test_vqrdmulh_lane_s16(int16x4_t a, int16x4_t v) {			int16x4_t test_vqrdmulh_lane_s16(int16x4_t a, int16x4_t v) {
	return vqrdmulh_lane_s16(a, v, 3);			return vqrdmulh_lane_s16(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_lane_s16(			// CHECK-LABEL: @test_vqrdmulhq_lane_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANE_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v8i16(<8 x i16> [[VQRDMULHQ_LANE_V]], <4 x i16> [[VQRDMULHQ_LANE_V1]], i32 3)
				// CHECK-NEXT: [[VQRDMULHQ_LANE_V3:%.*]] = bitcast <8 x i16> [[VQRDMULHQ_LANE_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQRDMULHQ_LANE_V3]] to <8 x i16>
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
	//			//
	int16x8_t test_vqrdmulhq_lane_s16(int16x8_t a, int16x4_t v) {			int16x8_t test_vqrdmulhq_lane_s16(int16x8_t a, int16x4_t v) {
	return vqrdmulhq_lane_s16(a, v, 3);			return vqrdmulhq_lane_s16(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_lane_s32(			// CHECK-LABEL: @test_vqrdmulh_lane_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <2 x i32> <i32 1, i32 1>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANE_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v2i32(<2 x i32> [[VQRDMULH_LANE_V]], <2 x i32> [[VQRDMULH_LANE_V1]], i32 1)
				// CHECK-NEXT: [[VQRDMULH_LANE_V3:%.*]] = bitcast <2 x i32> [[VQRDMULH_LANE_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQRDMULH_LANE_V3]] to <2 x i32>
				// CHECK-NEXT: ret <2 x i32> [[TMP2]]
	//			//
	int32x2_t test_vqrdmulh_lane_s32(int32x2_t a, int32x2_t v) {			int32x2_t test_vqrdmulh_lane_s32(int32x2_t a, int32x2_t v) {
	return vqrdmulh_lane_s32(a, v, 1);			return vqrdmulh_lane_s32(a, v, 1);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_lane_s32(			// CHECK-LABEL: @test_vqrdmulhq_lane_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <4 x i32> <i32 1, i32 1, i32 1, i32 1>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANE_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v4i32(<4 x i32> [[VQRDMULHQ_LANE_V]], <2 x i32> [[VQRDMULHQ_LANE_V1]], i32 1)
				// CHECK-NEXT: [[VQRDMULHQ_LANE_V3:%.*]] = bitcast <4 x i32> [[VQRDMULHQ_LANE_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQRDMULHQ_LANE_V3]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
	//			//
	int32x4_t test_vqrdmulhq_lane_s32(int32x4_t a, int32x2_t v) {			int32x4_t test_vqrdmulhq_lane_s32(int32x4_t a, int32x2_t v) {
	return vqrdmulhq_lane_s32(a, v, 1);			return vqrdmulhq_lane_s32(a, v, 1);
	}			}

	// CHECK-LABEL: @test_vmul_lane_f32(			// CHECK-LABEL: @test_vmul_lane_f32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[V:%.*]], <2 x float> [[V]], <2 x i32> <i32 1, i32 1>			// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[V:%.*]], <2 x float> [[V]], <2 x i32> <i32 1, i32 1>
	[...]
	// CHECK-NEXT: ret <2 x i64> [[VQDMULL_V2_I]]			// CHECK-NEXT: ret <2 x i64> [[VQDMULL_V2_I]]
	//			//
	int64x2_t test_vqdmull_high_laneq_s32_0(int32x4_t a, int32x4_t v) {			int64x2_t test_vqdmull_high_laneq_s32_0(int32x4_t a, int32x4_t v) {
	return vqdmull_high_laneq_s32(a, v, 0);			return vqdmull_high_laneq_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulh_lane_s16_0(			// CHECK-LABEL: @test_vqdmulh_lane_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANE_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.lane.v4i16(<4 x i16> [[VQDMULH_LANE_V]], <4 x i16> [[VQDMULH_LANE_V1]], i32 0)
				// CHECK-NEXT: [[VQDMULH_LANE_V3:%.*]] = bitcast <4 x i16> [[VQDMULH_LANE_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQDMULH_LANE_V3]] to <4 x i16>
				// CHECK-NEXT: ret <4 x i16> [[TMP2]]
	//			//
	int16x4_t test_vqdmulh_lane_s16_0(int16x4_t a, int16x4_t v) {			int16x4_t test_vqdmulh_lane_s16_0(int16x4_t a, int16x4_t v) {
	return vqdmulh_lane_s16(a, v, 0);			return vqdmulh_lane_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_lane_s16_0(			// CHECK-LABEL: @test_vqdmulhq_lane_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <8 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANE_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.lane.v8i16(<8 x i16> [[VQDMULHQ_LANE_V]], <4 x i16> [[VQDMULHQ_LANE_V1]], i32 0)
				// CHECK-NEXT: [[VQDMULHQ_LANE_V3:%.*]] = bitcast <8 x i16> [[VQDMULHQ_LANE_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQDMULHQ_LANE_V3]] to <8 x i16>
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
	//			//
	int16x8_t test_vqdmulhq_lane_s16_0(int16x8_t a, int16x4_t v) {			int16x8_t test_vqdmulhq_lane_s16_0(int16x8_t a, int16x4_t v) {
	return vqdmulhq_lane_s16(a, v, 0);			return vqdmulhq_lane_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulh_lane_s32_0(			// CHECK-LABEL: @test_vqdmulh_lane_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <2 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANE_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.lane.v2i32(<2 x i32> [[VQDMULH_LANE_V]], <2 x i32> [[VQDMULH_LANE_V1]], i32 0)
				// CHECK-NEXT: [[VQDMULH_LANE_V3:%.*]] = bitcast <2 x i32> [[VQDMULH_LANE_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQDMULH_LANE_V3]] to <2 x i32>
				// CHECK-NEXT: ret <2 x i32> [[TMP2]]
	//			//
	int32x2_t test_vqdmulh_lane_s32_0(int32x2_t a, int32x2_t v) {			int32x2_t test_vqdmulh_lane_s32_0(int32x2_t a, int32x2_t v) {
	return vqdmulh_lane_s32(a, v, 0);			return vqdmulh_lane_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_lane_s32_0(			// CHECK-LABEL: @test_vqdmulhq_lane_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANE_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.lane.v4i32(<4 x i32> [[VQDMULHQ_LANE_V]], <2 x i32> [[VQDMULHQ_LANE_V1]], i32 0)
				// CHECK-NEXT: [[VQDMULHQ_LANE_V3:%.*]] = bitcast <4 x i32> [[VQDMULHQ_LANE_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQDMULHQ_LANE_V3]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
	//			//
	int32x4_t test_vqdmulhq_lane_s32_0(int32x4_t a, int32x2_t v) {			int32x4_t test_vqdmulhq_lane_s32_0(int32x4_t a, int32x2_t v) {
	return vqdmulhq_lane_s32(a, v, 0);			return vqdmulhq_lane_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_lane_s16_0(			// CHECK-LABEL: @test_vqrdmulh_lane_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANE_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v4i16(<4 x i16> [[VQRDMULH_LANE_V]], <4 x i16> [[VQRDMULH_LANE_V1]], i32 0)
				// CHECK-NEXT: [[VQRDMULH_LANE_V3:%.*]] = bitcast <4 x i16> [[VQRDMULH_LANE_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQRDMULH_LANE_V3]] to <4 x i16>
				// CHECK-NEXT: ret <4 x i16> [[TMP2]]
	//			//
	int16x4_t test_vqrdmulh_lane_s16_0(int16x4_t a, int16x4_t v) {			int16x4_t test_vqrdmulh_lane_s16_0(int16x4_t a, int16x4_t v) {
	return vqrdmulh_lane_s16(a, v, 0);			return vqrdmulh_lane_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_lane_s16_0(			// CHECK-LABEL: @test_vqrdmulhq_lane_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <8 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANE_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v8i16(<8 x i16> [[VQRDMULHQ_LANE_V]], <4 x i16> [[VQRDMULHQ_LANE_V1]], i32 0)
				// CHECK-NEXT: [[VQRDMULHQ_LANE_V3:%.*]] = bitcast <8 x i16> [[VQRDMULHQ_LANE_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQRDMULHQ_LANE_V3]] to <8 x i16>
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
	//			//
	int16x8_t test_vqrdmulhq_lane_s16_0(int16x8_t a, int16x4_t v) {			int16x8_t test_vqrdmulhq_lane_s16_0(int16x8_t a, int16x4_t v) {
	return vqrdmulhq_lane_s16(a, v, 0);			return vqrdmulhq_lane_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_lane_s32_0(			// CHECK-LABEL: @test_vqrdmulh_lane_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <2 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANE_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANE_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v2i32(<2 x i32> [[VQRDMULH_LANE_V]], <2 x i32> [[VQRDMULH_LANE_V1]], i32 0)
				// CHECK-NEXT: [[VQRDMULH_LANE_V3:%.*]] = bitcast <2 x i32> [[VQRDMULH_LANE_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQRDMULH_LANE_V3]] to <2 x i32>
				// CHECK-NEXT: ret <2 x i32> [[TMP2]]
	//			//
	int32x2_t test_vqrdmulh_lane_s32_0(int32x2_t a, int32x2_t v) {			int32x2_t test_vqrdmulh_lane_s32_0(int32x2_t a, int32x2_t v) {
	return vqrdmulh_lane_s32(a, v, 0);			return vqrdmulh_lane_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_lane_s32_0(			// CHECK-LABEL: @test_vqrdmulhq_lane_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <8 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANE_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANE_V1:%.*]] = bitcast <8 x i8> [[TMP1]] to <2 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANE_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v4i32(<4 x i32> [[VQRDMULHQ_LANE_V]], <2 x i32> [[VQRDMULHQ_LANE_V1]], i32 0)
				// CHECK-NEXT: [[VQRDMULHQ_LANE_V3:%.*]] = bitcast <4 x i32> [[VQRDMULHQ_LANE_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQRDMULHQ_LANE_V3]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
	//			//
	int32x4_t test_vqrdmulhq_lane_s32_0(int32x4_t a, int32x2_t v) {			int32x4_t test_vqrdmulhq_lane_s32_0(int32x4_t a, int32x2_t v) {
	return vqrdmulhq_lane_s32(a, v, 0);			return vqrdmulhq_lane_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vmul_lane_f32_0(			// CHECK-LABEL: @test_vmul_lane_f32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[V:%.*]], <2 x float> [[V]], <2 x i32> zeroinitializer			// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[V:%.*]], <2 x float> [[V]], <2 x i32> zeroinitializer
	[...]
	// CHECK-NEXT: ret <2 x i64> [[VQDMLSL_V3_I]]			// CHECK-NEXT: ret <2 x i64> [[VQDMLSL_V3_I]]
	//			//
	int64x2_t test_vqdmlsl_high_laneq_s32_0(int64x2_t a, int32x4_t b, int32x4_t v) {			int64x2_t test_vqdmlsl_high_laneq_s32_0(int64x2_t a, int32x4_t b, int32x4_t v) {
	return vqdmlsl_high_laneq_s32(a, b, v, 0);			return vqdmlsl_high_laneq_s32(a, b, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulh_laneq_s16_0(			// CHECK-LABEL: @test_vqdmulh_laneq_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANEQ_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v4i16(<4 x i16> [[VQDMULH_LANEQ_V]], <8 x i16> [[VQDMULH_LANEQ_V1]], i32 0)
				// CHECK-NEXT: [[VQDMULH_LANEQ_V3:%.*]] = bitcast <4 x i16> [[VQDMULH_LANEQ_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQDMULH_LANEQ_V3]] to <4 x i16>
				// CHECK-NEXT: ret <4 x i16> [[TMP2]]
	//			//
	int16x4_t test_vqdmulh_laneq_s16_0(int16x4_t a, int16x8_t v) {			int16x4_t test_vqdmulh_laneq_s16_0(int16x4_t a, int16x8_t v) {
	return vqdmulh_laneq_s16(a, v, 0);			return vqdmulh_laneq_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_laneq_s16_0(			// CHECK-LABEL: @test_vqdmulhq_laneq_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <8 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v8i16(<8 x i16> [[VQDMULHQ_LANEQ_V]], <8 x i16> [[VQDMULHQ_LANEQ_V1]], i32 0)
				// CHECK-NEXT: [[VQDMULHQ_LANEQ_V3:%.*]] = bitcast <8 x i16> [[VQDMULHQ_LANEQ_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQDMULHQ_LANEQ_V3]] to <8 x i16>
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
	//			//
	int16x8_t test_vqdmulhq_laneq_s16_0(int16x8_t a, int16x8_t v) {			int16x8_t test_vqdmulhq_laneq_s16_0(int16x8_t a, int16x8_t v) {
	return vqdmulhq_laneq_s16(a, v, 0);			return vqdmulhq_laneq_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulh_laneq_s32_0(			// CHECK-LABEL: @test_vqdmulh_laneq_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <2 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANEQ_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v2i32(<2 x i32> [[VQDMULH_LANEQ_V]], <4 x i32> [[VQDMULH_LANEQ_V1]], i32 0)
				// CHECK-NEXT: [[VQDMULH_LANEQ_V3:%.*]] = bitcast <2 x i32> [[VQDMULH_LANEQ_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQDMULH_LANEQ_V3]] to <2 x i32>
				// CHECK-NEXT: ret <2 x i32> [[TMP2]]
	//			//
	int32x2_t test_vqdmulh_laneq_s32_0(int32x2_t a, int32x4_t v) {			int32x2_t test_vqdmulh_laneq_s32_0(int32x2_t a, int32x4_t v) {
	return vqdmulh_laneq_s32(a, v, 0);			return vqdmulh_laneq_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_laneq_s32_0(			// CHECK-LABEL: @test_vqdmulhq_laneq_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v4i32(<4 x i32> [[VQDMULHQ_LANEQ_V]], <4 x i32> [[VQDMULHQ_LANEQ_V1]], i32 0)
				// CHECK-NEXT: [[VQDMULHQ_LANEQ_V3:%.*]] = bitcast <4 x i32> [[VQDMULHQ_LANEQ_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQDMULHQ_LANEQ_V3]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
	//			//
	int32x4_t test_vqdmulhq_laneq_s32_0(int32x4_t a, int32x4_t v) {			int32x4_t test_vqdmulhq_laneq_s32_0(int32x4_t a, int32x4_t v) {
	return vqdmulhq_laneq_s32(a, v, 0);			return vqdmulhq_laneq_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_laneq_s16_0(			// CHECK-LABEL: @test_vqrdmulh_laneq_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANEQ_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v4i16(<4 x i16> [[VQRDMULH_LANEQ_V]], <8 x i16> [[VQRDMULH_LANEQ_V1]], i32 0)
				// CHECK-NEXT: [[VQRDMULH_LANEQ_V3:%.*]] = bitcast <4 x i16> [[VQRDMULH_LANEQ_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQRDMULH_LANEQ_V3]] to <4 x i16>
				// CHECK-NEXT: ret <4 x i16> [[TMP2]]
	//			//
	int16x4_t test_vqrdmulh_laneq_s16_0(int16x4_t a, int16x8_t v) {			int16x4_t test_vqrdmulh_laneq_s16_0(int16x4_t a, int16x8_t v) {
	return vqrdmulh_laneq_s16(a, v, 0);			return vqrdmulh_laneq_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_laneq_s16_0(			// CHECK-LABEL: @test_vqrdmulhq_laneq_s16_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <8 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v8i16(<8 x i16> [[VQRDMULHQ_LANEQ_V]], <8 x i16> [[VQRDMULHQ_LANEQ_V1]], i32 0)
				// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V3:%.*]] = bitcast <8 x i16> [[VQRDMULHQ_LANEQ_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQRDMULHQ_LANEQ_V3]] to <8 x i16>
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
	//			//
	int16x8_t test_vqrdmulhq_laneq_s16_0(int16x8_t a, int16x8_t v) {			int16x8_t test_vqrdmulhq_laneq_s16_0(int16x8_t a, int16x8_t v) {
	return vqrdmulhq_laneq_s16(a, v, 0);			return vqrdmulhq_laneq_s16(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_laneq_s32_0(			// CHECK-LABEL: @test_vqrdmulh_laneq_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <2 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANEQ_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v2i32(<2 x i32> [[VQRDMULH_LANEQ_V]], <4 x i32> [[VQRDMULH_LANEQ_V1]], i32 0)
				// CHECK-NEXT: [[VQRDMULH_LANEQ_V3:%.*]] = bitcast <2 x i32> [[VQRDMULH_LANEQ_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQRDMULH_LANEQ_V3]] to <2 x i32>
				// CHECK-NEXT: ret <2 x i32> [[TMP2]]
	//			//
	int32x2_t test_vqrdmulh_laneq_s32_0(int32x2_t a, int32x4_t v) {			int32x2_t test_vqrdmulh_laneq_s32_0(int32x2_t a, int32x4_t v) {
	return vqrdmulh_laneq_s32(a, v, 0);			return vqrdmulh_laneq_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_laneq_s32_0(			// CHECK-LABEL: @test_vqrdmulhq_laneq_s32_0(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <4 x i32> zeroinitializer
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v4i32(<4 x i32> [[VQRDMULHQ_LANEQ_V]], <4 x i32> [[VQRDMULHQ_LANEQ_V1]], i32 0)
				// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V3:%.*]] = bitcast <4 x i32> [[VQRDMULHQ_LANEQ_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQRDMULHQ_LANEQ_V3]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
	//			//
	int32x4_t test_vqrdmulhq_laneq_s32_0(int32x4_t a, int32x4_t v) {			int32x4_t test_vqrdmulhq_laneq_s32_0(int32x4_t a, int32x4_t v) {
	return vqrdmulhq_laneq_s32(a, v, 0);			return vqrdmulhq_laneq_s32(a, v, 0);
	}			}

	// CHECK-LABEL: @test_vmla_lane_u16(			// CHECK-LABEL: @test_vmla_lane_u16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>			// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i16> [[V:%.*]], <4 x i16> [[V]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	[...]
	// CHECK-NEXT: ret <2 x i64> [[VQDMLSL_V3_I]]			// CHECK-NEXT: ret <2 x i64> [[VQDMLSL_V3_I]]
	//			//
	int64x2_t test_vqdmlsl_high_laneq_s32(int64x2_t a, int32x4_t b, int32x4_t v) {			int64x2_t test_vqdmlsl_high_laneq_s32(int64x2_t a, int32x4_t b, int32x4_t v) {
	return vqdmlsl_high_laneq_s32(a, b, v, 3);			return vqdmlsl_high_laneq_s32(a, b, v, 3);
	}			}

	// CHECK-LABEL: @test_vqdmulh_laneq_s16(			// CHECK-LABEL: @test_vqdmulh_laneq_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <4 x i32> <i32 7, i32 7, i32 7, i32 7>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANEQ_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v4i16(<4 x i16> [[VQDMULH_LANEQ_V]], <8 x i16> [[VQDMULH_LANEQ_V1]], i32 7)
				// CHECK-NEXT: [[VQDMULH_LANEQ_V3:%.*]] = bitcast <4 x i16> [[VQDMULH_LANEQ_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQDMULH_LANEQ_V3]] to <4 x i16>
				// CHECK-NEXT: ret <4 x i16> [[TMP2]]
	//			//
	int16x4_t test_vqdmulh_laneq_s16(int16x4_t a, int16x8_t v) {			int16x4_t test_vqdmulh_laneq_s16(int16x4_t a, int16x8_t v) {
	return vqdmulh_laneq_s16(a, v, 7);			return vqdmulh_laneq_s16(a, v, 7);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_laneq_s16(			// CHECK-LABEL: @test_vqdmulhq_laneq_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v8i16(<8 x i16> [[VQDMULHQ_LANEQ_V]], <8 x i16> [[VQDMULHQ_LANEQ_V1]], i32 7)
				// CHECK-NEXT: [[VQDMULHQ_LANEQ_V3:%.*]] = bitcast <8 x i16> [[VQDMULHQ_LANEQ_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQDMULHQ_LANEQ_V3]] to <8 x i16>
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
	//			//
	int16x8_t test_vqdmulhq_laneq_s16(int16x8_t a, int16x8_t v) {			int16x8_t test_vqdmulhq_laneq_s16(int16x8_t a, int16x8_t v) {
	return vqdmulhq_laneq_s16(a, v, 7);			return vqdmulhq_laneq_s16(a, v, 7);
	}			}

	// CHECK-LABEL: @test_vqdmulh_laneq_s32(			// CHECK-LABEL: @test_vqdmulh_laneq_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <2 x i32> <i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQDMULH_V2_I]]			// CHECK-NEXT: [[VQDMULH_LANEQ_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v2i32(<2 x i32> [[VQDMULH_LANEQ_V]], <4 x i32> [[VQDMULH_LANEQ_V1]], i32 3)
				// CHECK-NEXT: [[VQDMULH_LANEQ_V3:%.*]] = bitcast <2 x i32> [[VQDMULH_LANEQ_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQDMULH_LANEQ_V3]] to <2 x i32>
				// CHECK-NEXT: ret <2 x i32> [[TMP2]]
	//			//
	int32x2_t test_vqdmulh_laneq_s32(int32x2_t a, int32x4_t v) {			int32x2_t test_vqdmulh_laneq_s32(int32x2_t a, int32x4_t v) {
	return vqdmulh_laneq_s32(a, v, 3);			return vqdmulh_laneq_s32(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqdmulhq_laneq_s32(			// CHECK-LABEL: @test_vqdmulhq_laneq_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQDMULHQ_V2_I]]			// CHECK-NEXT: [[VQDMULHQ_LANEQ_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v4i32(<4 x i32> [[VQDMULHQ_LANEQ_V]], <4 x i32> [[VQDMULHQ_LANEQ_V1]], i32 3)
				// CHECK-NEXT: [[VQDMULHQ_LANEQ_V3:%.*]] = bitcast <4 x i32> [[VQDMULHQ_LANEQ_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQDMULHQ_LANEQ_V3]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
	//			//
	int32x4_t test_vqdmulhq_laneq_s32(int32x4_t a, int32x4_t v) {			int32x4_t test_vqdmulhq_laneq_s32(int32x4_t a, int32x4_t v) {
	return vqdmulhq_laneq_s32(a, v, 3);			return vqdmulhq_laneq_s32(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_laneq_s16(			// CHECK-LABEL: @test_vqrdmulh_laneq_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <4 x i32> <i32 7, i32 7, i32 7, i32 7>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i16> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i16> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> [[A]], <4 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x i16>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <4 x i16> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <4 x i16> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANEQ_V2:%.*]] = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v4i16(<4 x i16> [[VQRDMULH_LANEQ_V]], <8 x i16> [[VQRDMULH_LANEQ_V1]], i32 7)
				// CHECK-NEXT: [[VQRDMULH_LANEQ_V3:%.*]] = bitcast <4 x i16> [[VQRDMULH_LANEQ_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQRDMULH_LANEQ_V3]] to <4 x i16>
				// CHECK-NEXT: ret <4 x i16> [[TMP2]]
	//			//
	int16x4_t test_vqrdmulh_laneq_s16(int16x4_t a, int16x8_t v) {			int16x4_t test_vqrdmulh_laneq_s16(int16x4_t a, int16x8_t v) {
	return vqrdmulh_laneq_s16(a, v, 7);			return vqrdmulh_laneq_s16(a, v, 7);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_laneq_s16(			// CHECK-LABEL: @test_vqrdmulhq_laneq_s16(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[V:%.*]], <8 x i16> [[V]], <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <8 x i16> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <8 x i16> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> [[A]], <8 x i16> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x i16>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <8 x i16> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x i16>
	// CHECK-NEXT: ret <8 x i16> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V2:%.*]] = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v8i16(<8 x i16> [[VQRDMULHQ_LANEQ_V]], <8 x i16> [[VQRDMULHQ_LANEQ_V1]], i32 7)
				// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V3:%.*]] = bitcast <8 x i16> [[VQRDMULHQ_LANEQ_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQRDMULHQ_LANEQ_V3]] to <8 x i16>
				// CHECK-NEXT: ret <8 x i16> [[TMP2]]
	//			//
	int16x8_t test_vqrdmulhq_laneq_s16(int16x8_t a, int16x8_t v) {			int16x8_t test_vqrdmulhq_laneq_s16(int16x8_t a, int16x8_t v) {
	return vqrdmulhq_laneq_s16(a, v, 7);			return vqrdmulhq_laneq_s16(a, v, 7);
	}			}

	// CHECK-LABEL: @test_vqrdmulh_laneq_s32(			// CHECK-LABEL: @test_vqrdmulh_laneq_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <2 x i32> <i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i32> [[A:%.*]] to <8 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[SHUFFLE]] to <8 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULH_V2_I:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> [[A]], <2 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULH_LANEQ_V:%.*]] = bitcast <8 x i8> [[TMP0]] to <2 x i32>
	// CHECK-NEXT: [[VQRDMULH_V3_I:%.*]] = bitcast <2 x i32> [[VQRDMULH_V2_I]] to <8 x i8>			// CHECK-NEXT: [[VQRDMULH_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <2 x i32> [[VQRDMULH_V2_I]]			// CHECK-NEXT: [[VQRDMULH_LANEQ_V2:%.*]] = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v2i32(<2 x i32> [[VQRDMULH_LANEQ_V]], <4 x i32> [[VQRDMULH_LANEQ_V1]], i32 3)
				// CHECK-NEXT: [[VQRDMULH_LANEQ_V3:%.*]] = bitcast <2 x i32> [[VQRDMULH_LANEQ_V2]] to <8 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <8 x i8> [[VQRDMULH_LANEQ_V3]] to <2 x i32>
				// CHECK-NEXT: ret <2 x i32> [[TMP2]]
	//			//
	int32x2_t test_vqrdmulh_laneq_s32(int32x2_t a, int32x4_t v) {			int32x2_t test_vqrdmulh_laneq_s32(int32x2_t a, int32x4_t v) {
	return vqrdmulh_laneq_s32(a, v, 3);			return vqrdmulh_laneq_s32(a, v, 3);
	}			}

	// CHECK-LABEL: @test_vqrdmulhq_laneq_s32(			// CHECK-LABEL: @test_vqrdmulhq_laneq_s32(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[V:%.*]], <4 x i32> [[V]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>			// CHECK-NEXT: [[TMP0:%.*]] = bitcast <4 x i32> [[A:%.*]] to <16 x i8>
	// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[SHUFFLE]] to <16 x i8>			// CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[V:%.*]] to <16 x i8>
	// CHECK-NEXT: [[VQRDMULHQ_V2_I:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> [[A]], <4 x i32> [[SHUFFLE]]) #4			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V:%.*]] = bitcast <16 x i8> [[TMP0]] to <4 x i32>
	// CHECK-NEXT: [[VQRDMULHQ_V3_I:%.*]] = bitcast <4 x i32> [[VQRDMULHQ_V2_I]] to <16 x i8>			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V1:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x i32>
	// CHECK-NEXT: ret <4 x i32> [[VQRDMULHQ_V2_I]]			// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V2:%.*]] = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v4i32(<4 x i32> [[VQRDMULHQ_LANEQ_V]], <4 x i32> [[VQRDMULHQ_LANEQ_V1]], i32 3)
				// CHECK-NEXT: [[VQRDMULHQ_LANEQ_V3:%.*]] = bitcast <4 x i32> [[VQRDMULHQ_LANEQ_V2]] to <16 x i8>
				// CHECK-NEXT: [[TMP2:%.*]] = bitcast <16 x i8> [[VQRDMULHQ_LANEQ_V3]] to <4 x i32>
				// CHECK-NEXT: ret <4 x i32> [[TMP2]]
	//			//
	int32x4_t test_vqrdmulhq_laneq_s32(int32x4_t a, int32x4_t v) {			int32x4_t test_vqrdmulhq_laneq_s32(int32x4_t a, int32x4_t v) {
	return vqrdmulhq_laneq_s32(a, v, 3);			return vqrdmulhq_laneq_s32(a, v, 3);
	}			}
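
For readers unfamiliar with the instructions under test, the following is a minimal scalar sketch of what a 16-bit `sqdmulh`/`sqrdmulh` lane operation computes: double the product, optionally round, keep the high half, saturate. The function names are hypothetical and exist only for illustration; this is a model of the arithmetic, not the code Clang or LLVM generates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative scalar model (hypothetical names, not LLVM or ACLE code).

// Floor division by 2^16, written out so the result does not depend on how
// the compiler shifts negative values.
inline std::int64_t floorDiv65536(std::int64_t x) {
  return x >= 0 ? x / 65536 : -((-x + 65535) / 65536);
}

// One 16-bit lane: double the product, optionally add the rounding constant,
// keep the high half, then saturate to the int16_t range.
inline std::int16_t sqdmulhModel(std::int16_t a, std::int16_t b, bool round) {
  std::int64_t doubled =
      2 * static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
  if (round)
    doubled += std::int64_t{1} << 15;
  std::int64_t high = floorDiv65536(doubled);
  if (high > INT16_MAX) high = INT16_MAX;
  if (high < INT16_MIN) high = INT16_MIN;
  return static_cast<std::int16_t>(high);
}

// "Lane" form: every element of `a` is multiplied by the single element
// v[lane]. M == 4 models the _lane variants, M == 8 the _laneq variants.
template <std::size_t N, std::size_t M>
std::array<std::int16_t, N> sqdmulhLaneModel(
    const std::array<std::int16_t, N> &a,
    const std::array<std::int16_t, M> &v, std::size_t lane,
    bool round = false) {
  assert(lane < M && "lane index must select an element of v");
  std::array<std::int16_t, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = sqdmulhModel(a[i], v[lane], round);
  return result;
}
```

Calling `sqdmulhLaneModel(a, v, 7)` with a 4-element `a` and an 8-element `v` corresponds in shape to `vqdmulh_laneq_s16(a, v, 7)`; passing `true` for `round` models the `vqrdmulh` forms.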

llvm/include/llvm/IR/Intrinsics.h

[... 95 lines not shown ...]	#undef GET_INTRINSIC_ENUM_VALUES
/// intrinsic. This is returned by getIntrinsicInfoTableEntries.		/// intrinsic. This is returned by getIntrinsicInfoTableEntries.
struct IITDescriptor {		struct IITDescriptor {
enum IITDescriptorKind {		enum IITDescriptorKind {
Void, VarArg, MMX, Token, Metadata, Half, Float, Double, Quad,		Void, VarArg, MMX, Token, Metadata, Half, Float, Double, Quad,
Integer, Vector, Pointer, Struct,		Integer, Vector, Pointer, Struct,
Argument, ExtendArgument, TruncArgument, HalfVecArgument,		Argument, ExtendArgument, TruncArgument, HalfVecArgument,
SameVecWidthArgument, PtrToArgument, PtrToElt, VecOfAnyPtrsToElt,		SameVecWidthArgument, PtrToArgument, PtrToElt, VecOfAnyPtrsToElt,
VecElementArgument, ScalableVecArgument, Subdivide2Argument,		VecElementArgument, ScalableVecArgument, Subdivide2Argument,
Subdivide4Argument, VecOfBitcastsToInt		Subdivide4Argument, VecOfBitcastsToInt, WideVec, NarrowVec
} Kind;		} Kind;

union {		union {
unsigned Integer_Width;		unsigned Integer_Width;
unsigned Float_Width;		unsigned Float_Width;
unsigned Vector_Width;		unsigned Vector_Width;
unsigned Pointer_AddressSpace;		unsigned Pointer_AddressSpace;
unsigned Struct_NumElements;		unsigned Struct_NumElements;
[... 10 lines not shown ...]	struct IITDescriptor {
};		};

unsigned getArgumentNumber() const {		unsigned getArgumentNumber() const {
assert(Kind == Argument || Kind == ExtendArgument ||		assert(Kind == Argument || Kind == ExtendArgument ||
Kind == TruncArgument || Kind == HalfVecArgument ||		Kind == TruncArgument || Kind == HalfVecArgument ||
Kind == SameVecWidthArgument || Kind == PtrToArgument ||		Kind == SameVecWidthArgument || Kind == PtrToArgument ||
Kind == PtrToElt || Kind == VecElementArgument ||		Kind == PtrToElt || Kind == VecElementArgument ||
Kind == Subdivide2Argument || Kind == Subdivide4Argument ||		Kind == Subdivide2Argument || Kind == Subdivide4Argument ||
Kind == VecOfBitcastsToInt);		Kind == VecOfBitcastsToInt || Kind == WideVec ||
		Kind == NarrowVec);
return Argument_Info >> 3;		return Argument_Info >> 3;
}		}
ArgKind getArgumentKind() const {		ArgKind getArgumentKind() const {
assert(Kind == Argument || Kind == ExtendArgument ||		assert(Kind == Argument || Kind == ExtendArgument ||
Kind == TruncArgument || Kind == HalfVecArgument ||		Kind == TruncArgument || Kind == HalfVecArgument ||
Kind == SameVecWidthArgument || Kind == PtrToArgument ||		Kind == SameVecWidthArgument || Kind == PtrToArgument ||
Kind == VecElementArgument || Kind == Subdivide2Argument ||		Kind == VecElementArgument || Kind == Subdivide2Argument ||
Kind == Subdivide4Argument || Kind == VecOfBitcastsToInt);		Kind == Subdivide4Argument || Kind == VecOfBitcastsToInt);
[... 62 lines not shown ...]

llvm/include/llvm/IR/Intrinsics.td

[... 177 lines not shown ...]	class LLVMScalarOrSameVectorWidth<int idx, LLVMType elty>
ValueType ElTy = elty.VT;		ValueType ElTy = elty.VT;
}		}

class LLVMPointerTo<int num> : LLVMMatchType<num>;		class LLVMPointerTo<int num> : LLVMMatchType<num>;
class LLVMPointerToElt<int num> : LLVMMatchType<num>;		class LLVMPointerToElt<int num> : LLVMMatchType<num>;
class LLVMVectorOfAnyPointersToElt<int num> : LLVMMatchType<num>;		class LLVMVectorOfAnyPointersToElt<int num> : LLVMMatchType<num>;
class LLVMVectorElementType<int num> : LLVMMatchType<num>;		class LLVMVectorElementType<int num> : LLVMMatchType<num>;

		class LLVMNarrowType<int num> : LLVMMatchType<num>;
		class LLVMWideType<int num> : LLVMMatchType<num>;

// Match the type of another intrinsic parameter that is expected to be a		// Match the type of another intrinsic parameter that is expected to be a
// vector type, but change the element count to be half as many		// vector type, but change the element count to be half as many
class LLVMHalfElementsVectorType<int num> : LLVMMatchType<num>;		class LLVMHalfElementsVectorType<int num> : LLVMMatchType<num>;

// Match the type of another intrinsic parameter that is expected to be a		// Match the type of another intrinsic parameter that is expected to be a
// vector type (i.e. <N x iM>) but with each element subdivided to		// vector type (i.e. <N x iM>) but with each element subdivided to
// form a vector with more elements that are smaller than the original.		// form a vector with more elements that are smaller than the original.
class LLVMSubdivide2VectorType<int num> : LLVMMatchType<num>;		class LLVMSubdivide2VectorType<int num> : LLVMMatchType<num>;
[... 1,115 lines not shown ...]

llvm/include/llvm/IR/IntrinsicsAArch64.td

[... 127 lines not shown ...]	let TargetPrefix = "aarch64" in { // All intrinsics start with "llvm.aarch64.".
class AdvSIMD_2VectorArg_Scalar_Wide_Intrinsic		class AdvSIMD_2VectorArg_Scalar_Wide_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMTruncatedType<0>, llvm_i32_ty],		[LLVMTruncatedType<0>, llvm_i32_ty],
[IntrNoMem]>;		[IntrNoMem]>;
class AdvSIMD_2VectorArg_Tied_Narrow_Intrinsic		class AdvSIMD_2VectorArg_Tied_Narrow_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMHalfElementsVectorType<0>, llvm_anyvector_ty],		[LLVMHalfElementsVectorType<0>, llvm_anyvector_ty],
[IntrNoMem]>;		[IntrNoMem]>;
		class AdvSIMD_2VectorArg_Narrow_Lane_Intrinsic
		: Intrinsic<[llvm_anyint_ty],
		[LLVMMatchType<0>, LLVMNarrowType<0>, llvm_i32_ty],
		[IntrNoMem]>;
		class AdvSIMD_2VectorArg_Wide_Lane_Intrinsic
		: Intrinsic<[llvm_anyint_ty],
		[LLVMMatchType<0>, LLVMWideType<0>, llvm_i32_ty],
		[IntrNoMem]>;

class AdvSIMD_3VectorArg_Intrinsic		class AdvSIMD_3VectorArg_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],		[LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],
[IntrNoMem]>;		[IntrNoMem]>;
class AdvSIMD_3VectorArg_Scalar_Intrinsic		class AdvSIMD_3VectorArg_Scalar_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],		[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
[... 58 lines not shown ...]	let TargetPrefix = "aarch64", IntrProperties = [IntrNoMem] in {
// header is no longer supported.		// header is no longer supported.
def int_aarch64_neon_addhn : AdvSIMD_2VectorArg_Narrow_Intrinsic;		def int_aarch64_neon_addhn : AdvSIMD_2VectorArg_Narrow_Intrinsic;

// Vector Rounding Add High-Half		// Vector Rounding Add High-Half
def int_aarch64_neon_raddhn : AdvSIMD_2VectorArg_Narrow_Intrinsic;		def int_aarch64_neon_raddhn : AdvSIMD_2VectorArg_Narrow_Intrinsic;

// Vector Saturating Doubling Multiply High		// Vector Saturating Doubling Multiply High
def int_aarch64_neon_sqdmulh : AdvSIMD_2IntArg_Intrinsic;		def int_aarch64_neon_sqdmulh : AdvSIMD_2IntArg_Intrinsic;
		def int_aarch64_neon_sqdmulh_lane : AdvSIMD_2VectorArg_Narrow_Lane_Intrinsic;
		def int_aarch64_neon_sqdmulh_laneq : AdvSIMD_2VectorArg_Wide_Lane_Intrinsic;

// Vector Saturating Rounding Doubling Multiply High		// Vector Saturating Rounding Doubling Multiply High
def int_aarch64_neon_sqrdmulh : AdvSIMD_2IntArg_Intrinsic;		def int_aarch64_neon_sqrdmulh : AdvSIMD_2IntArg_Intrinsic;
		def int_aarch64_neon_sqrdmulh_lane : AdvSIMD_2VectorArg_Narrow_Lane_Intrinsic;
		def int_aarch64_neon_sqrdmulh_laneq : AdvSIMD_2VectorArg_Wide_Lane_Intrinsic;

// Vector Polynominal Multiply		// Vector Polynominal Multiply
def int_aarch64_neon_pmul : AdvSIMD_2VectorArg_Intrinsic;		def int_aarch64_neon_pmul : AdvSIMD_2VectorArg_Intrinsic;

// Vector Long Multiply		// Vector Long Multiply
def int_aarch64_neon_smull : AdvSIMD_2VectorArg_Long_Intrinsic;		def int_aarch64_neon_smull : AdvSIMD_2VectorArg_Long_Intrinsic;
def int_aarch64_neon_umull : AdvSIMD_2VectorArg_Long_Intrinsic;		def int_aarch64_neon_umull : AdvSIMD_2VectorArg_Long_Intrinsic;
def int_aarch64_neon_pmull : AdvSIMD_2VectorArg_Long_Intrinsic;		def int_aarch64_neon_pmull : AdvSIMD_2VectorArg_Long_Intrinsic;
[... 1,192 lines not shown ...]

llvm/lib/IR/Function.cpp

[... 701 lines not shown ...]	enum IIT_Info {
IIT_STRUCT6 = 38,		IIT_STRUCT6 = 38,
IIT_STRUCT7 = 39,		IIT_STRUCT7 = 39,
IIT_STRUCT8 = 40,		IIT_STRUCT8 = 40,
IIT_F128 = 41,		IIT_F128 = 41,
IIT_VEC_ELEMENT = 42,		IIT_VEC_ELEMENT = 42,
IIT_SCALABLE_VEC = 43,		IIT_SCALABLE_VEC = 43,
IIT_SUBDIVIDE2_ARG = 44,		IIT_SUBDIVIDE2_ARG = 44,
IIT_SUBDIVIDE4_ARG = 45,		IIT_SUBDIVIDE4_ARG = 45,
IIT_VEC_OF_BITCASTS_TO_INT = 46		IIT_VEC_OF_BITCASTS_TO_INT = 46,
		IIT_NARROW_VEC = 47,
		IIT_WIDE_VEC = 48
};		};

static void DecodeIITType(unsigned &NextElt, ArrayRef<unsigned char> Infos,		static void DecodeIITType(unsigned &NextElt, ArrayRef<unsigned char> Infos,
SmallVectorImpl<Intrinsic::IITDescriptor> &OutputTable) {		SmallVectorImpl<Intrinsic::IITDescriptor> &OutputTable) {
using namespace Intrinsic;		using namespace Intrinsic;

IIT_Info Info = IIT_Info(Infos[NextElt++]);		IIT_Info Info = IIT_Info(Infos[NextElt++]);
unsigned StructElts = 2;		unsigned StructElts = 2;
[... 178 lines not shown ...]	case IIT_SCALABLE_VEC: {
return;		return;
}		}
case IIT_VEC_OF_BITCASTS_TO_INT: {		case IIT_VEC_OF_BITCASTS_TO_INT: {
unsigned ArgInfo = (NextElt == Infos.size() ? 0 : Infos[NextElt++]);		unsigned ArgInfo = (NextElt == Infos.size() ? 0 : Infos[NextElt++]);
OutputTable.push_back(IITDescriptor::get(IITDescriptor::VecOfBitcastsToInt,		OutputTable.push_back(IITDescriptor::get(IITDescriptor::VecOfBitcastsToInt,
ArgInfo));		ArgInfo));
return;		return;
}		}
		case IIT_NARROW_VEC: {
		unsigned ArgInfo = (NextElt == Infos.size() ? 0 : Infos[NextElt++]);
		OutputTable.push_back(
		IITDescriptor::get(IITDescriptor::NarrowVec, ArgInfo));
		return;
		}
		case IIT_WIDE_VEC: {
		unsigned ArgInfo = (NextElt == Infos.size() ? 0 : Infos[NextElt++]);
		OutputTable.push_back(IITDescriptor::get(IITDescriptor::WideVec, ArgInfo));
		return;
		}
}		}
llvm_unreachable("unhandled");		llvm_unreachable("unhandled");
}		}

#define GET_INTRINSIC_GENERATOR_GLOBAL		#define GET_INTRINSIC_GENERATOR_GLOBAL
#include "llvm/IR/IntrinsicImpl.inc"		#include "llvm/IR/IntrinsicImpl.inc"
#undef GET_INTRINSIC_GENERATOR_GLOBAL		#undef GET_INTRINSIC_GENERATOR_GLOBAL

[... 124 lines not shown ...]	static Type *DecodeFixedType(ArrayRef<Intrinsic::IITDescriptor> &Infos,
case IITDescriptor::VecOfAnyPtrsToElt:		case IITDescriptor::VecOfAnyPtrsToElt:
// Return the overloaded type (which determines the pointers address space)		// Return the overloaded type (which determines the pointers address space)
return Tys[D.getOverloadArgNumber()];		return Tys[D.getOverloadArgNumber()];
case IITDescriptor::ScalableVecArgument: {		case IITDescriptor::ScalableVecArgument: {
Type *Ty = DecodeFixedType(Infos, Tys, Context);		Type *Ty = DecodeFixedType(Infos, Tys, Context);
return VectorType::get(Ty->getVectorElementType(),		return VectorType::get(Ty->getVectorElementType(),
{ Ty->getVectorNumElements(), true });		{ Ty->getVectorNumElements(), true });
}		}
		case IITDescriptor::NarrowVec: {
		Type *Ty = Tys[D.getArgumentNumber()];
		if (VectorType *VTy = dyn_cast<VectorType>(Ty)) {
		Type *ElTy = VTy->getElementType();
		return VectorType::get(ElTy, 64 / ElTy->getIntegerBitWidth());
		}
		llvm_unreachable("Expected an argument of Vector Type");
		}
		case IITDescriptor::WideVec: {
		Type *Ty = Tys[D.getArgumentNumber()];
		if (VectorType *VTy = dyn_cast<VectorType>(Ty)) {
		Type *ElTy = VTy->getElementType();
		return VectorType::get(ElTy, 128 / ElTy->getIntegerBitWidth());
		}
		llvm_unreachable("Expected an argument of Vector Type");
		}
}		}
llvm_unreachable("unhandled");		llvm_unreachable("unhandled");
}		}
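
The two new `DecodeFixedType` cases derive the lane count purely from the element width and a fixed register width. A minimal sketch of that arithmetic, assuming integer element types as in the patch; `narrowVecLanes` and `wideVecLanes` are hypothetical helper names, not LLVM functions:

```cpp
#include <cassert>

// Hypothetical helpers mirroring `64 / ElTy->getIntegerBitWidth()` and
// `128 / ElTy->getIntegerBitWidth()` from the NarrowVec / WideVec cases.
constexpr unsigned narrowVecLanes(unsigned eltBits) { return 64 / eltBits; }
constexpr unsigned wideVecLanes(unsigned eltBits) { return 128 / eltBits; }

// i16 elements: <4 x i16> for the narrow (lane) operand,
//               <8 x i16> for the wide (laneq) operand.
static_assert(narrowVecLanes(16) == 4, "narrow i16 vector has 4 lanes");
static_assert(wideVecLanes(16) == 8, "wide i16 vector has 8 lanes");
// i32 elements: <2 x i32> and <4 x i32> respectively.
static_assert(narrowVecLanes(32) == 2, "narrow i32 vector has 2 lanes");
static_assert(wideVecLanes(32) == 4, "wide i32 vector has 4 lanes");
```

These are the same operand shapes that appear in the updated Clang tests, for example `<4 x i16>` paired with `<8 x i16>` in the `laneq.v4i16` call.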

FunctionType *Intrinsic::getType(LLVMContext &Context,		FunctionType *Intrinsic::getType(LLVMContext &Context,
ID id, ArrayRef<Type*> Tys) {		ID id, ArrayRef<Type*> Tys) {
SmallVector<IITDescriptor, 8> Table;		SmallVector<IITDescriptor, 8> Table;
getIntrinsicInfoTableEntries(id, Table);		getIntrinsicInfoTableEntries(id, Table);
[... 278 lines not shown ...]	case IITDescriptor::VecOfBitcastsToInt: {
if (D.getArgumentNumber() >= ArgTys.size())		if (D.getArgumentNumber() >= ArgTys.size())
return IsDeferredCheck || DeferCheck(Ty);		return IsDeferredCheck || DeferCheck(Ty);
auto *ReferenceType = dyn_cast<VectorType>(ArgTys[D.getArgumentNumber()]);		auto *ReferenceType = dyn_cast<VectorType>(ArgTys[D.getArgumentNumber()]);
auto *ThisArgVecTy = dyn_cast<VectorType>(Ty);		auto *ThisArgVecTy = dyn_cast<VectorType>(Ty);
if (!ThisArgVecTy || !ReferenceType)		if (!ThisArgVecTy || !ReferenceType)
return true;		return true;
return ThisArgVecTy != VectorType::getInteger(ReferenceType);		return ThisArgVecTy != VectorType::getInteger(ReferenceType);
}		}
		case IITDescriptor::NarrowVec: {
		if (D.getArgumentNumber() >= ArgTys.size())
		return IsDeferredCheck || DeferCheck(Ty);

		auto *VTy = dyn_cast<VectorType>(Ty);
		auto *RefVTy = dyn_cast<VectorType>(ArgTys[D.getArgumentNumber()]);
		if (!VTy || !RefVTy || VTy->getBitWidth() != 64)
		efriedma (inline comment, not done): Hardcoding "64" and "128" in target-independent code here seems like a bad idea. Can we just let both vector operands have any vector type, and reject in the backend if we see an unexpected type?
		sanwou01 (author, inline comment, done): Makes sense. Any type vector for both operands is certainly doable. Instruction selection will fail if you try to use a non-existent intrinsic, which is not the nicest failure mode, but probably good enough for intrinsics? Emitting the correct arm_neon.h for clang is a little less trivial, but not by too much.
		return true;

		return VTy->getElementType() != RefVTy->getElementType();
		}
		case IITDescriptor::WideVec: {
		if (D.getArgumentNumber() >= ArgTys.size())
		return IsDeferredCheck || DeferCheck(Ty);

		auto *VTy = dyn_cast<VectorType>(Ty);
		auto *RefVTy = dyn_cast<VectorType>(ArgTys[D.getArgumentNumber()]);
		if (!VTy || !RefVTy || VTy->getBitWidth() != 128)
		return true;

		return VTy->getElementType() != RefVTy->getElementType();
		}
}		}
llvm_unreachable("unhandled");		llvm_unreachable("unhandled");
}		}
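
The `NarrowVec` and `WideVec` matching cases above differ only in the required total width. The sketch below models that rule outside LLVM to make the return convention explicit (true means mismatch, as in the surrounding function). `VecTy` and `fixedWidthVecMismatch` are hypothetical stand-ins, and element-type equality is approximated by comparing element bit widths, which assumes integer elements only.

```cpp
#include <cassert>

// Hypothetical stand-in for a fixed-width integer vector type.
// A null pointer models "not a vector type" (a failed dyn_cast).
struct VecTy {
  unsigned eltBits;
  unsigned numElts;
  unsigned bitWidth() const { return eltBits * numElts; }
};

// Returns true on MISMATCH. requiredBits is 64 for the NarrowVec case and
// 128 for the WideVec case.
inline bool fixedWidthVecMismatch(const VecTy *ty, const VecTy *ref,
                                  unsigned requiredBits) {
  if (!ty || !ref || ty->bitWidth() != requiredBits)
    return true;
  // Same element type required; lane counts are allowed to differ.
  return ty->eltBits != ref->eltBits;
}
```

Under this model a `<4 x i16>` operand matches a `<8 x i16>` reference in the narrow case, while a `<8 x i16>` operand does not. The hard-coded widths are exactly what the inline review comment on this hunk questions.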

Intrinsic::MatchIntrinsicTypesResult		Intrinsic::MatchIntrinsicTypesResult
Intrinsic::matchIntrinsicSignature(FunctionType *FTy,		Intrinsic::matchIntrinsicSignature(FunctionType *FTy,
ArrayRef<Intrinsic::IITDescriptor> &Infos,		ArrayRef<Intrinsic::IITDescriptor> &Infos,
SmallVectorImpl<Type *> &ArgTys) {		SmallVectorImpl<Type *> &ArgTys) {
[... 271 lines not shown ...]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[... 6,044 lines not shown ...]	case 'w':
break;		break;
// The instructions that this constraint is designed for can		// The instructions that this constraint is designed for can
// only take 128-bit registers so just use that regclass.		// only take 128-bit registers so just use that regclass.
case 'x':		case 'x':
if (!Subtarget->hasFPARMv8())		if (!Subtarget->hasFPARMv8())
break;		break;
if (VT.isScalableVector())		if (VT.isScalableVector())
return std::make_pair(0U, &AArch64::ZPR_4bRegClass);		return std::make_pair(0U, &AArch64::ZPR_4bRegClass);
		if (VT.getSizeInBits() == 64)
		return std::make_pair(0U, &AArch64::FPR64_loRegClass);
		efriedma (inline comment, not done): Is this related somehow?
		sanwou01 (author, inline comment, done): This popped up when I was looking for uses of FPR128_loRegClass; it made sense to do the same for FPR64_lo. Doesn't seem essential though, so I'm happy to leave this out.
if (VT.getSizeInBits() == 128)		if (VT.getSizeInBits() == 128)
return std::make_pair(0U, &AArch64::FPR128_loRegClass);		return std::make_pair(0U, &AArch64::FPR128_loRegClass);
break;		break;
case 'y':		case 'y':
if (!Subtarget->hasFPARMv8())		if (!Subtarget->hasFPARMv8())
break;		break;
if (VT.isScalableVector())		if (VT.isScalableVector())
return std::make_pair(0U, &AArch64::ZPR_3bRegClass);		return std::make_pair(0U, &AArch64::ZPR_3bRegClass);
[... 6,762 lines not shown ...]

llvm/lib/Target/AArch64/AArch64InstrFormats.td

[... 352 lines not shown ...]
def am_indexed7s16 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S16", []>;		def am_indexed7s16 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S16", []>;
def am_indexed7s32 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S32", []>;		def am_indexed7s32 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S32", []>;
def am_indexed7s64 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S64", []>;		def am_indexed7s64 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S64", []>;
def am_indexed7s128 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S128", []>;		def am_indexed7s128 : ComplexPattern<i64, 2, "SelectAddrModeIndexed7S128", []>;

def am_indexedu6s128 : ComplexPattern<i64, 2, "SelectAddrModeIndexedU6S128", []>;		def am_indexedu6s128 : ComplexPattern<i64, 2, "SelectAddrModeIndexedU6S128", []>;
def am_indexeds9s128 : ComplexPattern<i64, 2, "SelectAddrModeIndexedS9S128", []>;		def am_indexeds9s128 : ComplexPattern<i64, 2, "SelectAddrModeIndexedS9S128", []>;

		def UImmS1XForm : SDNodeXForm<imm, [{
		return CurDAG->getTargetConstant(N->getZExtValue(), SDLoc(N), MVT::i64);
		}]>;
def UImmS2XForm : SDNodeXForm<imm, [{		def UImmS2XForm : SDNodeXForm<imm, [{
return CurDAG->getTargetConstant(N->getZExtValue() / 2, SDLoc(N), MVT::i64);		return CurDAG->getTargetConstant(N->getZExtValue() / 2, SDLoc(N), MVT::i64);
}]>;		}]>;
def UImmS4XForm : SDNodeXForm<imm, [{		def UImmS4XForm : SDNodeXForm<imm, [{
return CurDAG->getTargetConstant(N->getZExtValue() / 4, SDLoc(N), MVT::i64);		return CurDAG->getTargetConstant(N->getZExtValue() / 4, SDLoc(N), MVT::i64);
}]>;		}]>;
def UImmS8XForm : SDNodeXForm<imm, [{		def UImmS8XForm : SDNodeXForm<imm, [{
return CurDAG->getTargetConstant(N->getZExtValue() / 8, SDLoc(N), MVT::i64);		return CurDAG->getTargetConstant(N->getZExtValue() / 8, SDLoc(N), MVT::i64);
[... 7,495 lines not shown ...]	def v1i64_indexed : BaseSIMDIndexedTied<1, U, 1, 0b11, opc,
FPR64Op, FPR64Op, V128, VectorIndexD,		FPR64Op, FPR64Op, V128, VectorIndexD,
asm, ".d", "", "", ".d", []> {		asm, ".d", "", "", ".d", []> {
bits<1> idx;		bits<1> idx;
let Inst{11} = idx{0};		let Inst{11} = idx{0};
let Inst{21} = 0;		let Inst{21} = 0;
}		}
}		}

		multiclass SIMDIndexedHSPatterns<SDPatternOperator OpNodeLane,
		SDPatternOperator OpNodeLaneQ> {

		def : Pat<(v4i16 (OpNodeLane
		(v4i16 V64:$Rn), (v4i16 V64_lo:$Rm),
		VectorIndexS32b:$idx)),
		(!cast<Instruction>(NAME # v4i16_indexed) $Rn,
		(SUBREG_TO_REG (i32 0), (v4i16 V64_lo:$Rm), dsub),
		(UImmS1XForm $idx))>;

		def : Pat<(v4i16 (OpNodeLaneQ
		(v4i16 V64:$Rn), (v8i16 V128_lo:$Rm),
		VectorIndexH32b:$idx)),
		(!cast<Instruction>(NAME # v4i16_indexed) $Rn, $Rm,
		(UImmS1XForm $idx))>;

		def : Pat<(v8i16 (OpNodeLane
		(v8i16 V128:$Rn), (v4i16 V64_lo:$Rm),
		VectorIndexS32b:$idx)),
		(!cast<Instruction>(NAME # v8i16_indexed) $Rn,
		(SUBREG_TO_REG (i32 0), $Rm, dsub),
		(UImmS1XForm $idx))>;

		def : Pat<(v8i16 (OpNodeLaneQ
		(v8i16 V128:$Rn), (v8i16 V128_lo:$Rm),
		VectorIndexH32b:$idx)),
		(!cast<Instruction>(NAME # v8i16_indexed) $Rn, $Rm,
		(UImmS1XForm $idx))>;

		def : Pat<(v2i32 (OpNodeLane
		(v2i32 V64:$Rn), (v2i32 V64:$Rm),
		VectorIndexD32b:$idx)),
		(!cast<Instruction>(NAME # v2i32_indexed) $Rn,
		(SUBREG_TO_REG (i32 0), (v2i32 V64_lo:$Rm), dsub),
		(UImmS1XForm $idx))>;

		def : Pat<(v2i32 (OpNodeLaneQ
		(v2i32 V64:$Rn), (v4i32 V128:$Rm),
		VectorIndexS32b:$idx)),
		(!cast<Instruction>(NAME # v2i32_indexed) $Rn, $Rm,
		(UImmS1XForm $idx))>;

		def : Pat<(v4i32 (OpNodeLane
		(v4i32 V128:$Rn), (v2i32 V64:$Rm),
		VectorIndexD32b:$idx)),
		(!cast<Instruction>(NAME # v4i32_indexed) $Rn,
		(SUBREG_TO_REG (i32 0), $Rm, dsub),
		(UImmS1XForm $idx))>;

		def : Pat<(v4i32 (OpNodeLaneQ
		(v4i32 V128:$Rn),
		(v4i32 V128:$Rm),
		VectorIndexS32b:$idx)),
		(!cast<Instruction>(NAME # v4i32_indexed) $Rn, $Rm,
		(UImmS1XForm $idx))>;

		}
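
The eight patterns above encode two constraints: the lane index must select an element of `$Rm`, and for i16 elements `$Rm` must come from the `_lo` register classes (the summary notes the restriction applies only to i16). A rough sketch of those constraints as a predicate follows; `isValidLaneOperand` is a hypothetical name, and the register bound of 16 assumes `FPR64_lo` and `FPR128_lo` each cover the first 16 registers, as `(trunc FPR64, 16)` suggests for the former.

```cpp
#include <cassert>

// Hypothetical predicate, not LLVM code.
//   eltBits: element width (16 or 32)
//   rmBits:  width of the indexed operand (64 for _lane, 128 for _laneq)
//   rmReg:   register number of the indexed operand (0..31)
//   lane:    lane index immediate
constexpr bool isValidLaneOperand(unsigned eltBits, unsigned rmBits,
                                  unsigned rmReg, unsigned lane) {
  const bool knownShape =
      (eltBits == 16 || eltBits == 32) && (rmBits == 64 || rmBits == 128);
  if (!knownShape)
    return false;
  // i16 forms only accept the low half of the register file (assumed 0..15).
  const unsigned regLimit = (eltBits == 16) ? 16u : 32u;
  // The index must name an existing element of the indexed operand.
  const unsigned laneLimit = rmBits / eltBits;
  return rmReg < regLimit && lane < laneLimit;
}
```

For instance, `isValidLaneOperand(16, 128, 0, 7)` holds (the `laneq` i16 case with index 7 used in the tests), whereas `isValidLaneOperand(16, 64, 16, 0)` does not because register 16 lies outside the low half.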

multiclass SIMDIndexedHS<bit U, bits<4> opc, string asm,		multiclass SIMDIndexedHS<bit U, bits<4> opc, string asm,
SDPatternOperator OpNode> {		SDPatternOperator OpNode> {
def v4i16_indexed : BaseSIMDIndexed<0, U, 0, 0b01, opc, V64, V64,		def v4i16_indexed : BaseSIMDIndexed<0, U, 0, 0b01, opc, V64, V64,
V128_lo, VectorIndexH,		V128_lo, VectorIndexH,
asm, ".4h", ".4h", ".4h", ".h",		asm, ".4h", ".4h", ".4h", ".h",
[(set (v4i16 V64:$Rd),		[(set (v4i16 V64:$Rd),
(OpNode (v4i16 V64:$Rn),		(OpNode (v4i16 V64:$Rn),
(v4i16 (AArch64duplane16 (v8i16 V128_lo:$Rm), VectorIndexH:$idx))))]> {		(v4i16 (AArch64duplane16 (v8i16 V128_lo:$Rm), VectorIndexH:$idx))))]> {
[... 2,905 lines not shown ...]

llvm/lib/Target/AArch64/AArch64InstrInfo.td

	[... 5,571 lines not shown ...]
	def : Pat<(v2f64 (fmul V128:$Rn, (AArch64dup (f64 FPR64:$Rm)))),			def : Pat<(v2f64 (fmul V128:$Rn, (AArch64dup (f64 FPR64:$Rm)))),
	(FMULv2i64_indexed V128:$Rn,			(FMULv2i64_indexed V128:$Rn,
	(INSERT_SUBREG (v4i32 (IMPLICIT_DEF)), FPR64:$Rm, dsub),			(INSERT_SUBREG (v4i32 (IMPLICIT_DEF)), FPR64:$Rm, dsub),
	(i64 0))>;			(i64 0))>;

	defm SQDMULH : SIMDIndexedHS<0, 0b1100, "sqdmulh", int_aarch64_neon_sqdmulh>;			defm SQDMULH : SIMDIndexedHS<0, 0b1100, "sqdmulh", int_aarch64_neon_sqdmulh>;
	defm SQRDMULH : SIMDIndexedHS<0, 0b1101, "sqrdmulh", int_aarch64_neon_sqrdmulh>;			defm SQRDMULH : SIMDIndexedHS<0, 0b1101, "sqrdmulh", int_aarch64_neon_sqrdmulh>;

				defm SQDMULH : SIMDIndexedHSPatterns<int_aarch64_neon_sqdmulh_lane,
				int_aarch64_neon_sqdmulh_laneq>;
				defm SQRDMULH : SIMDIndexedHSPatterns<int_aarch64_neon_sqrdmulh_lane,
				int_aarch64_neon_sqrdmulh_laneq>;

	// Generated by MachineCombine			// Generated by MachineCombine
	defm MLA : SIMDVectorIndexedHSTied<1, 0b0000, "mla", null_frag>;			defm MLA : SIMDVectorIndexedHSTied<1, 0b0000, "mla", null_frag>;
	defm MLS : SIMDVectorIndexedHSTied<1, 0b0100, "mls", null_frag>;			defm MLS : SIMDVectorIndexedHSTied<1, 0b0100, "mls", null_frag>;

	defm MUL : SIMDVectorIndexedHS<0, 0b1000, "mul", mul>;			defm MUL : SIMDVectorIndexedHS<0, 0b1000, "mul", mul>;
	defm SMLAL : SIMDVectorIndexedLongSDTied<0, 0b0010, "smlal",			defm SMLAL : SIMDVectorIndexedLongSDTied<0, 0b0010, "smlal",
	TriOpFrag<(add node:$LHS, (int_aarch64_neon_smull node:$MHS, node:$RHS))>>;			TriOpFrag<(add node:$LHS, (int_aarch64_neon_smull node:$MHS, node:$RHS))>>;
	defm SMLSL : SIMDVectorIndexedLongSDTied<0, 0b0110, "smlsl",			defm SMLSL : SIMDVectorIndexedLongSDTied<0, 0b0110, "smlsl",
	[... 1,696 lines not shown ...]

llvm/lib/Target/AArch64/AArch64RegisterBankInfo.cpp

	[... 223 lines not shown ...]

	const RegisterBank &AArch64RegisterBankInfo::getRegBankFromRegClass(			const RegisterBank &AArch64RegisterBankInfo::getRegBankFromRegClass(
	const TargetRegisterClass &RC) const {			const TargetRegisterClass &RC) const {
	switch (RC.getID()) {			switch (RC.getID()) {
	case AArch64::FPR8RegClassID:			case AArch64::FPR8RegClassID:
	case AArch64::FPR16RegClassID:			case AArch64::FPR16RegClassID:
	case AArch64::FPR32RegClassID:			case AArch64::FPR32RegClassID:
	case AArch64::FPR64RegClassID:			case AArch64::FPR64RegClassID:
				case AArch64::FPR64_loRegClassID:
	case AArch64::FPR128RegClassID:			case AArch64::FPR128RegClassID:
	case AArch64::FPR128_loRegClassID:			case AArch64::FPR128_loRegClassID:
	case AArch64::DDRegClassID:			case AArch64::DDRegClassID:
	case AArch64::DDDRegClassID:			case AArch64::DDDRegClassID:
	case AArch64::DDDDRegClassID:			case AArch64::DDDDRegClassID:
	case AArch64::QQRegClassID:			case AArch64::QQRegClassID:
	case AArch64::QQQRegClassID:			case AArch64::QQQRegClassID:
	case AArch64::QQQQRegClassID:			case AArch64::QQQQRegClassID:
	[... 612 lines not shown ...]

llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp

[... 566 lines not shown ...]	unsigned AArch64RegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC,
case AArch64::DDDRegClassID:		case AArch64::DDDRegClassID:
case AArch64::DDDDRegClassID:		case AArch64::DDDDRegClassID:
case AArch64::QQRegClassID:		case AArch64::QQRegClassID:
case AArch64::QQQRegClassID:		case AArch64::QQQRegClassID:
case AArch64::QQQQRegClassID:		case AArch64::QQQQRegClassID:
return 32;		return 32;

case AArch64::FPR128_loRegClassID:		case AArch64::FPR128_loRegClassID:
		case AArch64::FPR64_loRegClassID:
return 16;		return 16;
}		}
}		}

unsigned AArch64RegisterInfo::getLocalAddressRegister(		unsigned AArch64RegisterInfo::getLocalAddressRegister(
const MachineFunction &MF) const {		const MachineFunction &MF) const {
const auto &MFI = MF.getFrameInfo();		const auto &MFI = MF.getFrameInfo();
if (!MF.hasEHFunclets() && !MFI.hasVarSizedObjects())		if (!MF.hasEHFunclets() && !MFI.hasVarSizedObjects())
return AArch64::SP;		return AArch64::SP;
else if (needsStackRealignment(MF))		else if (needsStackRealignment(MF))
return getBaseRegister();		return getBaseRegister();
return getFrameRegister(MF);		return getFrameRegister(MF);
}		}

llvm/lib/Target/AArch64/AArch64RegisterInfo.td

	[... 423 lines not shown ...]
	}			}
	def FPR16 : RegisterClass<"AArch64", [f16], 16, (sequence "H%u", 0, 31)> {			def FPR16 : RegisterClass<"AArch64", [f16], 16, (sequence "H%u", 0, 31)> {
	let Size = 16;			let Size = 16;
	}			}
	def FPR32 : RegisterClass<"AArch64", [f32, i32], 32,(sequence "S%u", 0, 31)>;			def FPR32 : RegisterClass<"AArch64", [f32, i32], 32,(sequence "S%u", 0, 31)>;
	def FPR64 : RegisterClass<"AArch64", [f64, i64, v2f32, v1f64, v8i8, v4i16, v2i32,			def FPR64 : RegisterClass<"AArch64", [f64, i64, v2f32, v1f64, v8i8, v4i16, v2i32,
	v1i64, v4f16],			v1i64, v4f16],
	64, (sequence "D%u", 0, 31)>;			64, (sequence "D%u", 0, 31)>;
				def FPR64_lo : RegisterClass<"AArch64",
				[v8i8, v4i16, v2i32, v1i64, v4f16, v2f32, v1f64],
				64, (trunc FPR64, 16)>;

	// We don't (yet) have an f128 legal type, so don't use that here. We			// We don't (yet) have an f128 legal type, so don't use that here. We
	// normalize 128-bit vectors to v2f64 for arg passing and such, so use			// normalize 128-bit vectors to v2f64 for arg passing and such, so use
	// that here.			// that here.
	def FPR128 : RegisterClass<"AArch64",			def FPR128 : RegisterClass<"AArch64",
	[v16i8, v8i16, v4i32, v2i64, v4f32, v2f64, f128,			[v16i8, v8i16, v4i32, v2i64, v4f32, v2f64, f128,
	v8f16],			v8f16],
	128, (sequence "Q%u", 0, 31)>;			128, (sequence "Q%u", 0, 31)>;

	[...]
	def V128 : RegisterOperand<FPR128, "printVRegOperand"> {			def V128 : RegisterOperand<FPR128, "printVRegOperand"> {
	let ParserMatchClass = VectorReg128AsmOperand;			let ParserMatchClass = VectorReg128AsmOperand;
	}			}

	def VectorRegLoAsmOperand : AsmOperandClass {			def VectorRegLoAsmOperand : AsmOperandClass {
	let Name = "VectorRegLo";			let Name = "VectorRegLo";
	let PredicateMethod = "isNeonVectorRegLo";			let PredicateMethod = "isNeonVectorRegLo";
	}			}
				def V64_lo : RegisterOperand<FPR64_lo, "printVRegOperand"> {
				let ParserMatchClass = VectorRegLoAsmOperand;
				}
	def V128_lo : RegisterOperand<FPR128_lo, "printVRegOperand"> {			def V128_lo : RegisterOperand<FPR128_lo, "printVRegOperand"> {
	let ParserMatchClass = VectorRegLoAsmOperand;			let ParserMatchClass = VectorRegLoAsmOperand;
	}			}

	class TypedVecListAsmOperand<int count, string vecty, int lanes, int eltsize>			class TypedVecListAsmOperand<int count, string vecty, int lanes, int eltsize>
	: AsmOperandClass {			: AsmOperandClass {
	let Name = "TypedVectorList" # count # "_" # lanes # eltsize;			let Name = "TypedVectorList" # count # "_" # lanes # eltsize;

	[...]

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp

[...]
public:
[...]
}		}

bool isNeonVectorReg() const {		bool isNeonVectorReg() const {
return Kind == k_Register && Reg.Kind == RegKind::NeonVector;		return Kind == k_Register && Reg.Kind == RegKind::NeonVector;
}		}

bool isNeonVectorRegLo() const {		bool isNeonVectorRegLo() const {
return Kind == k_Register && Reg.Kind == RegKind::NeonVector &&		return Kind == k_Register && Reg.Kind == RegKind::NeonVector &&
AArch64MCRegisterClasses[AArch64::FPR128_loRegClassID].contains(		(AArch64MCRegisterClasses[AArch64::FPR128_loRegClassID].contains(
Reg.RegNum);		Reg.RegNum) ||
		AArch64MCRegisterClasses[AArch64::FPR64_loRegClassID].contains(
		Reg.RegNum));
}		}

template <unsigned Class> bool isSVEVectorReg() const {		template <unsigned Class> bool isSVEVectorReg() const {
RegKind RK;		RegKind RK;
switch (Class) {		switch (Class) {
case AArch64::ZPRRegClassID:		case AArch64::ZPRRegClassID:
case AArch64::ZPR_3bRegClassID:		case AArch64::ZPR_3bRegClassID:
case AArch64::ZPR_4bRegClassID:		case AArch64::ZPR_4bRegClassID:
[...]
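
The widened `isNeonVectorRegLo` predicate above accepts a register if it belongs to either `FPR128_lo` or the new `FPR64_lo` class. A minimal sketch of that membership test follows. It assumes both "lo" classes cover the first 16 registers of their bank (the diff shows this for `FPR64_lo` via `(trunc FPR64, 16)`; the same is assumed for `FPR128_lo`). The `Bank`, `Reg`, and helper names are hypothetical stand-ins, not LLVM types.

```cpp
#include <cassert>

// Hypothetical model of a NEON register: which bank (D = 64-bit view,
// Q = 128-bit view) and its index within the 32-entry register file.
enum class Bank { D, Q };

struct Reg {
  Bank bank;
  unsigned index; // 0..31
};

// Assumption: each "lo" class is the first 16 registers of its bank,
// mirroring (trunc FPR64, 16) from the .td change.
constexpr unsigned kLoRegCount = 16;

constexpr bool inFPR128Lo(Reg r) {
  return r.bank == Bank::Q && r.index < kLoRegCount;
}

constexpr bool inFPR64Lo(Reg r) {
  return r.bank == Bank::D && r.index < kLoRegCount;
}

// Mirrors the shape of the updated predicate: a register is "lo" if
// either of the two restricted classes contains it.
inline bool isVectorRegLo(Reg r) {
  assert(r.index < 32 && "register index out of range");
  return inFPR128Lo(r) || inFPR64Lo(r);
}
```

In this model `isVectorRegLo({Bank::D, 15})` holds while `isVectorRegLo({Bank::D, 16})` does not, which is the restriction the 16-bit lane forms need for their second operand.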

llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -fp-contract=fast | FileCheck %s --check-prefixes=CHECK,GENERIC			; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -fp-contract=fast | FileCheck %s --check-prefixes=CHECK,GENERIC
	; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -fp-contract=fast -mcpu=exynos-m3 | FileCheck %s --check-prefixes=CHECK,EXYNOSM3			; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -fp-contract=fast -mcpu=exynos-m3 | FileCheck %s --check-prefixes=CHECK,EXYNOSM3

	declare <2 x double> @llvm.aarch64.neon.fmulx.v2f64(<2 x double>, <2 x double>)			declare <2 x double> @llvm.aarch64.neon.fmulx.v2f64(<2 x double>, <2 x double>)

	declare <4 x float> @llvm.aarch64.neon.fmulx.v4f32(<4 x float>, <4 x float>)			declare <4 x float> @llvm.aarch64.neon.fmulx.v4f32(<4 x float>, <4 x float>)

	declare <2 x float> @llvm.aarch64.neon.fmulx.v2f32(<2 x float>, <2 x float>)			declare <2 x float> @llvm.aarch64.neon.fmulx.v2f32(<2 x float>, <2 x float>)

	declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32>, <4 x i32>)			declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32>, <4 x i32>)
				declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v4i32(<4 x i32>, <2 x i32>, i32)
				declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v4i32(<4 x i32>, <4 x i32>, i32)

	declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32>, <2 x i32>)			declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32>, <2 x i32>)
				declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v2i32(<2 x i32>, <2 x i32>, i32)
				declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v2i32(<2 x i32>, <4 x i32>, i32)

	declare <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16>, <8 x i16>)			declare <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16>, <8 x i16>)
				declare <8 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v8i16(<8 x i16>, <4 x i16>, i32)
				declare <8 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v8i16(<8 x i16>, <8 x i16>, i32)

	declare <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16>, <4 x i16>)			declare <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16>, <4 x i16>)
				declare <4 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v4i16(<4 x i16>, <4 x i16>, i32)
				declare <4 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v4i16(<4 x i16>, <8 x i16>, i32)

	declare <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32>, <4 x i32>)			declare <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32>, <4 x i32>)
				declare <4 x i32> @llvm.aarch64.neon.sqdmulh.lane.v4i32(<4 x i32>, <2 x i32>, i32)
				declare <4 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v4i32(<4 x i32>, <4 x i32>, i32)

	declare <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32>, <2 x i32>)			declare <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32>, <2 x i32>)
				declare <2 x i32> @llvm.aarch64.neon.sqdmulh.lane.v2i32(<2 x i32>, <2 x i32>, i32)
				declare <2 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v2i32(<2 x i32>, <4 x i32>, i32)

	declare <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16>, <8 x i16>)			declare <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16>, <8 x i16>)
				declare <8 x i16> @llvm.aarch64.neon.sqdmulh.lane.v8i16(<8 x i16>, <4 x i16>, i32)
				declare <8 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v8i16(<8 x i16>, <8 x i16>, i32)

	declare <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16>, <4 x i16>)			declare <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16>, <4 x i16>)
				declare <4 x i16> @llvm.aarch64.neon.sqdmulh.lane.v4i16(<4 x i16>, <4 x i16>, i32)
				declare <4 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v4i16(<4 x i16>, <8 x i16>, i32)

	declare <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32>, <2 x i32>)			declare <2 x i64> @llvm.aarch64.neon.sqdmull.v2i64(<2 x i32>, <2 x i32>)

	declare <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16>, <4 x i16>)			declare <4 x i32> @llvm.aarch64.neon.sqdmull.v4i32(<4 x i16>, <4 x i16>)

	declare <2 x i64> @llvm.aarch64.neon.sqsub.v2i64(<2 x i64>, <2 x i64>)			declare <2 x i64> @llvm.aarch64.neon.sqsub.v2i64(<2 x i64>, <2 x i64>)

	declare <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32>, <4 x i32>)			declare <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32>, <4 x i32>)
	[...]
	; CHECK-NEXT: sqdmulh v0.4h, v0.4h, v1.h[3]			; CHECK-NEXT: sqdmulh v0.4h, v0.4h, v1.h[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>			%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)			%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)
	ret <4 x i16> %vqdmulh2.i			ret <4 x i16> %vqdmulh2.i
	}			}

				define <4 x i16> @test_vqdmulh_lane_s16_intrinsic(<4 x i16> %a, <4 x i16> %v) {
				; CHECK-LABEL: test_vqdmulh_lane_s16_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqdmulh v0.4h, v0.4h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.lane.v4i16(<4 x i16> %a, <4 x i16> %v, i32 3)
				ret <4 x i16> %vqdmulh2.i
				}

				define <4 x i16> @test_vqdmulh_laneq_s16_intrinsic_lo(<4 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqdmulh_laneq_s16_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.4h, v0.4h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v4i16(<4 x i16> %a, <8 x i16> %v, i32 3)
				ret <4 x i16> %vqdmulh2.i
				}

				define <4 x i16> @test_vqdmulh_laneq_s16_intrinsic_hi(<4 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqdmulh_laneq_s16_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.4h, v0.4h, v1.h[7]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v4i16(<4 x i16> %a, <8 x i16> %v, i32 7)
				ret <4 x i16> %vqdmulh2.i
				}

	define <8 x i16> @test_vqdmulhq_lane_s16(<8 x i16> %a, <4 x i16> %v) {			define <8 x i16> @test_vqdmulhq_lane_s16(<8 x i16> %a, <4 x i16> %v) {
	; CHECK-LABEL: test_vqdmulhq_lane_s16:			; CHECK-LABEL: test_vqdmulhq_lane_s16:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: sqdmulh v0.8h, v0.8h, v1.h[3]			; CHECK-NEXT: sqdmulh v0.8h, v0.8h, v1.h[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>			%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	%vqdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16> %a, <8 x i16> %shuffle)			%vqdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqdmulh.v8i16(<8 x i16> %a, <8 x i16> %shuffle)
	ret <8 x i16> %vqdmulh2.i			ret <8 x i16> %vqdmulh2.i
	}			}

				define <8 x i16> @test_vqdmulhq_lane_s16_intrinsic(<8 x i16> %a, <4 x i16> %v) {
				; CHECK-LABEL: test_vqdmulhq_lane_s16_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqdmulh v0.8h, v0.8h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqdmulh.lane.v8i16(<8 x i16> %a, <4 x i16> %v, i32 3)
				ret <8 x i16> %vqdmulh2.i
				}

				define <8 x i16> @test_vqdmulhq_laneq_s16_intrinsic_lo(<8 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqdmulhq_laneq_s16_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.8h, v0.8h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v8i16(<8 x i16> %a, <8 x i16> %v, i32 3)
				ret <8 x i16> %vqdmulh2.i
				}

				define <8 x i16> @test_vqdmulhq_laneq_s16_intrinsic_hi(<8 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqdmulhq_laneq_s16_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.8h, v0.8h, v1.h[7]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqdmulh.laneq.v8i16(<8 x i16> %a, <8 x i16> %v, i32 7)
				ret <8 x i16> %vqdmulh2.i
				}

	define <2 x i32> @test_vqdmulh_lane_s32(<2 x i32> %a, <2 x i32> %v) {			define <2 x i32> @test_vqdmulh_lane_s32(<2 x i32> %a, <2 x i32> %v) {
	; CHECK-LABEL: test_vqdmulh_lane_s32:			; CHECK-LABEL: test_vqdmulh_lane_s32:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: sqdmulh v0.2s, v0.2s, v1.s[1]			; CHECK-NEXT: sqdmulh v0.2s, v0.2s, v1.s[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>			%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
	%vqdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32> %a, <2 x i32> %shuffle)			%vqdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqdmulh.v2i32(<2 x i32> %a, <2 x i32> %shuffle)
	ret <2 x i32> %vqdmulh2.i			ret <2 x i32> %vqdmulh2.i
	}			}

				define <2 x i32> @test_vqdmulh_lane_s32_intrinsic(<2 x i32> %a, <2 x i32> %v) {
				; CHECK-LABEL: test_vqdmulh_lane_s32_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqdmulh v0.2s, v0.2s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqdmulh.lane.v2i32(<2 x i32> %a, <2 x i32> %v, i32 1)
				ret <2 x i32> %vqdmulh2.i
				}

				define <2 x i32> @test_vqdmulh_laneq_s32_intrinsic_lo(<2 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqdmulh_laneq_s32_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.2s, v0.2s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v2i32(<2 x i32> %a, <4 x i32> %v, i32 1)
				ret <2 x i32> %vqdmulh2.i
				}

				define <2 x i32> @test_vqdmulh_laneq_s32_intrinsic_hi(<2 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqdmulh_laneq_s32_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.2s, v0.2s, v1.s[3]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v2i32(<2 x i32> %a, <4 x i32> %v, i32 3)
				ret <2 x i32> %vqdmulh2.i
				}

	define <4 x i32> @test_vqdmulhq_lane_s32(<4 x i32> %a, <2 x i32> %v) {			define <4 x i32> @test_vqdmulhq_lane_s32(<4 x i32> %a, <2 x i32> %v) {
	; CHECK-LABEL: test_vqdmulhq_lane_s32:			; CHECK-LABEL: test_vqdmulhq_lane_s32:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: sqdmulh v0.4s, v0.4s, v1.s[1]			; CHECK-NEXT: sqdmulh v0.4s, v0.4s, v1.s[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>			%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
	%vqdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32> %a, <4 x i32> %shuffle)			%vqdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqdmulh.v4i32(<4 x i32> %a, <4 x i32> %shuffle)
	ret <4 x i32> %vqdmulh2.i			ret <4 x i32> %vqdmulh2.i
	}			}

				define <4 x i32> @test_vqdmulhq_lane_s32_intrinsic(<4 x i32> %a, <2 x i32> %v) {
				; CHECK-LABEL: test_vqdmulhq_lane_s32_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqdmulh v0.4s, v0.4s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqdmulh.lane.v4i32(<4 x i32> %a, <2 x i32> %v, i32 1)
				ret <4 x i32> %vqdmulh2.i
				}

				define <4 x i32> @test_vqdmulhq_laneq_s32_intrinsic_lo(<4 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqdmulhq_laneq_s32_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.4s, v0.4s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v4i32(<4 x i32> %a, <4 x i32> %v, i32 1)
				ret <4 x i32> %vqdmulh2.i
				}

				define <4 x i32> @test_vqdmulhq_laneq_s32_intrinsic_hi(<4 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqdmulhq_laneq_s32_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqdmulh v0.4s, v0.4s, v1.s[3]
				; CHECK-NEXT: ret
				entry:
				%vqdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqdmulh.laneq.v4i32(<4 x i32> %a, <4 x i32> %v, i32 3)
				ret <4 x i32> %vqdmulh2.i
				}

	define <4 x i16> @test_vqrdmulh_lane_s16(<4 x i16> %a, <4 x i16> %v) {			define <4 x i16> @test_vqrdmulh_lane_s16(<4 x i16> %a, <4 x i16> %v) {
	; CHECK-LABEL: test_vqrdmulh_lane_s16:			; CHECK-LABEL: test_vqrdmulh_lane_s16:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: sqrdmulh v0.4h, v0.4h, v1.h[3]			; CHECK-NEXT: sqrdmulh v0.4h, v0.4h, v1.h[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>			%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	%vqrdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)			%vqrdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)
	ret <4 x i16> %vqrdmulh2.i			ret <4 x i16> %vqrdmulh2.i
	}			}

				define <4 x i16> @test_vqrdmulh_lane_s16_intrinsic(<4 x i16> %a, <4 x i16> %v) {
				; CHECK-LABEL: test_vqrdmulh_lane_s16_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqrdmulh v0.4h, v0.4h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v4i16(<4 x i16> %a, <4 x i16> %v, i32 3)
				ret <4 x i16> %vqrdmulh2.i
				}

				define <4 x i16> @test_vqrdmulh_laneq_s16_intrinsic_lo(<4 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqrdmulh_laneq_s16_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.4h, v0.4h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v4i16(<4 x i16> %a, <8 x i16> %v, i32 3)
				ret <4 x i16> %vqrdmulh2.i
				}

				define <4 x i16> @test_vqrdmulh_laneq_s16_intrinsic_hi(<4 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqrdmulh_laneq_s16_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.4h, v0.4h, v1.h[7]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v4i16(<4 x i16> %a, <8 x i16> %v, i32 7)
				ret <4 x i16> %vqrdmulh2.i
				}

	define <8 x i16> @test_vqrdmulhq_lane_s16(<8 x i16> %a, <4 x i16> %v) {			define <8 x i16> @test_vqrdmulhq_lane_s16(<8 x i16> %a, <4 x i16> %v) {
	; CHECK-LABEL: test_vqrdmulhq_lane_s16:			; CHECK-LABEL: test_vqrdmulhq_lane_s16:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: sqrdmulh v0.8h, v0.8h, v1.h[3]			; CHECK-NEXT: sqrdmulh v0.8h, v0.8h, v1.h[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>			%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	%vqrdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %a, <8 x i16> %shuffle)			%vqrdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %a, <8 x i16> %shuffle)
	ret <8 x i16> %vqrdmulh2.i			ret <8 x i16> %vqrdmulh2.i
	}			}

				define <8 x i16> @test_vqrdmulhq_lane_s16_intrinsic(<8 x i16> %a, <4 x i16> %v) {
				; CHECK-LABEL: test_vqrdmulhq_lane_s16_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqrdmulh v0.8h, v0.8h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqrdmulh.lane.v8i16(<8 x i16> %a, <4 x i16> %v, i32 3)
				ret <8 x i16> %vqrdmulh2.i
				}

				define <8 x i16> @test_vqrdmulhq_laneq_s16_intrinsic_lo(<8 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqrdmulhq_laneq_s16_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.8h, v0.8h, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v8i16(<8 x i16> %a, <8 x i16> %v, i32 3)
				ret <8 x i16> %vqrdmulh2.i
				}

				define <8 x i16> @test_vqrdmulhq_laneq_s16_intrinsic_hi(<8 x i16> %a, <8 x i16> %v) {
				; CHECK-LABEL: test_vqrdmulhq_laneq_s16_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.8h, v0.8h, v1.h[7]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqrdmulh.laneq.v8i16(<8 x i16> %a, <8 x i16> %v, i32 7)
				ret <8 x i16> %vqrdmulh2.i
				}

	define <2 x i32> @test_vqrdmulh_lane_s32(<2 x i32> %a, <2 x i32> %v) {			define <2 x i32> @test_vqrdmulh_lane_s32(<2 x i32> %a, <2 x i32> %v) {
	; CHECK-LABEL: test_vqrdmulh_lane_s32:			; CHECK-LABEL: test_vqrdmulh_lane_s32:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: sqrdmulh v0.2s, v0.2s, v1.s[1]			; CHECK-NEXT: sqrdmulh v0.2s, v0.2s, v1.s[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>			%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
	%vqrdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %a, <2 x i32> %shuffle)			%vqrdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %a, <2 x i32> %shuffle)
	ret <2 x i32> %vqrdmulh2.i			ret <2 x i32> %vqrdmulh2.i
	}			}

				define <2 x i32> @test_vqrdmulh_lane_s32_intrinsic(<2 x i32> %a, <2 x i32> %v) {
				; CHECK-LABEL: test_vqrdmulh_lane_s32_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqrdmulh v0.2s, v0.2s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v2i32(<2 x i32> %a, <2 x i32> %v, i32 1)
				ret <2 x i32> %vqrdmulh2.i
				}

				define <2 x i32> @test_vqrdmulh_laneq_s32_intrinsic_lo(<2 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqrdmulh_laneq_s32_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.2s, v0.2s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v2i32(<2 x i32> %a, <4 x i32> %v, i32 1)
				ret <2 x i32> %vqrdmulh2.i
				}

				define <2 x i32> @test_vqrdmulh_laneq_s32_intrinsic_hi(<2 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqrdmulh_laneq_s32_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.2s, v0.2s, v1.s[3]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v2i32(<2 x i32> %a, <4 x i32> %v, i32 3)
				ret <2 x i32> %vqrdmulh2.i
				}

	define <4 x i32> @test_vqrdmulhq_lane_s32(<4 x i32> %a, <2 x i32> %v) {			define <4 x i32> @test_vqrdmulhq_lane_s32(<4 x i32> %a, <2 x i32> %v) {
	; CHECK-LABEL: test_vqrdmulhq_lane_s32:			; CHECK-LABEL: test_vqrdmulhq_lane_s32:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: sqrdmulh v0.4s, v0.4s, v1.s[1]			; CHECK-NEXT: sqrdmulh v0.4s, v0.4s, v1.s[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>			%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
	%vqrdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %a, <4 x i32> %shuffle)			%vqrdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %a, <4 x i32> %shuffle)
	ret <4 x i32> %vqrdmulh2.i			ret <4 x i32> %vqrdmulh2.i
	}			}

				define <4 x i32> @test_vqrdmulhq_lane_s32_intrinsic(<4 x i32> %a, <2 x i32> %v) {
				; CHECK-LABEL: test_vqrdmulhq_lane_s32_intrinsic:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: sqrdmulh v0.4s, v0.4s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v4i32(<4 x i32> %a, <2 x i32> %v, i32 1)
				ret <4 x i32> %vqrdmulh2.i
				}

				define <4 x i32> @test_vqrdmulhq_laneq_s32_intrinsic_lo(<4 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqrdmulhq_laneq_s32_intrinsic_lo:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.4s, v0.4s, v1.s[1]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v4i32(<4 x i32> %a, <4 x i32> %v, i32 1)
				ret <4 x i32> %vqrdmulh2.i
				}

				define <4 x i32> @test_vqrdmulhq_laneq_s32_intrinsic_hi(<4 x i32> %a, <4 x i32> %v) {
				; CHECK-LABEL: test_vqrdmulhq_laneq_s32_intrinsic_hi:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: sqrdmulh v0.4s, v0.4s, v1.s[3]
				; CHECK-NEXT: ret
				entry:
				%vqrdmulh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v4i32(<4 x i32> %a, <4 x i32> %v, i32 3)
				ret <4 x i32> %vqrdmulh2.i
				}

	define <2 x float> @test_vmul_lane_f32(<2 x float> %a, <2 x float> %v) {			define <2 x float> @test_vmul_lane_f32(<2 x float> %a, <2 x float> %v) {
	; CHECK-LABEL: test_vmul_lane_f32:			; CHECK-LABEL: test_vmul_lane_f32:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: fmul v0.2s, v0.2s, v1.s[1]			; CHECK-NEXT: fmul v0.2s, v0.2s, v1.s[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <2 x float> %v, <2 x float> undef, <2 x i32> <i32 1, i32 1>			%shuffle = shufflevector <2 x float> %v, <2 x float> undef, <2 x i32> <i32 1, i32 1>
	[...]
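
For readers unfamiliar with the instructions these tests select, here is a scalar reference sketch of what the i16 lane forms compute: each element of `a` is multiplied by the single element `v[lane]`, doubled, and the high half is kept with signed saturation. `sqrdmulh` adds a rounding constant before taking the high half. The function names are hypothetical, and only the 16-bit element case is modelled.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// One element of sq(r)dmulh on i16: saturate((2*a*b [+ 2^15]) >> 16).
// The intermediate is kept in 64 bits so doubling cannot overflow.
// Note: >> on a negative value is an arithmetic shift on mainstream
// compilers (and is guaranteed to be one from C++20 onwards).
inline std::int16_t sqdmulhElement(std::int16_t a, std::int16_t b,
                                   bool rounding = false) {
  std::int64_t product =
      2 * static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
  if (rounding)
    product += std::int64_t{1} << 15; // the "r" in sqrdmulh
  const std::int64_t high = product >> 16;
  const std::int64_t lo = std::numeric_limits<std::int16_t>::min();
  const std::int64_t hi = std::numeric_limits<std::int16_t>::max();
  return static_cast<std::int16_t>(std::clamp(high, lo, hi));
}

// Lane form: every element of `a` is combined with the one element
// v[lane]. M == 4 models the _lane variants, M == 8 the _laneq ones.
template <std::size_t N, std::size_t M>
std::array<std::int16_t, N> sqdmulhLane(const std::array<std::int16_t, N> &a,
                                        const std::array<std::int16_t, M> &v,
                                        std::size_t lane,
                                        bool rounding = false) {
  assert(lane < M && "lane index must select an element of v");
  std::array<std::int16_t, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = sqdmulhElement(a[i], v[lane], rounding);
  return result;
}
```

The only saturating input pair in the non-rounding case is the minimum value multiplied by itself, which would otherwise produce 32768.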

llvm/utils/TableGen/IntrinsicEmitter.cpp

[...]
enum IIT_Info {
[...]
IIT_STRUCT6 = 38,		IIT_STRUCT6 = 38,
IIT_STRUCT7 = 39,		IIT_STRUCT7 = 39,
IIT_STRUCT8 = 40,		IIT_STRUCT8 = 40,
IIT_F128 = 41,		IIT_F128 = 41,
IIT_VEC_ELEMENT = 42,		IIT_VEC_ELEMENT = 42,
IIT_SCALABLE_VEC = 43,		IIT_SCALABLE_VEC = 43,
IIT_SUBDIVIDE2_ARG = 44,		IIT_SUBDIVIDE2_ARG = 44,
IIT_SUBDIVIDE4_ARG = 45,		IIT_SUBDIVIDE4_ARG = 45,
IIT_VEC_OF_BITCASTS_TO_INT = 46		IIT_VEC_OF_BITCASTS_TO_INT = 46,
		IIT_NARROW_VEC = 47,
		IIT_WIDE_VEC = 48
};		};

static void EncodeFixedValueType(MVT::SimpleValueType VT,		static void EncodeFixedValueType(MVT::SimpleValueType VT,
std::vector<unsigned char> &Sig) {		std::vector<unsigned char> &Sig) {
if (MVT(VT).isInteger()) {		if (MVT(VT).isInteger()) {
unsigned BitWidth = MVT(VT).getSizeInBits();		unsigned BitWidth = MVT(VT).getSizeInBits();
switch (BitWidth) {		switch (BitWidth) {
default: PrintFatalError("unhandled integer type width in intrinsic!");		default: PrintFatalError("unhandled integer type width in intrinsic!");
[...]
if (R->isSubClassOf("LLVMMatchType")) {
[...]
else if (R->isSubClassOf("LLVMVectorElementType"))		else if (R->isSubClassOf("LLVMVectorElementType"))
Sig.push_back(IIT_VEC_ELEMENT);		Sig.push_back(IIT_VEC_ELEMENT);
else if (R->isSubClassOf("LLVMSubdivide2VectorType"))		else if (R->isSubClassOf("LLVMSubdivide2VectorType"))
Sig.push_back(IIT_SUBDIVIDE2_ARG);		Sig.push_back(IIT_SUBDIVIDE2_ARG);
else if (R->isSubClassOf("LLVMSubdivide4VectorType"))		else if (R->isSubClassOf("LLVMSubdivide4VectorType"))
Sig.push_back(IIT_SUBDIVIDE4_ARG);		Sig.push_back(IIT_SUBDIVIDE4_ARG);
else if (R->isSubClassOf("LLVMVectorOfBitcastsToInt"))		else if (R->isSubClassOf("LLVMVectorOfBitcastsToInt"))
Sig.push_back(IIT_VEC_OF_BITCASTS_TO_INT);		Sig.push_back(IIT_VEC_OF_BITCASTS_TO_INT);
		else if (R->isSubClassOf("LLVMNarrowType"))
		Sig.push_back(IIT_NARROW_VEC);
		else if (R->isSubClassOf("LLVMWideType"))
		Sig.push_back(IIT_WIDE_VEC);
else		else
Sig.push_back(IIT_ARG);		Sig.push_back(IIT_ARG);
return Sig.push_back((Number << 3) | 7 /*IITDescriptor::AK_MatchType*/);		return Sig.push_back((Number << 3) | 7 /*IITDescriptor::AK_MatchType*/);
}		}

MVT::SimpleValueType VT = getValueType(R->getValueAsDef("VT"));		MVT::SimpleValueType VT = getValueType(R->getValueAsDef("VT"));

unsigned Tmp = 0;		unsigned Tmp = 0;
[...]
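
The hunk above emits two bytes for a matched-type operand: a kind tag (now possibly `IIT_NARROW_VEC` or `IIT_WIDE_VEC`) followed by the referenced argument number packed with the `AK_MatchType` marker. A minimal standalone sketch of that packing is below. The enum values 47 and 48 and the marker value 7 are taken from the diff; the function name `encodeMatchType` and the 5-bit limit check are illustrative assumptions.

```cpp
#include <cassert>
#include <vector>

// Kind tags added by this patch (values as shown in the diff).
enum IITKind : unsigned char {
  IIT_NARROW_VEC = 47,
  IIT_WIDE_VEC = 48,
};

// Low three bits of the second byte; 7 per the comment in the diff.
constexpr unsigned char kArgKindMatchType = 7;

// Appends the two-byte encoding: the kind tag, then
// (argument number << 3) | AK_MatchType.
inline void encodeMatchType(IITKind kind, unsigned number,
                            std::vector<unsigned char> &sig) {
  // Assumption for this sketch: the packed value must fit in one byte.
  assert(number < 32 && "argument number does not fit in 5 bits");
  sig.push_back(kind);
  sig.push_back(static_cast<unsigned char>((number << 3) | kArgKindMatchType));
}
```

For example, a narrow-vector operand that refers to argument 1 would be encoded as the bytes 47 and 15.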


Revision Contents

Diff 233807

clang/include/clang/Basic/arm_neon.td

clang/lib/CodeGen/CGBuiltin.cpp

clang/test/CodeGen/aarch64-neon-2velem.c

llvm/include/llvm/IR/Intrinsics.h

llvm/include/llvm/IR/Intrinsics.td

llvm/include/llvm/IR/IntrinsicsAArch64.td

llvm/lib/IR/Function.cpp

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/lib/Target/AArch64/AArch64InstrFormats.td

llvm/lib/Target/AArch64/AArch64InstrInfo.td

llvm/lib/Target/AArch64/AArch64RegisterBankInfo.cpp

llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp

llvm/lib/Target/AArch64/AArch64RegisterInfo.td

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp

llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll

llvm/utils/TableGen/IntrinsicEmitter.cpp