This is an archive of the discontinued LLVM Phabricator instance.

dancgr retitled this revision from "[AArch64][SVE] Add SVE2 mla indexed intrinsics." to "[AArch64][SVE] Add SVE2 mla indexed intrinsics." · Jan 28 2020, 1:17 PM

I don't see any testcases for the byte variant? (smlalb z0.h, z1.b, z2.b[0]).

Err, never mind, that's not legal.

LGTM

This revision is now accepted and ready to land. · Jan 28 2020, 1:28 PM

Closed by commit rG1f85dfb2af1a: [AArch64][SVE] Add SVE2 mla indexed intrinsics. (authored by dancgr). · Jan 28 2020, 2:15 PM

This revision was automatically updated to reflect the committed changes.

@dancgr I see you already committed the patch, but could you please still address my comments?

llvm/include/llvm/IR/IntrinsicsAArch64.td
1088	For consistency with other intrinsics, these need to use an `i32` value for the immediate. The reason is that the front-end will generate the calls to the intrinsics automatically and will use an `i32` for the immediates. Having one exception to the rule means this is something we'll need to fix later, either in Clang or in the definition of the intrinsic.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1470–1477	To be consistent with other intrinsics that have an indexed form (such as `int_aarch64_sve_fmlalb_lane`), these indexed forms are better named `int_aarch64_sve_smlalb_lane`, where `int_aarch64_sve_smlalb` is used for the unpredicated vectors form. It may come across as if I'm being a bit pedantic about the naming, but not deviating unnecessarily from our downstream implementation (see D71712 for reference) really helps us when we upstream the Clang side of the ACLE (which we're preparing to do as we speak). We generate code for Clang to map C/C++ level builtins to LLVM IR intrinsics, and any change to the intrinsic names like this is something we will need to fix up in our mapping and tests. This probably isn't too complicated, but it would be another thing we'd need to fix, so it's better to avoid it if we can.

I will make the changes suggested by Sander in a follow-up patch, together with the saturating multiply-add long intrinsics.

llvm/include/llvm/IR/IntrinsicsAArch64.td
1088	Sure, no problem. The reason I chose i64 was because the original InstrFormat class definition used VectorIndexH and VectorIndexS, which are i64 immediates. I will make a switch to i32 formats.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1470–1477	Sure, will update that.
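
For reference, a minimal sketch of how one of the test calls might look once both inline comments are addressed. The `.lane` suffix and the `i32` lane index follow Sander's suggestions above; neither is part of this diff, so treat the intrinsic name and signature below as assumptions until the follow-up patch lands.

```llvm
; Sketch only: assumes the indexed intrinsic is renamed with a `.lane`
; suffix and takes an i32 immediate, as requested in the review.
define <vscale x 4 x i32> @smlalb_lane_i32(<vscale x 4 x i32> %a,
                                           <vscale x 8 x i16> %b,
                                           <vscale x 8 x i16> %c) {
  ; The lane index is the fourth operand and must be a constant (ImmArg).
  %res = call <vscale x 4 x i32> @llvm.aarch64.sve.smlalb.lane.nxv4i32(<vscale x 4 x i32> %a,
                                                                      <vscale x 8 x i16> %b,
                                                                      <vscale x 8 x i16> %c,
                                                                      i32 1)
  ret <vscale x 4 x i32> %res
}

declare <vscale x 4 x i32> @llvm.aarch64.sve.smlalb.lane.nxv4i32(<vscale x 4 x i32>, <vscale x 8 x i16>, <vscale x 8 x i16>, i32)
```

With this form, the `i64` immediates in the committed test file would become `i32`, and the unsuffixed `llvm.aarch64.sve.smlalb` name would stay free for the unpredicated vectors form.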

dancgr mentioned this in D73633: [AArch64][SVE] Add remaining SVE2 mla indexed intrinsics. · Jan 29 2020, 8:27 AM

Revision Contents

Path	Size
llvm/include/llvm/IR/IntrinsicsAArch64.td	18 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td	24 lines
llvm/lib/Target/AArch64/SVEInstrFormats.td	5 lines
llvm/test/CodeGen/AArch64/sve2-mla-indexed.ll	458 lines

Diff 240994

llvm/include/llvm/IR/IntrinsicsAArch64.td

[...]	class SVE2_1VectorArg_Imm_Narrowing_Intrinsic
[IntrNoMem, ImmArg<1>]>;		[IntrNoMem, ImmArg<1>]>;

class SVE2_2VectorArg_Imm_Narrowing_Intrinsic		class SVE2_2VectorArg_Imm_Narrowing_Intrinsic
: Intrinsic<[LLVMSubdivide2VectorType<0>],		: Intrinsic<[LLVMSubdivide2VectorType<0>],
[LLVMSubdivide2VectorType<0>, llvm_anyvector_ty,		[LLVMSubdivide2VectorType<0>, llvm_anyvector_ty,
llvm_i32_ty],		llvm_i32_ty],
[IntrNoMem, ImmArg<2>]>;		[IntrNoMem, ImmArg<2>]>;

		class SVE2_3VectorArg_Indexed_Intrinsic
		: Intrinsic<[llvm_anyvector_ty],
		[LLVMMatchType<0>,
		LLVMSubdivide2VectorType<0>,
		LLVMSubdivide2VectorType<0>,
		llvm_i64_ty],
		sdesmalen: For consistency with other intrinsics, these need to use an `i32` value for the immediate. The reason is that the front-end will generate the calls to the intrinsics automatically and will use an `i32` for the immediates. Having one exception to the rule means this is something we'll need to fix later, either in Clang or in the definition of the intrinsic.
		dancgr (author): Sure, no problem. The reason I chose i64 was because the original InstrFormat class definition used VectorIndexH and VectorIndexS, which are i64 immediates. I will make a switch to i32 formats.
		[IntrNoMem, ImmArg<3>]>;

// NOTE: There is no relationship between these intrinsics beyond an attempt		// NOTE: There is no relationship between these intrinsics beyond an attempt
// to reuse currently identical class definitions.		// to reuse currently identical class definitions.
class AdvSIMD_SVE_LOGB_Intrinsic : AdvSIMD_SVE_CNT_Intrinsic;		class AdvSIMD_SVE_LOGB_Intrinsic : AdvSIMD_SVE_CNT_Intrinsic;

// This class of intrinsics are not intended to be useful within LLVM IR but		// This class of intrinsics are not intended to be useful within LLVM IR but
// are instead here to support some of the more regid parts of the ACLE.		// are instead here to support some of the more regid parts of the ACLE.
class Builtin_SVCVT<string name, LLVMType OUT, LLVMType IN>		class Builtin_SVCVT<string name, LLVMType OUT, LLVMType IN>
: GCCBuiltin<"__builtin_sve_" # name>,		: GCCBuiltin<"__builtin_sve_" # name>,
[...]
def int_aarch64_sve_uqrshrnt : SVE2_2VectorArg_Imm_Narrowing_Intrinsic;		def int_aarch64_sve_uqrshrnt : SVE2_2VectorArg_Imm_Narrowing_Intrinsic;

// Saturating shift right - signed input, unsigned output		// Saturating shift right - signed input, unsigned output
def int_aarch64_sve_sqshrunb : SVE2_1VectorArg_Imm_Narrowing_Intrinsic;		def int_aarch64_sve_sqshrunb : SVE2_1VectorArg_Imm_Narrowing_Intrinsic;
def int_aarch64_sve_sqshrunt : SVE2_2VectorArg_Imm_Narrowing_Intrinsic;		def int_aarch64_sve_sqshrunt : SVE2_2VectorArg_Imm_Narrowing_Intrinsic;

def int_aarch64_sve_sqrshrunb : SVE2_1VectorArg_Imm_Narrowing_Intrinsic;		def int_aarch64_sve_sqrshrunb : SVE2_1VectorArg_Imm_Narrowing_Intrinsic;
def int_aarch64_sve_sqrshrunt : SVE2_2VectorArg_Imm_Narrowing_Intrinsic;		def int_aarch64_sve_sqrshrunt : SVE2_2VectorArg_Imm_Narrowing_Intrinsic;

		def int_aarch64_sve_smlalb : SVE2_3VectorArg_Indexed_Intrinsic;
		def int_aarch64_sve_smlalt : SVE2_3VectorArg_Indexed_Intrinsic;
		def int_aarch64_sve_umlalb : SVE2_3VectorArg_Indexed_Intrinsic;
		def int_aarch64_sve_umlalt : SVE2_3VectorArg_Indexed_Intrinsic;
		def int_aarch64_sve_smlslb : SVE2_3VectorArg_Indexed_Intrinsic;
		def int_aarch64_sve_smlslt : SVE2_3VectorArg_Indexed_Intrinsic;
		def int_aarch64_sve_umlslb : SVE2_3VectorArg_Indexed_Intrinsic;
		def int_aarch64_sve_umlslt : SVE2_3VectorArg_Indexed_Intrinsic;

}		}

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

[...]	let Predicates = [HasSVE2] in {
defm UMULLB_ZZZI : sve2_int_mul_long_by_indexed_elem<0b010, "umullb">;		defm UMULLB_ZZZI : sve2_int_mul_long_by_indexed_elem<0b010, "umullb">;
defm UMULLT_ZZZI : sve2_int_mul_long_by_indexed_elem<0b011, "umullt">;		defm UMULLT_ZZZI : sve2_int_mul_long_by_indexed_elem<0b011, "umullt">;

// SVE2 saturating multiply (indexed)		// SVE2 saturating multiply (indexed)
defm SQDMULLB_ZZZI : sve2_int_mul_long_by_indexed_elem<0b100, "sqdmullb">;		defm SQDMULLB_ZZZI : sve2_int_mul_long_by_indexed_elem<0b100, "sqdmullb">;
defm SQDMULLT_ZZZI : sve2_int_mul_long_by_indexed_elem<0b101, "sqdmullt">;		defm SQDMULLT_ZZZI : sve2_int_mul_long_by_indexed_elem<0b101, "sqdmullt">;

// SVE2 integer multiply-add long (indexed)		// SVE2 integer multiply-add long (indexed)
defm SMLALB_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1000, "smlalb">;		defm SMLALB_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1000, "smlalb", int_aarch64_sve_smlalb>;
defm SMLALT_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1001, "smlalt">;		defm SMLALT_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1001, "smlalt", int_aarch64_sve_smlalt>;
defm UMLALB_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1010, "umlalb">;		defm UMLALB_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1010, "umlalb", int_aarch64_sve_umlalb>;
defm UMLALT_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1011, "umlalt">;		defm UMLALT_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1011, "umlalt", int_aarch64_sve_umlalt>;
defm SMLSLB_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1100, "smlslb">;		defm SMLSLB_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1100, "smlslb", int_aarch64_sve_smlslb>;
defm SMLSLT_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1101, "smlslt">;		defm SMLSLT_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1101, "smlslt", int_aarch64_sve_smlslt>;
defm UMLSLB_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1110, "umlslb">;		defm UMLSLB_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1110, "umlslb", int_aarch64_sve_umlslb>;
defm UMLSLT_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1111, "umlslt">;		defm UMLSLT_ZZZI : sve2_int_mla_long_by_indexed_elem<0b1111, "umlslt", int_aarch64_sve_umlslt>;
		sdesmalen: To be consistent with other intrinsics that have an indexed form (such as `int_aarch64_sve_fmlalb_lane`), these indexed forms are better named `int_aarch64_sve_smlalb_lane`, where `int_aarch64_sve_smlalb` is used for the unpredicated vectors form. It may come across as if I'm being a bit pedantic about the naming, but not deviating unnecessarily from our downstream implementation (see D71712 for reference) really helps us when we upstream the Clang side of the ACLE (which we're preparing to do as we speak). We generate code for Clang to map C/C++ level builtins to LLVM IR intrinsics, and any change to the intrinsic names like this is something we will need to fix up in our mapping and tests. This probably isn't too complicated, but it would be another thing we'd need to fix, so it's better to avoid it if we can.
		dancgr (author): Sure, will update that.

// SVE2 integer multiply-add long (vectors, unpredicated)		// SVE2 integer multiply-add long (vectors, unpredicated)
defm SMLALB_ZZZ : sve2_int_mla_long<0b10000, "smlalb">;		defm SMLALB_ZZZ : sve2_int_mla_long<0b10000, "smlalb">;
defm SMLALT_ZZZ : sve2_int_mla_long<0b10001, "smlalt">;		defm SMLALT_ZZZ : sve2_int_mla_long<0b10001, "smlalt">;
defm UMLALB_ZZZ : sve2_int_mla_long<0b10010, "umlalb">;		defm UMLALB_ZZZ : sve2_int_mla_long<0b10010, "umlalb">;
defm UMLALT_ZZZ : sve2_int_mla_long<0b10011, "umlalt">;		defm UMLALT_ZZZ : sve2_int_mla_long<0b10011, "umlalt">;
defm SMLSLB_ZZZ : sve2_int_mla_long<0b10100, "smlslb">;		defm SMLSLB_ZZZ : sve2_int_mla_long<0b10100, "smlslb">;
defm SMLSLT_ZZZ : sve2_int_mla_long<0b10101, "smlslt">;		defm SMLSLT_ZZZ : sve2_int_mla_long<0b10101, "smlslt">;
defm UMLSLB_ZZZ : sve2_int_mla_long<0b10110, "umlslb">;		defm UMLSLB_ZZZ : sve2_int_mla_long<0b10110, "umlslb">;
defm UMLSLT_ZZZ : sve2_int_mla_long<0b10111, "umlslt">;		defm UMLSLT_ZZZ : sve2_int_mla_long<0b10111, "umlslt">;

// SVE2 saturating multiply-add long (indexed)		// SVE2 saturating multiply-add long (indexed)
defm SQDMLALB_ZZZI : sve2_int_mla_long_by_indexed_elem<0b0100, "sqdmlalb">;		defm SQDMLALB_ZZZI : sve2_int_mla_long_by_indexed_elem<0b0100, "sqdmlalb", null_frag>;
defm SQDMLALT_ZZZI : sve2_int_mla_long_by_indexed_elem<0b0101, "sqdmlalt">;		defm SQDMLALT_ZZZI : sve2_int_mla_long_by_indexed_elem<0b0101, "sqdmlalt", null_frag>;
defm SQDMLSLB_ZZZI : sve2_int_mla_long_by_indexed_elem<0b0110, "sqdmlslb">;		defm SQDMLSLB_ZZZI : sve2_int_mla_long_by_indexed_elem<0b0110, "sqdmlslb", null_frag>;
defm SQDMLSLT_ZZZI : sve2_int_mla_long_by_indexed_elem<0b0111, "sqdmlslt">;		defm SQDMLSLT_ZZZI : sve2_int_mla_long_by_indexed_elem<0b0111, "sqdmlslt", null_frag>;

// SVE2 saturating multiply-add long (vectors, unpredicated)		// SVE2 saturating multiply-add long (vectors, unpredicated)
defm SQDMLALB_ZZZ : sve2_int_mla_long<0b11000, "sqdmlalb">;		defm SQDMLALB_ZZZ : sve2_int_mla_long<0b11000, "sqdmlalb">;
defm SQDMLALT_ZZZ : sve2_int_mla_long<0b11001, "sqdmlalt">;		defm SQDMLALT_ZZZ : sve2_int_mla_long<0b11001, "sqdmlalt">;
defm SQDMLSLB_ZZZ : sve2_int_mla_long<0b11010, "sqdmlslb">;		defm SQDMLSLB_ZZZ : sve2_int_mla_long<0b11010, "sqdmlslb">;
defm SQDMLSLT_ZZZ : sve2_int_mla_long<0b11011, "sqdmlslt">;		defm SQDMLSLT_ZZZ : sve2_int_mla_long<0b11011, "sqdmlslt">;

// SVE2 saturating multiply-add interleaved long		// SVE2 saturating multiply-add interleaved long
[...]

llvm/lib/Target/AArch64/SVEInstrFormats.td

[...]	def _D : sve2_int_mla_by_indexed_elem<0b11, { 0b000, opc, S }, asm, ZPR64, ZPR64, ZPR4b64, VectorIndexD> {
let Inst{19-16} = Zm;		let Inst{19-16} = Zm;
}		}
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// SVE2 Integer Multiply-Add Long - Indexed Group		// SVE2 Integer Multiply-Add Long - Indexed Group
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

multiclass sve2_int_mla_long_by_indexed_elem<bits<4> opc, string asm> {		multiclass sve2_int_mla_long_by_indexed_elem<bits<4> opc, string asm, SDPatternOperator op> {
def _S : sve2_int_mla_by_indexed_elem<0b10, { opc{3}, 0b0, opc{2-1}, ?, opc{0} },		def _S : sve2_int_mla_by_indexed_elem<0b10, { opc{3}, 0b0, opc{2-1}, ?, opc{0} },
asm, ZPR32, ZPR16, ZPR3b16, VectorIndexH> {		asm, ZPR32, ZPR16, ZPR3b16, VectorIndexH> {
bits<3> Zm;		bits<3> Zm;
bits<3> iop;		bits<3> iop;
let Inst{20-19} = iop{2-1};		let Inst{20-19} = iop{2-1};
let Inst{18-16} = Zm;		let Inst{18-16} = Zm;
let Inst{11} = iop{0};		let Inst{11} = iop{0};
}		}
def _D : sve2_int_mla_by_indexed_elem<0b11, { opc{3}, 0b0, opc{2-1}, ?, opc{0} },		def _D : sve2_int_mla_by_indexed_elem<0b11, { opc{3}, 0b0, opc{2-1}, ?, opc{0} },
asm, ZPR64, ZPR32, ZPR4b32, VectorIndexS> {		asm, ZPR64, ZPR32, ZPR4b32, VectorIndexS> {
bits<4> Zm;		bits<4> Zm;
bits<2> iop;		bits<2> iop;
let Inst{20} = iop{1};		let Inst{20} = iop{1};
let Inst{19-16} = Zm;		let Inst{19-16} = Zm;
let Inst{11} = iop{0};		let Inst{11} = iop{0};
}		}

		def : SVE_4_Op_Imm_Pat<nxv4i32, op, nxv4i32, nxv8i16, nxv8i16, i64, VectorIndexH_timm, !cast<Instruction>(NAME # _S)>;
		def : SVE_4_Op_Imm_Pat<nxv2i64, op, nxv2i64, nxv4i32, nxv4i32, i64, VectorIndexS_timm, !cast<Instruction>(NAME # _D)>;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// SVE Integer Dot Product Group		// SVE Integer Dot Product Group
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class sve_intx_dot<bit sz, bit U, string asm, ZPRRegOp zprty1,		class sve_intx_dot<bit sz, bit U, string asm, ZPRRegOp zprty1,
ZPRRegOp zprty2>		ZPRRegOp zprty2>
[...]

llvm/test/CodeGen/AArch64/sve2-mla-indexed.ll

This file was added.

				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s

				;
				; SMLALB
				;
				define <vscale x 4 x i32> @smlalb_i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: smlalb_i32
				; CHECK: smlalb z0.s, z1.h, z2.h[1]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.smlalb.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 1)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @smlalb_i32_2(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: smlalb_i32_2
				; CHECK: smlalb z0.s, z1.h, z2.h[7]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.smlalb.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 7)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 2 x i64> @smlalb_i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: smlalb_i64
				; CHECK: smlalb z0.d, z1.s, z2.s[0]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.smlalb.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 0)
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @smlalb_i64_2(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: smlalb_i64_2
				; CHECK: smlalb z0.d, z1.s, z2.s[3]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.smlalb.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 3)
				ret <vscale x 2 x i64> %res
				}

				;
				; SMLALT
				;
				define <vscale x 4 x i32> @smlalt_i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: smlalt_i32
				; CHECK: smlalt z0.s, z1.h, z2.h[1]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.smlalt.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 1)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @smlalt_i32_2(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: smlalt_i32_2
				; CHECK: smlalt z0.s, z1.h, z2.h[7]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.smlalt.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 7)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 2 x i64> @smlalt_i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: smlalt_i64
				; CHECK: smlalt z0.d, z1.s, z2.s[0]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.smlalt.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 0)
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @smlalt_i64_2(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: smlalt_i64_2
				; CHECK: smlalt z0.d, z1.s, z2.s[3]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.smlalt.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 3)
				ret <vscale x 2 x i64> %res
				}

				;
				; UMLALB
				;
				define <vscale x 4 x i32> @umlalb_i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: umlalb_i32
				; CHECK: umlalb z0.s, z1.h, z2.h[1]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.umlalb.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 1)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @umlalb_i32_2(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: umlalb_i32_2
				; CHECK: umlalb z0.s, z1.h, z2.h[7]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.umlalb.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 7)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 2 x i64> @umlalb_i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: umlalb_i64
				; CHECK: umlalb z0.d, z1.s, z2.s[0]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.umlalb.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 0)
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @umlalb_i64_2(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: umlalb_i64_2
				; CHECK: umlalb z0.d, z1.s, z2.s[3]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.umlalb.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 3)
				ret <vscale x 2 x i64> %res
				}

				;
				; UMLALT
				;
				define <vscale x 4 x i32> @umlalt_i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: umlalt_i32
				; CHECK: umlalt z0.s, z1.h, z2.h[1]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.umlalt.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 1)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @umlalt_i32_2(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: umlalt_i32_2
				; CHECK: umlalt z0.s, z1.h, z2.h[7]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.umlalt.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 7)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 2 x i64> @umlalt_i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: umlalt_i64
				; CHECK: umlalt z0.d, z1.s, z2.s[0]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.umlalt.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 0)
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @umlalt_i64_2(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: umlalt_i64_2
				; CHECK: umlalt z0.d, z1.s, z2.s[3]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.umlalt.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 3)
				ret <vscale x 2 x i64> %res
				}

				;
				; SMLSLB
				;
				define <vscale x 4 x i32> @smlslb_i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: smlslb_i32
				; CHECK: smlslb z0.s, z1.h, z2.h[1]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.smlslb.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 1)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @smlslb_i32_2(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: smlslb_i32_2
				; CHECK: smlslb z0.s, z1.h, z2.h[7]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.smlslb.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 7)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 2 x i64> @smlslb_i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: smlslb_i64
				; CHECK: smlslb z0.d, z1.s, z2.s[0]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.smlslb.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 0)
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @smlslb_i64_2(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: smlslb_i64_2
				; CHECK: smlslb z0.d, z1.s, z2.s[3]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.smlslb.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 3)
				ret <vscale x 2 x i64> %res
				}

				;
				; SMLSLT
				;
				define <vscale x 4 x i32> @smlslt_i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: smlslt_i32
				; CHECK: smlslt z0.s, z1.h, z2.h[1]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.smlslt.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 1)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @smlslt_i32_2(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: smlslt_i32_2
				; CHECK: smlslt z0.s, z1.h, z2.h[7]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.smlslt.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 7)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 2 x i64> @smlslt_i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: smlslt_i64
				; CHECK: smlslt z0.d, z1.s, z2.s[0]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.smlslt.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 0)
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @smlslt_i64_2(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: smlslt_i64_2
				; CHECK: smlslt z0.d, z1.s, z2.s[3]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.smlslt.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 3)
				ret <vscale x 2 x i64> %res
				}

				;
				; UMLSLB
				;
				define <vscale x 4 x i32> @umlslb_i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: umlslb_i32
				; CHECK: umlslb z0.s, z1.h, z2.h[1]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.umlslb.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 1)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @umlslb_i32_2(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: umlslb_i32_2
				; CHECK: umlslb z0.s, z1.h, z2.h[7]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.umlslb.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 7)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 2 x i64> @umlslb_i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: umlslb_i64
				; CHECK: umlslb z0.d, z1.s, z2.s[0]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.umlslb.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 0)
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @umlslb_i64_2(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: umlslb_i64_2
				; CHECK: umlslb z0.d, z1.s, z2.s[3]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.umlslb.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 3)
				ret <vscale x 2 x i64> %res
				}

				;
				; UMLSLT
				;
				define <vscale x 4 x i32> @umlslt_i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: umlslt_i32
				; CHECK: umlslt z0.s, z1.h, z2.h[1]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.umlslt.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 1)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 4 x i32> @umlslt_i32_2(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c) {
				; CHECK-LABEL: umlslt_i32_2
				; CHECK: umlslt z0.s, z1.h, z2.h[7]
				; CHECK-NEXT: ret
				%res = call <vscale x 4 x i32> @llvm.aarch64.sve.umlslt.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c,
				i64 7)
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 2 x i64> @umlslt_i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: umlslt_i64
				; CHECK: umlslt z0.d, z1.s, z2.s[0]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.umlslt.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 0)
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 2 x i64> @umlslt_i64_2(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c) {
				; CHECK-LABEL: umlslt_i64_2
				; CHECK: umlslt z0.d, z1.s, z2.s[3]
				; CHECK-NEXT: ret
				%res = call <vscale x 2 x i64> @llvm.aarch64.sve.umlslt.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c,
				i64 3)
				ret <vscale x 2 x i64> %res
				}

				declare <vscale x 4 x i32> @llvm.aarch64.sve.smlalb.nxv4i32(<vscale x 4 x i32>, <vscale x 8 x i16>, <vscale x 8 x i16>, i64)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.smlalb.nxv2i64(<vscale x 2 x i64>, <vscale x 4 x i32>, <vscale x 4 x i32>, i64)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.smlalt.nxv4i32(<vscale x 4 x i32>, <vscale x 8 x i16>, <vscale x 8 x i16>, i64)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.smlalt.nxv2i64(<vscale x 2 x i64>, <vscale x 4 x i32>, <vscale x 4 x i32>, i64)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.umlalb.nxv4i32(<vscale x 4 x i32>, <vscale x 8 x i16>, <vscale x 8 x i16>, i64)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.umlalb.nxv2i64(<vscale x 2 x i64>, <vscale x 4 x i32>, <vscale x 4 x i32>, i64)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.umlalt.nxv4i32(<vscale x 4 x i32>, <vscale x 8 x i16>, <vscale x 8 x i16>, i64)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.umlalt.nxv2i64(<vscale x 2 x i64>, <vscale x 4 x i32>, <vscale x 4 x i32>, i64)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.smlslb.nxv4i32(<vscale x 4 x i32>, <vscale x 8 x i16>, <vscale x 8 x i16>, i64)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.smlslb.nxv2i64(<vscale x 2 x i64>, <vscale x 4 x i32>, <vscale x 4 x i32>, i64)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.smlslt.nxv4i32(<vscale x 4 x i32>, <vscale x 8 x i16>, <vscale x 8 x i16>, i64)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.smlslt.nxv2i64(<vscale x 2 x i64>, <vscale x 4 x i32>, <vscale x 4 x i32>, i64)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.umlslb.nxv4i32(<vscale x 4 x i32>, <vscale x 8 x i16>, <vscale x 8 x i16>, i64)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.umlslb.nxv2i64(<vscale x 2 x i64>, <vscale x 4 x i32>, <vscale x 4 x i32>, i64)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.umlslt.nxv4i32(<vscale x 4 x i32>, <vscale x 8 x i16>, <vscale x 8 x i16>, i64)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.umlslt.nxv2i64(<vscale x 2 x i64>, <vscale x 4 x i32>, <vscale x 4 x i32>, i64)

[AArch64][SVE] Add SVE2 mla indexed intrinsics (Closed, Public)