This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
include/llvm/IR/
-
llvm/
-
IR/
-
IntrinsicsAArch64.td
-
lib/Target/AArch64/
-
Target/
-
AArch64/
-
AArch64SMEInstrInfo.td
1/1
SMEInstrFormats.td
-
test/CodeGen/AArch64/
-
CodeGen/
-
AArch64/
1/1
sme2-intrinsics-mlall.ll

Differential D143276

[SME2][AArch64] Add multi-single multiply-add long long intrinsics
ClosedPublic

Authored by kmclaughlin on Feb 3 2023, 8:56 AM.

Download Raw Diff

Details

Reviewers

sdesmalen
david-arm
MattDevereau
aemerson

Commits

rGba23bca0a83d: [SME2][AArch64] Add multi-single multiply-add long long intrinsics

Summary

Adds intrinsics for the following SME2 instructions:

smlall (1, 2 & 4 vectors)
umlall (1, 2 & 4 vectors)
smlsll (1, 2 & 4 vectors)
umlsll (1, 2 & 4 vectors)
sumlall (2 & 4 vectors)
usmlall (1, 2 & 4 vectors)

NOTE: These intrinsics are still in development and are subject to future changes.

Diff Detail

Event Timeline

kmclaughlin created this revision.Feb 3 2023, 8:56 AM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2023, 8:56 AM

Herald added subscribers: hiraditya, kristof.beyls. · View Herald Transcript

kmclaughlin requested review of this revision.Feb 3 2023, 8:56 AM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2023, 8:56 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

kmclaughlin added a child revision: D143277: [SME2][AArch64] Add multi-multi multiply-add long long intrinsics.Feb 3 2023, 8:58 AM

Harbormaster completed remote builds in B211746: Diff 494665.Feb 3 2023, 10:22 AM

david-arm added inline comments.Feb 8 2023, 8:47 AM

llvm/lib/Target/AArch64/SMEInstrFormats.td

2757

This is just a suggestion, but in theory you could reduce the amount of duplication by creating a multiclass sme2_mla_ll_array_vg24_single like this:

multiclass sme2_mla_ll_array_vg24_single<string mnemonic, bits<5> op,
                                        MatrixOperand matrix_ty,
                                        RegisterOperand multi_vector_ty,
                                        ZPRRegOp zpr_ty, ValueType vt, SDPatternOperator intrinsic> {
  def NAME: sme2_mla_ll_array_vg24_single<op, matrix_ty, multi_vector_ty,
                                        zpr_ty, mnemonic>, SMEPseudo2Instr<NAME, 1>;

  def NAME # _PSEUDO : sme2_za_array_2op_multi_single_pseudo<NAME, uimm1s4range, multi_vector_ty, zpr_ty, SMEMatrixArray>;

  def : InstAlias<mnemonic # "\t$ZAd[$Rv, $imm], $Zn, $Zm",
                 (!cast<Instruction>(NAME) matrix_ty:$ZAd,  MatrixIndexGPR32Op8_11:$Rv, uimm1s4range:$imm, multi_vector_ty:$Zn, zpr_ty:$Zm), 0>;
}

Then for each of vg2 and vg4 you just need:

multiclass sme2_mla_ll_array_vg2_single<string mnemonic, bits<5> op,
                                        MatrixOperand matrix_ty,
                                        RegisterOperand multi_vector_ty,
                                        ZPRRegOp zpr_ty, ValueType vt, SDPatternOperator intrinsic> {
  defm : sme2_mla_ll_array_vg24_single;

  def : SME2_ZA_TwoOp_VG2_Multi_Single_Pat<NAME, intrinsic, uimm1s4range, zpr_ty, vt, tileslicerange1s4>;
}

Please feel free to ignore this suggestion if you think it doesn't improve things. I wouldn't hold up the patch for it!

llvm/test/CodeGen/AArch64/sme2-intrinsics-mlall.ll

nit: I don't think we need the -mattr=+sve flag here, since sme2 implies it.

Matt added a subscriber: Matt.Feb 8 2023, 2:36 PM

david-arm mentioned this in D143277: [SME2][AArch64] Add multi-multi multiply-add long long intrinsics.Feb 9 2023, 5:25 AM

Added a sme2_mla_ll_array_vg24_single multiclass to reduce duplication in sme2_mla_ll_array_vg2_single & sme2_mla_ll_array_vg4_single
Removed -mattr=+sve from sme2-intrinsics-mlall.ll

LGTM! Muy bueno!

This revision is now accepted and ready to land.Feb 15 2023, 8:59 AM

Harbormaster completed remote builds in B213914: Diff 497698.Feb 15 2023, 10:28 AM

Closed by commit rGba23bca0a83d: [SME2][AArch64] Add multi-single multiply-add long long intrinsics (authored by kmclaughlin). · Explain WhyFeb 16 2023, 5:13 AM

This revision was automatically updated to reflect the committed changes.

kmclaughlin added a commit: rGba23bca0a83d: [SME2][AArch64] Add multi-single multiply-add long long intrinsics.

Revision Contents

Path

Size

llvm/

include/

llvm/

IR/

IntrinsicsAArch64.td

21 lines

lib/

Target/

AArch64/

AArch64SMEInstrInfo.td

58 lines

SMEInstrFormats.td

33 lines

test/

CodeGen/

AArch64/

sme2-intrinsics-mlall.ll

528 lines

Diff 497698

llvm/include/llvm/IR/IntrinsicsAArch64.td

Show First 20 Lines • Show All 3,034 Lines • ▼ Show 20 Lines	foreach instr = ["mlal", "mlsl"] in {
def int_aarch64_sme_ # ty # instr # _vg2x4 : SME2_Matrix_ArrayVector_VG4_Multi_Multi_Intrinsic;		def int_aarch64_sme_ # ty # instr # _vg2x4 : SME2_Matrix_ArrayVector_VG4_Multi_Multi_Intrinsic;

def int_aarch64_sme_ # ty # instr # _lane_vg2x1 : SME2_Matrix_ArrayVector_Single_Index_Intrinsic;		def int_aarch64_sme_ # ty # instr # _lane_vg2x1 : SME2_Matrix_ArrayVector_Single_Index_Intrinsic;
def int_aarch64_sme_ # ty # instr # _lane_vg2x2 : SME2_Matrix_ArrayVector_VG2_Multi_Index_Intrinsic;		def int_aarch64_sme_ # ty # instr # _lane_vg2x2 : SME2_Matrix_ArrayVector_VG2_Multi_Index_Intrinsic;
def int_aarch64_sme_ # ty # instr # _lane_vg2x4 : SME2_Matrix_ArrayVector_VG4_Multi_Index_Intrinsic;		def int_aarch64_sme_ # ty # instr # _lane_vg2x4 : SME2_Matrix_ArrayVector_VG4_Multi_Index_Intrinsic;
}		}
}		}

		//
		// Multi-vector multiply-add long long
		//

		foreach ty = ["s", "u"] in {
		foreach instr = ["mla", "mls"] in {
		foreach za = ["za32", "za64"] in {
		def int_aarch64_sme_ # ty # instr # _ # za # _single_vg4x1 : SME2_Matrix_ArrayVector_Single_Single_Intrinsic;
		def int_aarch64_sme_ # ty # instr # _ # za # _single_vg4x2 : SME2_Matrix_ArrayVector_VG2_Multi_Single_Intrinsic;
		def int_aarch64_sme_ # ty # instr # _ # za # _single_vg4x4 : SME2_Matrix_ArrayVector_VG4_Multi_Single_Intrinsic;
		}
		}
		}

		def int_aarch64_sme_sumla_za32_single_vg4x2 : SME2_Matrix_ArrayVector_VG2_Multi_Single_Intrinsic;
		def int_aarch64_sme_sumla_za32_single_vg4x4 : SME2_Matrix_ArrayVector_VG4_Multi_Single_Intrinsic;

		def int_aarch64_sme_usmla_za32_single_vg4x1 : SME2_Matrix_ArrayVector_Single_Single_Intrinsic;
		def int_aarch64_sme_usmla_za32_single_vg4x2 : SME2_Matrix_ArrayVector_VG2_Multi_Single_Intrinsic;
		def int_aarch64_sme_usmla_za32_single_vg4x4 : SME2_Matrix_ArrayVector_VG4_Multi_Single_Intrinsic;

// Multi-vector signed saturating doubling multiply high		// Multi-vector signed saturating doubling multiply high

def int_aarch64_sve_sqdmulh_single_vgx2 : SME2_VG2_Multi_Single_Intrinsic;		def int_aarch64_sve_sqdmulh_single_vgx2 : SME2_VG2_Multi_Single_Intrinsic;
def int_aarch64_sve_sqdmulh_single_vgx4 : SME2_VG4_Multi_Single_Intrinsic;		def int_aarch64_sve_sqdmulh_single_vgx4 : SME2_VG4_Multi_Single_Intrinsic;

def int_aarch64_sve_sqdmulh_vgx2 : SME2_VG2_Multi_Multi_Intrinsic;		def int_aarch64_sve_sqdmulh_vgx2 : SME2_VG2_Multi_Multi_Intrinsic;
def int_aarch64_sve_sqdmulh_vgx4 : SME2_VG4_Multi_Multi_Intrinsic;		def int_aarch64_sve_sqdmulh_vgx4 : SME2_VG4_Multi_Multi_Intrinsic;

▲ Show 20 Lines • Show All 204 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td

	Show First 20 Lines • Show All 509 Lines • ▼ Show 20 Lines
	defm USVDOT_VG4_M4ZZI_BToS : sme2_multi_vec_array_vg4_index_32b<"usvdot", 0b0101, ZZZZ_b_mul_r, ZPR4b8, nxv16i8, int_aarch64_sme_usvdot_lane_za32_vg1x4>;			defm USVDOT_VG4_M4ZZI_BToS : sme2_multi_vec_array_vg4_index_32b<"usvdot", 0b0101, ZZZZ_b_mul_r, ZPR4b8, nxv16i8, int_aarch64_sme_usvdot_lane_za32_vg1x4>;

	defm UVDOT_VG2_M2ZZI_HtoS : sme2_multi_vec_array_vg2_index_32b<"uvdot", 0b0110, ZZ_h_mul_r, ZPR4b16, nxv8i16, int_aarch64_sme_uvdot_lane_za32_vg1x2>;			defm UVDOT_VG2_M2ZZI_HtoS : sme2_multi_vec_array_vg2_index_32b<"uvdot", 0b0110, ZZ_h_mul_r, ZPR4b16, nxv8i16, int_aarch64_sme_uvdot_lane_za32_vg1x2>;
	defm UVDOT_VG4_M4ZZI_BtoS : sme2_multi_vec_array_vg4_index_32b<"uvdot", 0b0110, ZZZZ_b_mul_r, ZPR4b8, nxv16i8, int_aarch64_sme_uvdot_lane_za32_vg1x4>;			defm UVDOT_VG4_M4ZZI_BtoS : sme2_multi_vec_array_vg4_index_32b<"uvdot", 0b0110, ZZZZ_b_mul_r, ZPR4b8, nxv16i8, int_aarch64_sme_uvdot_lane_za32_vg1x4>;

	def SMLALL_MZZI_BtoS : sme2_mla_ll_array_index_32b<"smlall", 0b000>;			def SMLALL_MZZI_BtoS : sme2_mla_ll_array_index_32b<"smlall", 0b000>;
	defm SMLALL_VG2_M2ZZI_BtoS : sme2_mla_ll_array_vg2_index_32b<"smlall", 0b000>;			defm SMLALL_VG2_M2ZZI_BtoS : sme2_mla_ll_array_vg2_index_32b<"smlall", 0b000>;
	defm SMLALL_VG4_M4ZZI_BtoS : sme2_mla_ll_array_vg4_index_32b<"smlall", 0b000>;			defm SMLALL_VG4_M4ZZI_BtoS : sme2_mla_ll_array_vg4_index_32b<"smlall", 0b000>;
	def SMLALL_MZZ_BtoS : sme2_mla_ll_array_single<"smlall", 0b0000, MatrixOp32, ZPR8, ZPR4b8>;			defm SMLALL_MZZ_BtoS : sme2_mla_ll_array_single<"smlall", 0b0000, MatrixOp32, ZPR8, ZPR4b8, nxv16i8, int_aarch64_sme_smla_za32_single_vg4x1>;
	defm SMLALL_VG2_M2ZZ_BtoS : sme2_mla_ll_array_vg24_single<"smlall", 0b00000, MatrixOp32, ZZ_b, ZPR4b8>;			defm SMLALL_VG2_M2ZZ_BtoS : sme2_mla_ll_array_vg2_single<"smlall", 0b00000, MatrixOp32, ZZ_b, ZPR4b8, nxv16i8, int_aarch64_sme_smla_za32_single_vg4x2>;
	defm SMLALL_VG4_M4ZZ_BtoS : sme2_mla_ll_array_vg24_single<"smlall", 0b01000, MatrixOp32, ZZZZ_b, ZPR4b8>;			defm SMLALL_VG4_M4ZZ_BtoS : sme2_mla_ll_array_vg4_single<"smlall", 0b01000, MatrixOp32, ZZZZ_b, ZPR4b8, nxv16i8, int_aarch64_sme_smla_za32_single_vg4x4>;
	defm SMLALL_VG2_M2Z2Z_BtoS : sme2_mla_ll_array_vg2_multi<"smlall", 0b0000, MatrixOp32, ZZ_b_mul_r>;			defm SMLALL_VG2_M2Z2Z_BtoS : sme2_mla_ll_array_vg2_multi<"smlall", 0b0000, MatrixOp32, ZZ_b_mul_r>;
	defm SMLALL_VG4_M4Z4Z_BtoS : sme2_mla_ll_array_vg4_multi<"smlall", 0b0000, MatrixOp32, ZZZZ_b_mul_r>;			defm SMLALL_VG4_M4Z4Z_BtoS : sme2_mla_ll_array_vg4_multi<"smlall", 0b0000, MatrixOp32, ZZZZ_b_mul_r>;

	def USMLALL_MZZI_BtoS : sme2_mla_ll_array_index_32b<"usmlall", 0b001>;			def USMLALL_MZZI_BtoS : sme2_mla_ll_array_index_32b<"usmlall", 0b001>;
	defm USMLALL_VG2_M2ZZI_BtoS : sme2_mla_ll_array_vg2_index_32b<"usmlall", 0b100>;			defm USMLALL_VG2_M2ZZI_BtoS : sme2_mla_ll_array_vg2_index_32b<"usmlall", 0b100>;
	defm USMLALL_VG4_M4ZZI_BtoS : sme2_mla_ll_array_vg4_index_32b<"usmlall", 0b100>;			defm USMLALL_VG4_M4ZZI_BtoS : sme2_mla_ll_array_vg4_index_32b<"usmlall", 0b100>;
	def USMLALL_MZZ_BtoS : sme2_mla_ll_array_single<"usmlall", 0b0001, MatrixOp32, ZPR8, ZPR4b8>;			defm USMLALL_MZZ_BtoS : sme2_mla_ll_array_single<"usmlall", 0b0001, MatrixOp32, ZPR8, ZPR4b8, nxv16i8, int_aarch64_sme_usmla_za32_single_vg4x1>;
	defm USMLALL_VG2_M2ZZ_BtoS : sme2_mla_ll_array_vg24_single<"usmlall", 0b00001, MatrixOp32, ZZ_b, ZPR4b8>;			defm USMLALL_VG2_M2ZZ_BtoS : sme2_mla_ll_array_vg2_single<"usmlall", 0b00001, MatrixOp32, ZZ_b, ZPR4b8, nxv16i8, int_aarch64_sme_usmla_za32_single_vg4x2>;
	defm USMLALL_VG4_M4ZZ_BtoS : sme2_mla_ll_array_vg24_single<"usmlall", 0b01001, MatrixOp32, ZZZZ_b, ZPR4b8>;			defm USMLALL_VG4_M4ZZ_BtoS : sme2_mla_ll_array_vg4_single<"usmlall", 0b01001, MatrixOp32, ZZZZ_b, ZPR4b8, nxv16i8, int_aarch64_sme_usmla_za32_single_vg4x4>;
	defm USMLALL_VG2_M2Z2Z_BtoS : sme2_mla_ll_array_vg2_multi<"usmlall", 0b0001, MatrixOp32, ZZ_b_mul_r>;			defm USMLALL_VG2_M2Z2Z_BtoS : sme2_mla_ll_array_vg2_multi<"usmlall", 0b0001, MatrixOp32, ZZ_b_mul_r>;
	defm USMLALL_VG4_M4Z4Z_BtoS : sme2_mla_ll_array_vg4_multi<"usmlall", 0b0001, MatrixOp32, ZZZZ_b_mul_r>;			defm USMLALL_VG4_M4Z4Z_BtoS : sme2_mla_ll_array_vg4_multi<"usmlall", 0b0001, MatrixOp32, ZZZZ_b_mul_r>;

	def SMLSLL_MZZI_BtoS : sme2_mla_ll_array_index_32b<"smlsll", 0b010>;			def SMLSLL_MZZI_BtoS : sme2_mla_ll_array_index_32b<"smlsll", 0b010>;
	defm SMLSLL_VG2_M2ZZI_BtoS : sme2_mla_ll_array_vg2_index_32b<"smlsll", 0b001>;			defm SMLSLL_VG2_M2ZZI_BtoS : sme2_mla_ll_array_vg2_index_32b<"smlsll", 0b001>;
	defm SMLSLL_VG4_M4ZZI_BtoS : sme2_mla_ll_array_vg4_index_32b<"smlsll", 0b001>;			defm SMLSLL_VG4_M4ZZI_BtoS : sme2_mla_ll_array_vg4_index_32b<"smlsll", 0b001>;
	def SMLSLL_MZZ_BtoS : sme2_mla_ll_array_single<"smlsll", 0b0010, MatrixOp32, ZPR8, ZPR4b8>;			defm SMLSLL_MZZ_BtoS : sme2_mla_ll_array_single<"smlsll", 0b0010, MatrixOp32, ZPR8, ZPR4b8, nxv16i8, int_aarch64_sme_smls_za32_single_vg4x1>;
	defm SMLSLL_VG2_M2ZZ_BtoS : sme2_mla_ll_array_vg24_single<"smlsll", 0b00010, MatrixOp32, ZZ_b, ZPR4b8>;			defm SMLSLL_VG2_M2ZZ_BtoS : sme2_mla_ll_array_vg2_single<"smlsll", 0b00010, MatrixOp32, ZZ_b, ZPR4b8, nxv16i8, int_aarch64_sme_smls_za32_single_vg4x2>;
	defm SMLSLL_VG4_M4ZZ_BtoS : sme2_mla_ll_array_vg24_single<"smlsll", 0b01010, MatrixOp32, ZZZZ_b, ZPR4b8>;			defm SMLSLL_VG4_M4ZZ_BtoS : sme2_mla_ll_array_vg4_single<"smlsll", 0b01010, MatrixOp32, ZZZZ_b, ZPR4b8, nxv16i8, int_aarch64_sme_smls_za32_single_vg4x4>;
	defm SMLSLL_VG2_M2Z2Z_BtoS : sme2_mla_ll_array_vg2_multi<"smlsll", 0b0010, MatrixOp32, ZZ_b_mul_r>;			defm SMLSLL_VG2_M2Z2Z_BtoS : sme2_mla_ll_array_vg2_multi<"smlsll", 0b0010, MatrixOp32, ZZ_b_mul_r>;
	defm SMLSLL_VG4_M4Z4Z_BtoS : sme2_mla_ll_array_vg4_multi<"smlsll", 0b0010, MatrixOp32, ZZZZ_b_mul_r>;			defm SMLSLL_VG4_M4Z4Z_BtoS : sme2_mla_ll_array_vg4_multi<"smlsll", 0b0010, MatrixOp32, ZZZZ_b_mul_r>;

	def UMLALL_MZZI_BtoS : sme2_mla_ll_array_index_32b<"umlall", 0b100>;			def UMLALL_MZZI_BtoS : sme2_mla_ll_array_index_32b<"umlall", 0b100>;
	defm UMLALL_VG2_M2ZZI_BtoS : sme2_mla_ll_array_vg2_index_32b<"umlall", 0b010>;			defm UMLALL_VG2_M2ZZI_BtoS : sme2_mla_ll_array_vg2_index_32b<"umlall", 0b010>;
	defm UMLALL_VG4_M4ZZI_BtoS : sme2_mla_ll_array_vg4_index_32b<"umlall", 0b010>;			defm UMLALL_VG4_M4ZZI_BtoS : sme2_mla_ll_array_vg4_index_32b<"umlall", 0b010>;
	def UMLALL_MZZ_BtoS : sme2_mla_ll_array_single<"umlall", 0b0100, MatrixOp32, ZPR8, ZPR4b8>;			defm UMLALL_MZZ_BtoS : sme2_mla_ll_array_single<"umlall", 0b0100, MatrixOp32, ZPR8, ZPR4b8, nxv16i8, int_aarch64_sme_umla_za32_single_vg4x1>;
	defm UMLALL_VG2_M2ZZ_BtoS : sme2_mla_ll_array_vg24_single<"umlall", 0b00100, MatrixOp32, ZZ_b, ZPR4b8>;			defm UMLALL_VG2_M2ZZ_BtoS : sme2_mla_ll_array_vg2_single<"umlall", 0b00100, MatrixOp32, ZZ_b, ZPR4b8, nxv16i8, int_aarch64_sme_umla_za32_single_vg4x2>;
	defm UMLALL_VG4_M4ZZ_BtoS : sme2_mla_ll_array_vg24_single<"umlall", 0b01100, MatrixOp32, ZZZZ_b, ZPR4b8>;			defm UMLALL_VG4_M4ZZ_BtoS : sme2_mla_ll_array_vg4_single<"umlall", 0b01100, MatrixOp32, ZZZZ_b, ZPR4b8, nxv16i8, int_aarch64_sme_umla_za32_single_vg4x4>;
	defm UMLALL_VG2_M2Z2Z_BtoS : sme2_mla_ll_array_vg2_multi<"umlall", 0b0100, MatrixOp32, ZZ_b_mul_r>;			defm UMLALL_VG2_M2Z2Z_BtoS : sme2_mla_ll_array_vg2_multi<"umlall", 0b0100, MatrixOp32, ZZ_b_mul_r>;
	defm UMLALL_VG4_M4Z4Z_BtoS : sme2_mla_ll_array_vg4_multi<"umlall", 0b0100, MatrixOp32, ZZZZ_b_mul_r>;			defm UMLALL_VG4_M4Z4Z_BtoS : sme2_mla_ll_array_vg4_multi<"umlall", 0b0100, MatrixOp32, ZZZZ_b_mul_r>;

	def SUMLALL_MZZI_BtoS : sme2_mla_ll_array_index_32b<"sumlall", 0b101>;			def SUMLALL_MZZI_BtoS : sme2_mla_ll_array_index_32b<"sumlall", 0b101>;
	defm SUMLALL_VG2_M2ZZI_BtoS : sme2_mla_ll_array_vg2_index_32b<"sumlall", 0b110>;			defm SUMLALL_VG2_M2ZZI_BtoS : sme2_mla_ll_array_vg2_index_32b<"sumlall", 0b110>;
	defm SUMLALL_VG4_M4ZZI_BtoS : sme2_mla_ll_array_vg4_index_32b<"sumlall", 0b110>;			defm SUMLALL_VG4_M4ZZI_BtoS : sme2_mla_ll_array_vg4_index_32b<"sumlall", 0b110>;
	defm SUMLALL_VG2_M2ZZ_BtoS : sme2_mla_ll_array_vg24_single<"sumlall", 0b00101, MatrixOp32, ZZ_b, ZPR4b8>;			defm SUMLALL_VG2_M2ZZ_BtoS : sme2_mla_ll_array_vg2_single<"sumlall", 0b00101, MatrixOp32, ZZ_b, ZPR4b8, nxv16i8, int_aarch64_sme_sumla_za32_single_vg4x2>;
	defm SUMLALL_VG4_M4ZZ_BtoS : sme2_mla_ll_array_vg24_single<"sumlall", 0b01101, MatrixOp32, ZZZZ_b, ZPR4b8>;			defm SUMLALL_VG4_M4ZZ_BtoS : sme2_mla_ll_array_vg4_single<"sumlall", 0b01101, MatrixOp32, ZZZZ_b, ZPR4b8, nxv16i8, int_aarch64_sme_sumla_za32_single_vg4x4>;

	def UMLSLL_MZZI_BtoS : sme2_mla_ll_array_index_32b<"umlsll", 0b110>;			def UMLSLL_MZZI_BtoS : sme2_mla_ll_array_index_32b<"umlsll", 0b110>;
	defm UMLSLL_VG2_M2ZZI_BtoS : sme2_mla_ll_array_vg2_index_32b<"umlsll", 0b011>;			defm UMLSLL_VG2_M2ZZI_BtoS : sme2_mla_ll_array_vg2_index_32b<"umlsll", 0b011>;
	defm UMLSLL_VG4_M4ZZI_BtoS : sme2_mla_ll_array_vg4_index_32b<"umlsll", 0b011>;			defm UMLSLL_VG4_M4ZZI_BtoS : sme2_mla_ll_array_vg4_index_32b<"umlsll", 0b011>;
	def UMLSLL_MZZ_BtoS : sme2_mla_ll_array_single<"umlsll", 0b0110, MatrixOp32, ZPR8, ZPR4b8>;			defm UMLSLL_MZZ_BtoS : sme2_mla_ll_array_single<"umlsll", 0b0110, MatrixOp32, ZPR8, ZPR4b8, nxv16i8, int_aarch64_sme_umls_za32_single_vg4x1>;
	defm UMLSLL_VG2_M2ZZ_BtoS : sme2_mla_ll_array_vg24_single<"umlsll", 0b00110, MatrixOp32, ZZ_b, ZPR4b8>;			defm UMLSLL_VG2_M2ZZ_BtoS : sme2_mla_ll_array_vg2_single<"umlsll", 0b00110, MatrixOp32, ZZ_b, ZPR4b8, nxv16i8, int_aarch64_sme_umls_za32_single_vg4x2>;
	defm UMLSLL_VG4_M4ZZ_BtoS : sme2_mla_ll_array_vg24_single<"umlsll", 0b01110, MatrixOp32, ZZZZ_b, ZPR4b8>;			defm UMLSLL_VG4_M4ZZ_BtoS : sme2_mla_ll_array_vg4_single<"umlsll", 0b01110, MatrixOp32, ZZZZ_b, ZPR4b8, nxv16i8, int_aarch64_sme_umls_za32_single_vg4x4>;
	defm UMLSLL_VG2_M2Z2Z_BtoS : sme2_mla_ll_array_vg2_multi<"umlsll", 0b0110, MatrixOp32, ZZ_b_mul_r>;			defm UMLSLL_VG2_M2Z2Z_BtoS : sme2_mla_ll_array_vg2_multi<"umlsll", 0b0110, MatrixOp32, ZZ_b_mul_r>;
	defm UMLSLL_VG4_M4Z4Z_BtoS : sme2_mla_ll_array_vg4_multi<"umlsll", 0b0110, MatrixOp32, ZZZZ_b_mul_r>;			defm UMLSLL_VG4_M4Z4Z_BtoS : sme2_mla_ll_array_vg4_multi<"umlsll", 0b0110, MatrixOp32, ZZZZ_b_mul_r>;

	defm BMOPA_MPPZZ_S : sme2_int_bmopx_tile<"bmopa", 0b100, int_aarch64_sme_bmopa_za32>;			defm BMOPA_MPPZZ_S : sme2_int_bmopx_tile<"bmopa", 0b100, int_aarch64_sme_bmopa_za32>;
	defm BMOPS_MPPZZ_S : sme2_int_bmopx_tile<"bmops", 0b101, int_aarch64_sme_bmops_za32>;			defm BMOPS_MPPZZ_S : sme2_int_bmopx_tile<"bmops", 0b101, int_aarch64_sme_bmops_za32>;

	defm SMOPA_MPPZZ_HtoS : sme2_int_mopx_tile<"smopa", 0b000, int_aarch64_sme_smopa_za32>;			defm SMOPA_MPPZZ_HtoS : sme2_int_mopx_tile<"smopa", 0b000, int_aarch64_sme_smopa_za32>;
	defm SMOPS_MPPZZ_HtoS : sme2_int_mopx_tile<"smops", 0b001, int_aarch64_sme_smops_za32>;			defm SMOPS_MPPZZ_HtoS : sme2_int_mopx_tile<"smops", 0b001, int_aarch64_sme_smops_za32>;
	▲ Show 20 Lines • Show All 166 Lines • ▼ Show 20 Lines
	defm UDOT_VG2_M2Z2Z_HtoD : sme2_dot_mla_add_sub_array_vg2_multi<"udot", 0b110110, MatrixOp64, ZZ_h_mul_r, nxv8i16, int_aarch64_sme_udot_za64_vg1x2>;			defm UDOT_VG2_M2Z2Z_HtoD : sme2_dot_mla_add_sub_array_vg2_multi<"udot", 0b110110, MatrixOp64, ZZ_h_mul_r, nxv8i16, int_aarch64_sme_udot_za64_vg1x2>;
	defm UDOT_VG4_M4Z4Z_HtoD : sme2_dot_mla_add_sub_array_vg4_multi<"udot", 0b110110, MatrixOp64, ZZZZ_h_mul_r, nxv8i16, int_aarch64_sme_udot_za64_vg1x4>;			defm UDOT_VG4_M4Z4Z_HtoD : sme2_dot_mla_add_sub_array_vg4_multi<"udot", 0b110110, MatrixOp64, ZZZZ_h_mul_r, nxv8i16, int_aarch64_sme_udot_za64_vg1x4>;

	defm UVDOT_VG4_M4ZZI_HtoD : sme2_multi_vec_array_vg4_index_64b<"uvdot", 0b111, ZZZZ_h_mul_r, ZPR4b16, nxv8i16, int_aarch64_sme_uvdot_lane_za64_vg1x4>;			defm UVDOT_VG4_M4ZZI_HtoD : sme2_multi_vec_array_vg4_index_64b<"uvdot", 0b111, ZZZZ_h_mul_r, ZPR4b16, nxv8i16, int_aarch64_sme_uvdot_lane_za64_vg1x4>;

	def SMLALL_MZZI_HtoD : sme2_mla_ll_array_index_64b<"smlall", 0b00>;			def SMLALL_MZZI_HtoD : sme2_mla_ll_array_index_64b<"smlall", 0b00>;
	defm SMLALL_VG2_M2ZZI_HtoD : sme2_mla_ll_array_vg2_index_64b<"smlall", 0b00>;			defm SMLALL_VG2_M2ZZI_HtoD : sme2_mla_ll_array_vg2_index_64b<"smlall", 0b00>;
	defm SMLALL_VG4_M4ZZI_HtoD : sme2_mla_ll_array_vg4_index_64b<"smlall", 0b00>;			defm SMLALL_VG4_M4ZZI_HtoD : sme2_mla_ll_array_vg4_index_64b<"smlall", 0b00>;
	def SMLALL_MZZ_HtoD : sme2_mla_ll_array_single<"smlall", 0b1000, MatrixOp64, ZPR16, ZPR4b16>;			defm SMLALL_MZZ_HtoD : sme2_mla_ll_array_single<"smlall", 0b1000, MatrixOp64, ZPR16, ZPR4b16, nxv8i16, int_aarch64_sme_smla_za64_single_vg4x1>;
	defm SMLALL_VG2_M2ZZ_HtoD : sme2_mla_ll_array_vg24_single<"smlall", 0b10000, MatrixOp64, ZZ_h, ZPR4b16>;			defm SMLALL_VG2_M2ZZ_HtoD : sme2_mla_ll_array_vg2_single<"smlall", 0b10000, MatrixOp64, ZZ_h, ZPR4b16, nxv8i16, int_aarch64_sme_smla_za64_single_vg4x2>;
	defm SMLALL_VG4_M4ZZ_HtoD : sme2_mla_ll_array_vg24_single<"smlall", 0b11000, MatrixOp64, ZZZZ_h, ZPR4b16>;			defm SMLALL_VG4_M4ZZ_HtoD : sme2_mla_ll_array_vg4_single<"smlall", 0b11000, MatrixOp64, ZZZZ_h, ZPR4b16, nxv8i16, int_aarch64_sme_smla_za64_single_vg4x4>;
	defm SMLALL_VG2_M2Z2Z_HtoD : sme2_mla_ll_array_vg2_multi<"smlall", 0b1000, MatrixOp64, ZZ_h_mul_r>;			defm SMLALL_VG2_M2Z2Z_HtoD : sme2_mla_ll_array_vg2_multi<"smlall", 0b1000, MatrixOp64, ZZ_h_mul_r>;
	defm SMLALL_VG4_M4Z4Z_HtoD : sme2_mla_ll_array_vg4_multi<"smlall", 0b1000, MatrixOp64, ZZZZ_h_mul_r>;			defm SMLALL_VG4_M4Z4Z_HtoD : sme2_mla_ll_array_vg4_multi<"smlall", 0b1000, MatrixOp64, ZZZZ_h_mul_r>;

	def SMLSLL_MZZI_HtoD : sme2_mla_ll_array_index_64b<"smlsll", 0b01>;			def SMLSLL_MZZI_HtoD : sme2_mla_ll_array_index_64b<"smlsll", 0b01>;
	defm SMLSLL_VG2_M2ZZI_HtoD : sme2_mla_ll_array_vg2_index_64b<"smlsll", 0b01>;			defm SMLSLL_VG2_M2ZZI_HtoD : sme2_mla_ll_array_vg2_index_64b<"smlsll", 0b01>;
	defm SMLSLL_VG4_M4ZZI_HtoD : sme2_mla_ll_array_vg4_index_64b<"smlsll", 0b01>;			defm SMLSLL_VG4_M4ZZI_HtoD : sme2_mla_ll_array_vg4_index_64b<"smlsll", 0b01>;
	def SMLSLL_MZZ_HtoD : sme2_mla_ll_array_single<"smlsll", 0b1010, MatrixOp64, ZPR16, ZPR4b16>;			defm SMLSLL_MZZ_HtoD : sme2_mla_ll_array_single<"smlsll", 0b1010, MatrixOp64, ZPR16, ZPR4b16, nxv8i16, int_aarch64_sme_smls_za64_single_vg4x1>;
	defm SMLSLL_VG2_M2ZZ_HtoD : sme2_mla_ll_array_vg24_single<"smlsll", 0b10010, MatrixOp64, ZZ_h, ZPR4b16>;			defm SMLSLL_VG2_M2ZZ_HtoD : sme2_mla_ll_array_vg2_single<"smlsll", 0b10010, MatrixOp64, ZZ_h, ZPR4b16, nxv8i16, int_aarch64_sme_smls_za64_single_vg4x2>;
	defm SMLSLL_VG4_M4ZZ_HtoD : sme2_mla_ll_array_vg24_single<"smlsll", 0b11010, MatrixOp64, ZZZZ_h, ZPR4b16>;			defm SMLSLL_VG4_M4ZZ_HtoD : sme2_mla_ll_array_vg4_single<"smlsll", 0b11010, MatrixOp64, ZZZZ_h, ZPR4b16, nxv8i16, int_aarch64_sme_smls_za64_single_vg4x4>;
	defm SMLSLL_VG2_M2Z2Z_HtoD : sme2_mla_ll_array_vg2_multi<"smlsll", 0b1010, MatrixOp64, ZZ_h_mul_r>;			defm SMLSLL_VG2_M2Z2Z_HtoD : sme2_mla_ll_array_vg2_multi<"smlsll", 0b1010, MatrixOp64, ZZ_h_mul_r>;
	defm SMLSLL_VG4_M4Z4Z_HtoD : sme2_mla_ll_array_vg4_multi<"smlsll", 0b1010, MatrixOp64, ZZZZ_h_mul_r>;			defm SMLSLL_VG4_M4Z4Z_HtoD : sme2_mla_ll_array_vg4_multi<"smlsll", 0b1010, MatrixOp64, ZZZZ_h_mul_r>;

	def UMLALL_MZZI_HtoD : sme2_mla_ll_array_index_64b<"umlall", 0b10>;			def UMLALL_MZZI_HtoD : sme2_mla_ll_array_index_64b<"umlall", 0b10>;
	defm UMLALL_VG2_M2ZZI_HtoD : sme2_mla_ll_array_vg2_index_64b<"umlall", 0b10>;			defm UMLALL_VG2_M2ZZI_HtoD : sme2_mla_ll_array_vg2_index_64b<"umlall", 0b10>;
	defm UMLALL_VG4_M4ZZI_HtoD : sme2_mla_ll_array_vg4_index_64b<"umlall", 0b10>;			defm UMLALL_VG4_M4ZZI_HtoD : sme2_mla_ll_array_vg4_index_64b<"umlall", 0b10>;
	def UMLALL_MZZ_HtoD : sme2_mla_ll_array_single<"umlall", 0b1100, MatrixOp64, ZPR16, ZPR4b16>;			defm UMLALL_MZZ_HtoD : sme2_mla_ll_array_single<"umlall", 0b1100, MatrixOp64, ZPR16, ZPR4b16, nxv8i16, int_aarch64_sme_umla_za64_single_vg4x1>;
	defm UMLALL_VG2_M2ZZ_HtoD : sme2_mla_ll_array_vg24_single<"umlall", 0b10100, MatrixOp64, ZZ_h, ZPR4b16>;			defm UMLALL_VG2_M2ZZ_HtoD : sme2_mla_ll_array_vg2_single<"umlall", 0b10100, MatrixOp64, ZZ_h, ZPR4b16, nxv8i16, int_aarch64_sme_umla_za64_single_vg4x2>;
	defm UMLALL_VG4_M4ZZ_HtoD : sme2_mla_ll_array_vg24_single<"umlall", 0b11100, MatrixOp64, ZZZZ_h, ZPR4b16>;			defm UMLALL_VG4_M4ZZ_HtoD : sme2_mla_ll_array_vg4_single<"umlall", 0b11100, MatrixOp64, ZZZZ_h, ZPR4b16, nxv8i16, int_aarch64_sme_umla_za64_single_vg4x4>;
	defm UMLALL_VG2_M2Z2Z_HtoD : sme2_mla_ll_array_vg2_multi<"umlall", 0b1100, MatrixOp64, ZZ_h_mul_r>;			defm UMLALL_VG2_M2Z2Z_HtoD : sme2_mla_ll_array_vg2_multi<"umlall", 0b1100, MatrixOp64, ZZ_h_mul_r>;
	defm UMLALL_VG4_M4Z4Z_HtoD : sme2_mla_ll_array_vg4_multi<"umlall", 0b1100, MatrixOp64, ZZZZ_h_mul_r>;			defm UMLALL_VG4_M4Z4Z_HtoD : sme2_mla_ll_array_vg4_multi<"umlall", 0b1100, MatrixOp64, ZZZZ_h_mul_r>;

	def UMLSLL_MZZI_HtoD : sme2_mla_ll_array_index_64b<"umlsll", 0b11>;			def UMLSLL_MZZI_HtoD : sme2_mla_ll_array_index_64b<"umlsll", 0b11>;
	defm UMLSLL_VG2_M2ZZI_HtoD : sme2_mla_ll_array_vg2_index_64b<"umlsll", 0b11>;			defm UMLSLL_VG2_M2ZZI_HtoD : sme2_mla_ll_array_vg2_index_64b<"umlsll", 0b11>;
	defm UMLSLL_VG4_M4ZZI_HtoD : sme2_mla_ll_array_vg4_index_64b<"umlsll", 0b11>;			defm UMLSLL_VG4_M4ZZI_HtoD : sme2_mla_ll_array_vg4_index_64b<"umlsll", 0b11>;
	def UMLSLL_MZZ_HtoD : sme2_mla_ll_array_single<"umlsll", 0b1110, MatrixOp64, ZPR16, ZPR4b16>;			defm UMLSLL_MZZ_HtoD : sme2_mla_ll_array_single<"umlsll", 0b1110, MatrixOp64, ZPR16, ZPR4b16, nxv8i16, int_aarch64_sme_umls_za64_single_vg4x1>;
	defm UMLSLL_VG2_M2ZZ_HtoD : sme2_mla_ll_array_vg24_single<"umlsll", 0b10110, MatrixOp64, ZZ_h, ZPR4b16>;			defm UMLSLL_VG2_M2ZZ_HtoD : sme2_mla_ll_array_vg2_single<"umlsll", 0b10110, MatrixOp64, ZZ_h, ZPR4b16, nxv8i16, int_aarch64_sme_umls_za64_single_vg4x2>;
	defm UMLSLL_VG4_M4ZZ_HtoD : sme2_mla_ll_array_vg24_single<"umlsll", 0b11110, MatrixOp64, ZZZZ_h, ZPR4b16>;			defm UMLSLL_VG4_M4ZZ_HtoD : sme2_mla_ll_array_vg4_single<"umlsll", 0b11110, MatrixOp64, ZZZZ_h, ZPR4b16, nxv8i16, int_aarch64_sme_umls_za64_single_vg4x4>;
	defm UMLSLL_VG2_M2Z2Z_HtoD : sme2_mla_ll_array_vg2_multi<"umlsll", 0b1110, MatrixOp64, ZZ_h_mul_r>;			defm UMLSLL_VG2_M2Z2Z_HtoD : sme2_mla_ll_array_vg2_multi<"umlsll", 0b1110, MatrixOp64, ZZ_h_mul_r>;
	defm UMLSLL_VG4_M4Z4Z_HtoD : sme2_mla_ll_array_vg4_multi<"umlsll", 0b1110, MatrixOp64, ZZZZ_h_mul_r>;			defm UMLSLL_VG4_M4Z4Z_HtoD : sme2_mla_ll_array_vg4_multi<"umlsll", 0b1110, MatrixOp64, ZZZZ_h_mul_r>;
	}			}

	let Predicates = [HasSME2, HasSMEF64F64] in {			let Predicates = [HasSME2, HasSMEF64F64] in {
	defm FMLA_VG2_M2ZZI_D : sme2_multi_vec_array_vg2_index_64b<"fmla", 0b00, ZZ_d_mul_r, ZPR4b64, nxv2f64, int_aarch64_sme_fmla_lane_vg1x2>;			defm FMLA_VG2_M2ZZI_D : sme2_multi_vec_array_vg2_index_64b<"fmla", 0b00, ZZ_d_mul_r, ZPR4b64, nxv2f64, int_aarch64_sme_fmla_lane_vg1x2>;
	defm FMLA_VG4_M4ZZI_D : sme2_multi_vec_array_vg4_index_64b<"fmla", 0b000, ZZZZ_d_mul_r, ZPR4b64, nxv2f64, int_aarch64_sme_fmla_lane_vg1x4>;			defm FMLA_VG4_M4ZZI_D : sme2_multi_vec_array_vg4_index_64b<"fmla", 0b000, ZZZZ_d_mul_r, ZPR4b64, nxv2f64, int_aarch64_sme_fmla_lane_vg1x4>;
	defm FMLA_VG2_M2ZZ_D : sme2_dot_mla_add_sub_array_vg2_single<"fmla", 0b1011000, MatrixOp64, ZZ_d, ZPR4b64, nxv2f64, int_aarch64_sme_fmla_single_vg1x2>;			defm FMLA_VG2_M2ZZ_D : sme2_dot_mla_add_sub_array_vg2_single<"fmla", 0b1011000, MatrixOp64, ZZ_d, ZPR4b64, nxv2f64, int_aarch64_sme_fmla_single_vg1x2>;
	▲ Show 20 Lines • Show All 108 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/SMEInstrFormats.td

Show First 20 Lines • Show All 2,708 Lines • ▼ Show 20 Lines	class sme2_mla_ll_array_single<string mnemonic, bits<4> op,
let Inst{12-10} = 0b001;		let Inst{12-10} = 0b001;
let Inst{9-5} = Zn;		let Inst{9-5} = Zn;
let Inst{4-2} = op{2-0};		let Inst{4-2} = op{2-0};
let Inst{1-0} = imm;		let Inst{1-0} = imm;

let Constraints = "$ZAda = $_ZAda";		let Constraints = "$ZAda = $_ZAda";
}		}

		multiclass sme2_mla_ll_array_single<string mnemonic, bits<4> op,
		MatrixOperand matrix_ty, ZPRRegOp vector_ty,
		ZPRRegOp zpr_ty, ValueType vt, SDPatternOperator intrinsic> {
		def NAME : sme2_mla_ll_array_single<mnemonic, op, matrix_ty, vector_ty, zpr_ty>, SMEPseudo2Instr<NAME, 1>;

		def NAME # _PSEUDO : sme2_za_array_2op_multi_single_pseudo<NAME, uimm2s4range, vector_ty, zpr_ty, SMEMatrixArray>;

		def : SME2_ZA_TwoOp_Multi_Single_Pat<NAME, intrinsic, uimm2s4range, zpr_ty, vt, tileslicerange2s4>;
		}

class sme2_mla_ll_array_vg24_single<bits<5> op, MatrixOperand matrix_ty,		class sme2_mla_ll_array_vg24_single<bits<5> op, MatrixOperand matrix_ty,
RegisterOperand vector_ty, ZPRRegOp zpr_ty,		RegisterOperand vector_ty, ZPRRegOp zpr_ty,
string mnemonic>		string mnemonic>
: I<(outs matrix_ty:$ZAda),		: I<(outs matrix_ty:$ZAda),
(ins matrix_ty:$_ZAda, MatrixIndexGPR32Op8_11:$Rv, uimm1s4range:$imm,		(ins matrix_ty:$_ZAda, MatrixIndexGPR32Op8_11:$Rv, uimm1s4range:$imm,
vector_ty:$Zn, zpr_ty:$Zm),		vector_ty:$Zn, zpr_ty:$Zm),
mnemonic, "\t$ZAda[$Rv, $imm, " # !if(op{3}, "vgx4", "vgx2") # "], $Zn, $Zm",		mnemonic, "\t$ZAda[$Rv, $imm, " # !if(op{3}, "vgx4", "vgx2") # "], $Zn, $Zm",
"", []>, Sched<[]> {		"", []>, Sched<[]> {
Show All 14 Lines	class sme2_mla_ll_array_vg24_single<bits<5> op, MatrixOperand matrix_ty,
let Inst{1} = 0b0;		let Inst{1} = 0b0;
let Inst{0} = imm;		let Inst{0} = imm;

let Constraints = "$ZAda = $_ZAda";		let Constraints = "$ZAda = $_ZAda";
}		}

//SME2 single-multi long long MLA two and four sources		//SME2 single-multi long long MLA two and four sources

multiclass sme2_mla_ll_array_vg24_single<string mnemonic, bits<5> op,		multiclass sme2_mla_ll_array_vg24_single<string mnemonic, bits<5> op,
		david-armUnsubmitted Done Reply Inline Actions This is just a suggestion, but in theory you could reduce the amount of duplication by creating a multiclass sme2_mla_ll_array_vg24_single like this: multiclass sme2_mla_ll_array_vg24_single<string mnemonic, bits<5> op, MatrixOperand matrix_ty, RegisterOperand multi_vector_ty, ZPRRegOp zpr_ty, ValueType vt, SDPatternOperator intrinsic> { def NAME: sme2_mla_ll_array_vg24_single<op, matrix_ty, multi_vector_ty, zpr_ty, mnemonic>, SMEPseudo2Instr<NAME, 1>; def NAME # _PSEUDO : sme2_za_array_2op_multi_single_pseudo<NAME, uimm1s4range, multi_vector_ty, zpr_ty, SMEMatrixArray>; def : InstAlias<mnemonic # "\t$ZAd[$Rv, $imm], $Zn, $Zm", (!cast<Instruction>(NAME) matrix_ty:$ZAd, MatrixIndexGPR32Op8_11:$Rv, uimm1s4range:$imm, multi_vector_ty:$Zn, zpr_ty:$Zm), 0>; } Then for each of vg2 and vg4 you just need: multiclass sme2_mla_ll_array_vg2_single<string mnemonic, bits<5> op, MatrixOperand matrix_ty, RegisterOperand multi_vector_ty, ZPRRegOp zpr_ty, ValueType vt, SDPatternOperator intrinsic> { defm : sme2_mla_ll_array_vg24_single; def : SME2_ZA_TwoOp_VG2_Multi_Single_Pat<NAME, intrinsic, uimm1s4range, zpr_ty, vt, tileslicerange1s4>; } Please feel free to ignore this suggestion if you think it doesn't improve things. I wouldn't hold up the patch for it! david-arm: This is just a suggestion, but in theory you could reduce the amount of duplication by creating…
MatrixOperand matrix_ty,		MatrixOperand matrix_ty,
RegisterOperand multi_vector_ty,		RegisterOperand multi_vector_ty,
ZPRRegOp zpr_ty> {		ZPRRegOp zpr_ty> {
def NAME: sme2_mla_ll_array_vg24_single<op, matrix_ty, multi_vector_ty,		def NAME: sme2_mla_ll_array_vg24_single<op, matrix_ty, multi_vector_ty,
zpr_ty, mnemonic>;		zpr_ty, mnemonic>, SMEPseudo2Instr<NAME, 1>;

		def NAME # _PSEUDO : sme2_za_array_2op_multi_single_pseudo<NAME, uimm1s4range, multi_vector_ty, zpr_ty, SMEMatrixArray>;

def : InstAlias<mnemonic # "\t$ZAd[$Rv, $imm], $Zn, $Zm",		def : InstAlias<mnemonic # "\t$ZAd[$Rv, $imm], $Zn, $Zm",
(!cast<Instruction>(NAME) matrix_ty:$ZAd, MatrixIndexGPR32Op8_11:$Rv, uimm1s4range:$imm, multi_vector_ty:$Zn, zpr_ty:$Zm), 0>;		(!cast<Instruction>(NAME) matrix_ty:$ZAd, MatrixIndexGPR32Op8_11:$Rv, uimm1s4range:$imm, multi_vector_ty:$Zn, zpr_ty:$Zm), 0>;
}		}

		multiclass sme2_mla_ll_array_vg2_single<string mnemonic, bits<5> op,
		MatrixOperand matrix_ty,
		RegisterOperand multi_vector_ty,
		ZPRRegOp zpr_ty, ValueType vt, SDPatternOperator intrinsic> {

		defm NAME: sme2_mla_ll_array_vg24_single<mnemonic, op, matrix_ty, multi_vector_ty, zpr_ty>;

		def : SME2_ZA_TwoOp_VG2_Multi_Single_Pat<NAME, intrinsic, uimm1s4range, zpr_ty, vt, tileslicerange1s4>;
		}

		multiclass sme2_mla_ll_array_vg4_single<string mnemonic, bits<5> op,
		MatrixOperand matrix_ty,
		RegisterOperand multi_vector_ty,
		ZPRRegOp zpr_ty, ValueType vt, SDPatternOperator intrinsic> {
		defm NAME: sme2_mla_ll_array_vg24_single<mnemonic, op, matrix_ty, multi_vector_ty, zpr_ty>;

		def : SME2_ZA_TwoOp_VG4_Multi_Single_Pat<NAME, intrinsic, uimm1s4range, zpr_ty, vt, tileslicerange1s4>;
		}

// SME2 multiple vectors long long MLA two sources		// SME2 multiple vectors long long MLA two sources

class sme2_mla_ll_array_vg2_multi<bits<4> op, MatrixOperand matrix_ty,		class sme2_mla_ll_array_vg2_multi<bits<4> op, MatrixOperand matrix_ty,
RegisterOperand vector_ty,string mnemonic>		RegisterOperand vector_ty,string mnemonic>
: I<(outs matrix_ty:$ZAda),		: I<(outs matrix_ty:$ZAda),
(ins matrix_ty:$_ZAda, MatrixIndexGPR32Op8_11:$Rv, uimm1s4range:$imm,		(ins matrix_ty:$_ZAda, MatrixIndexGPR32Op8_11:$Rv, uimm1s4range:$imm,
vector_ty:$Zn, vector_ty:$Zm),		vector_ty:$Zn, vector_ty:$Zm),
mnemonic, "\t$ZAda[$Rv, $imm, vgx2], $Zn, $Zm",		mnemonic, "\t$ZAda[$Rv, $imm, vgx2], $Zn, $Zm",
▲ Show 20 Lines • Show All 1,801 Lines • Show Last 20 Lines

llvm/test/CodeGen/AArch64/sme2-intrinsics-mlall.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sme2 -mattr=+sme-i16i64 -verify-machineinstrs < %s \| FileCheck %s

				david-armUnsubmitted Done Reply Inline Actions nit: I don't think we need the `-mattr=+sve` flag here, since sme2 implies it. david-arm: nit: I don't think we need the `-mattr=+sve` flag here, since sme2 implies it.
				;
				; SMLALL
				;

				; Single x1

				define void @multi_vector_mul_add_single_long_vg4x1_s8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_long_vg4x1_s8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: smlall za.s[w8, 0:3], z1.b, z2.b
				; CHECK-NEXT: smlall za.s[w8, 12:15], z1.b, z2.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.smla.za32.single.vg4x1.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm)
				%slice.12 = add i32 %slice, 12
				call void @llvm.aarch64.sme.smla.za32.single.vg4x1.nxv16i8(i32 %slice.12, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm)
				ret void
				}

				define void @multi_vector_mul_add_single_long_vg4x1_s16(i32 %slice, <vscale x 8 x i16> %dummy, <vscale x 8 x i16> %zn, <vscale x 8 x i16> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_long_vg4x1_s16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: smlall za.d[w8, 0:3], z1.h, z2.h
				; CHECK-NEXT: smlall za.d[w8, 12:15], z1.h, z2.h
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.smla.za64.single.vg4x1.nxv8i16(i32 %slice, <vscale x 8 x i16> %zn, <vscale x 8 x i16> %zm)
				%slice.12 = add i32 %slice, 12
				call void @llvm.aarch64.sme.smla.za64.single.vg4x1.nxv8i16(i32 %slice.12, <vscale x 8 x i16> %zn, <vscale x 8 x i16> %zm)
				ret void
				}

				; Single x2

				define void @multi_vector_mul_add_single_long_vg4x2_s8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_long_vg4x2_s8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: smlall za.s[w8, 0:3, vgx2], { z1.b, z2.b }, z3.b
				; CHECK-NEXT: smlall za.s[w8, 4:7, vgx2], { z1.b, z2.b }, z3.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.smla.za32.single.vg4x2.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.smla.za32.single.vg4x2.nxv16i8(i32 %slice.4, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm)
				ret void
				}

				define void @multi_vector_mul_add_single_long_vg4x2_s16(i32 %slice, <vscale x 8 x i16> %dummy, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_long_vg4x2_s16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: smlall za.d[w8, 0:3, vgx2], { z1.h, z2.h }, z3.h
				; CHECK-NEXT: smlall za.d[w8, 4:7, vgx2], { z1.h, z2.h }, z3.h
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.smla.za64.single.vg4x2.nxv8i16(i32 %slice, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.smla.za64.single.vg4x2.nxv8i16(i32 %slice.4, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zm)
				ret void
				}

				; Single x4

				define void @multi_vector_mul_add_single_long_vg4x4_s8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_long_vg4x4_s8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $z4 killed $z4 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z3 killed $z3 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: smlall za.s[w8, 0:3, vgx4], { z1.b - z4.b }, z5.b
				; CHECK-NEXT: smlall za.s[w8, 4:7, vgx4], { z1.b - z4.b }, z5.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.smla.za32.single.vg4x4.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.smla.za32.single.vg4x4.nxv16i8(i32 %slice.4, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm)
				ret void
				}

				define void @multi_vector_mul_add_single_long_vg4x4_s16(i32 %slice, <vscale x 8 x i16> %dummy, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2, <vscale x 8 x i16> %zn3, <vscale x 8 x i16> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_long_vg4x4_s16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $z4 killed $z4 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z3 killed $z3 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: smlall za.d[w8, 0:3, vgx4], { z1.h - z4.h }, z5.h
				; CHECK-NEXT: smlall za.d[w8, 4:7, vgx4], { z1.h - z4.h }, z5.h
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.smla.za64.single.vg4x4.nxv8i16(i32 %slice, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2, <vscale x 8 x i16> %zn3, <vscale x 8 x i16> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.smla.za64.single.vg4x4.nxv8i16(i32 %slice.4, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2, <vscale x 8 x i16> %zn3, <vscale x 8 x i16> %zm)
				ret void
				}

				; UMLALL

				; Single x1

				define void @multi_vector_mul_add_single_long_vg4x1_u8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_long_vg4x1_u8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: umlall za.s[w8, 0:3], z1.b, z2.b
				; CHECK-NEXT: umlall za.s[w8, 12:15], z1.b, z2.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.umla.za32.single.vg4x1.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm)
				%slice.12 = add i32 %slice, 12
				call void @llvm.aarch64.sme.umla.za32.single.vg4x1.nxv16i8(i32 %slice.12, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm)
				ret void
				}

				define void @multi_vector_mul_add_single_long_vg4x1_u16(i32 %slice, <vscale x 8 x i16> %dummy, <vscale x 8 x i16> %zn, <vscale x 8 x i16> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_long_vg4x1_u16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: umlall za.d[w8, 0:3], z1.h, z2.h
				; CHECK-NEXT: umlall za.d[w8, 12:15], z1.h, z2.h
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.umla.za64.single.vg4x1.nxv8i16(i32 %slice, <vscale x 8 x i16> %zn, <vscale x 8 x i16> %zm)
				%slice.12 = add i32 %slice, 12
				call void @llvm.aarch64.sme.umla.za64.single.vg4x1.nxv8i16(i32 %slice.12, <vscale x 8 x i16> %zn, <vscale x 8 x i16> %zm)
				ret void
				}

				; Single x2

				define void @multi_vector_mul_add_single_long_vg4x2_u8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_long_vg4x2_u8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: umlall za.s[w8, 0:3, vgx2], { z1.b, z2.b }, z3.b
				; CHECK-NEXT: umlall za.s[w8, 4:7, vgx2], { z1.b, z2.b }, z3.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.umla.za32.single.vg4x2.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.umla.za32.single.vg4x2.nxv16i8(i32 %slice.4, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm)
				ret void
				}

				define void @multi_vector_mul_add_single_long_vg4x2_u16(i32 %slice, <vscale x 8 x i16> %dummy, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_long_vg4x2_u16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: umlall za.d[w8, 0:3, vgx2], { z1.h, z2.h }, z3.h
				; CHECK-NEXT: umlall za.d[w8, 4:7, vgx2], { z1.h, z2.h }, z3.h
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.umla.za64.single.vg4x2.nxv8i16(i32 %slice, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.umla.za64.single.vg4x2.nxv8i16(i32 %slice.4, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zm)
				ret void
				}

				; Single x4

				define void @multi_vector_mul_add_single_long_vg4x4_u8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_long_vg4x4_u8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $z4 killed $z4 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z3 killed $z3 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: umlall za.s[w8, 0:3, vgx4], { z1.b - z4.b }, z5.b
				; CHECK-NEXT: umlall za.s[w8, 4:7, vgx4], { z1.b - z4.b }, z5.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.umla.za32.single.vg4x4.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.umla.za32.single.vg4x4.nxv16i8(i32 %slice.4, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm)
				ret void
				}

				define void @multi_vector_mul_add_single_long_vg4x4_u16(i32 %slice, <vscale x 8 x i16> %dummy, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2, <vscale x 8 x i16> %zn3, <vscale x 8 x i16> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_long_vg4x4_u16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $z4 killed $z4 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z3 killed $z3 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: umlall za.d[w8, 0:3, vgx4], { z1.h - z4.h }, z5.h
				; CHECK-NEXT: umlall za.d[w8, 4:7, vgx4], { z1.h - z4.h }, z5.h
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.umla.za64.single.vg4x4.nxv8i16(i32 %slice, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2, <vscale x 8 x i16> %zn3, <vscale x 8 x i16> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.umla.za64.single.vg4x4.nxv8i16(i32 %slice.4, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2, <vscale x 8 x i16> %zn3, <vscale x 8 x i16> %zm)
				ret void
				}

				; SMLSLL

				; Single x1

				define void @multi_vector_mul_sub_single_long_vg4x1_s8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_sub_single_long_vg4x1_s8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: smlsll za.s[w8, 0:3], z1.b, z2.b
				; CHECK-NEXT: smlsll za.s[w8, 12:15], z1.b, z2.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.smls.za32.single.vg4x1.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm)
				%slice.12 = add i32 %slice, 12
				call void @llvm.aarch64.sme.smls.za32.single.vg4x1.nxv16i8(i32 %slice.12, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm)
				ret void
				}

				define void @multi_vector_mul_sub_single_long_vg4x1_s16(i32 %slice, <vscale x 8 x i16> %dummy, <vscale x 8 x i16> %zn, <vscale x 8 x i16> %zm) {
				; CHECK-LABEL: multi_vector_mul_sub_single_long_vg4x1_s16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: smlsll za.d[w8, 0:3], z1.h, z2.h
				; CHECK-NEXT: smlsll za.d[w8, 12:15], z1.h, z2.h
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.smls.za64.single.vg4x1.nxv8i16(i32 %slice, <vscale x 8 x i16> %zn, <vscale x 8 x i16> %zm)
				%slice.12 = add i32 %slice, 12
				call void @llvm.aarch64.sme.smls.za64.single.vg4x1.nxv8i16(i32 %slice.12, <vscale x 8 x i16> %zn, <vscale x 8 x i16> %zm)
				ret void
				}

				; Single x2

				define void @multi_vector_mul_sub_single_long_vg4x2_s8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_sub_single_long_vg4x2_s8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: smlsll za.s[w8, 0:3, vgx2], { z1.b, z2.b }, z3.b
				; CHECK-NEXT: smlsll za.s[w8, 4:7, vgx2], { z1.b, z2.b }, z3.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.smls.za32.single.vg4x2.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.smls.za32.single.vg4x2.nxv16i8(i32 %slice.4, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm)
				ret void
				}

				define void @multi_vector_mul_sub_single_long_vg4x2_s16(i32 %slice, <vscale x 8 x i16> %dummy, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zm) {
				; CHECK-LABEL: multi_vector_mul_sub_single_long_vg4x2_s16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: smlsll za.d[w8, 0:3, vgx2], { z1.h, z2.h }, z3.h
				; CHECK-NEXT: smlsll za.d[w8, 4:7, vgx2], { z1.h, z2.h }, z3.h
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.smls.za64.single.vg4x2.nxv8i16(i32 %slice, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.smls.za64.single.vg4x2.nxv8i16(i32 %slice.4, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zm)
				ret void
				}

				; Single x4

				define void @multi_vector_mul_sub_single_long_vg4x4_s8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_sub_single_long_vg4x4_s8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $z4 killed $z4 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z3 killed $z3 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: smlsll za.s[w8, 0:3, vgx4], { z1.b - z4.b }, z5.b
				; CHECK-NEXT: smlsll za.s[w8, 4:7, vgx4], { z1.b - z4.b }, z5.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.smls.za32.single.vg4x4.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.smls.za32.single.vg4x4.nxv16i8(i32 %slice.4, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm)
				ret void
				}

				define void @multi_vector_mul_sub_single_long_vg4x4_s16(i32 %slice, <vscale x 8 x i16> %dummy, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2, <vscale x 8 x i16> %zn3, <vscale x 8 x i16> %zm) {
				; CHECK-LABEL: multi_vector_mul_sub_single_long_vg4x4_s16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $z4 killed $z4 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z3 killed $z3 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: smlsll za.d[w8, 0:3, vgx4], { z1.h - z4.h }, z5.h
				; CHECK-NEXT: smlsll za.d[w8, 4:7, vgx4], { z1.h - z4.h }, z5.h
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.smls.za64.single.vg4x4.nxv8i16(i32 %slice, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2, <vscale x 8 x i16> %zn3, <vscale x 8 x i16> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.smls.za64.single.vg4x4.nxv8i16(i32 %slice.4, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2, <vscale x 8 x i16> %zn3, <vscale x 8 x i16> %zm)
				ret void
				}

				; UMLSLL

				; Single x1

				define void @multi_vector_mul_sub_single_long_vg4x1_u8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_sub_single_long_vg4x1_u8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: umlsll za.s[w8, 0:3], z1.b, z2.b
				; CHECK-NEXT: umlsll za.s[w8, 12:15], z1.b, z2.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.umls.za32.single.vg4x1.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm)
				%slice.12 = add i32 %slice, 12
				call void @llvm.aarch64.sme.umls.za32.single.vg4x1.nxv16i8(i32 %slice.12, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm)
				ret void
				}

				define void @multi_vector_mul_sub_single_long_vg4x1_u16(i32 %slice, <vscale x 8 x i16> %dummy, <vscale x 8 x i16> %zn, <vscale x 8 x i16> %zm) {
				; CHECK-LABEL: multi_vector_mul_sub_single_long_vg4x1_u16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: umlsll za.d[w8, 0:3], z1.h, z2.h
				; CHECK-NEXT: umlsll za.d[w8, 12:15], z1.h, z2.h
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.umls.za64.single.vg4x1.nxv8i16(i32 %slice, <vscale x 8 x i16> %zn, <vscale x 8 x i16> %zm)
				%slice.12 = add i32 %slice, 12
				call void @llvm.aarch64.sme.umls.za64.single.vg4x1.nxv8i16(i32 %slice.12, <vscale x 8 x i16> %zn, <vscale x 8 x i16> %zm)
				ret void
				}

				; Single x2

				define void @multi_vector_mul_sub_single_long_vg4x2_u8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_sub_single_long_vg4x2_u8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: umlsll za.s[w8, 0:3, vgx2], { z1.b, z2.b }, z3.b
				; CHECK-NEXT: umlsll za.s[w8, 4:7, vgx2], { z1.b, z2.b }, z3.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.umls.za32.single.vg4x2.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.umls.za32.single.vg4x2.nxv16i8(i32 %slice.4, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm)
				ret void
				}

				define void @multi_vector_mul_sub_single_long_vg4x2_u16(i32 %slice, <vscale x 8 x i16> %dummy, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zm) {
				; CHECK-LABEL: multi_vector_mul_sub_single_long_vg4x2_u16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: umlsll za.d[w8, 0:3, vgx2], { z1.h, z2.h }, z3.h
				; CHECK-NEXT: umlsll za.d[w8, 4:7, vgx2], { z1.h, z2.h }, z3.h
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.umls.za64.single.vg4x2.nxv8i16(i32 %slice, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.umls.za64.single.vg4x2.nxv8i16(i32 %slice.4, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zm)
				ret void
				}

				; Single x4

				define void @multi_vector_mul_sub_single_long_vg4x4_u8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_sub_single_long_vg4x4_u8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $z4 killed $z4 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z3 killed $z3 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: umlsll za.s[w8, 0:3, vgx4], { z1.b - z4.b }, z5.b
				; CHECK-NEXT: umlsll za.s[w8, 4:7, vgx4], { z1.b - z4.b }, z5.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.umls.za32.single.vg4x4.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.umls.za32.single.vg4x4.nxv16i8(i32 %slice.4, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm)
				ret void
				}

				define void @multi_vector_mul_sub_single_long_vg4x4_u16(i32 %slice, <vscale x 8 x i16> %dummy, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2, <vscale x 8 x i16> %zn3, <vscale x 8 x i16> %zm) {
				; CHECK-LABEL: multi_vector_mul_sub_single_long_vg4x4_u16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $z4 killed $z4 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z3 killed $z3 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: umlsll za.d[w8, 0:3, vgx4], { z1.h - z4.h }, z5.h
				; CHECK-NEXT: umlsll za.d[w8, 4:7, vgx4], { z1.h - z4.h }, z5.h
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.umls.za64.single.vg4x4.nxv8i16(i32 %slice, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2, <vscale x 8 x i16> %zn3, <vscale x 8 x i16> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.umls.za64.single.vg4x4.nxv8i16(i32 %slice.4, <vscale x 8 x i16> %zn0, <vscale x 8 x i16> %zn1, <vscale x 8 x i16> %zn2, <vscale x 8 x i16> %zn3, <vscale x 8 x i16> %zm)
				ret void
				}

				;
				; SUMLALL
				;

				; Single x 2

				define void @multi_vector_mul_add_single_signed_long_vg4x2_s8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_signed_long_vg4x2_s8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: sumlall za.s[w8, 0:3, vgx2], { z1.b, z2.b }, z3.b
				; CHECK-NEXT: sumlall za.s[w8, 4:7, vgx2], { z1.b, z2.b }, z3.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.sumla.za32.single.vg4x2.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.sumla.za32.single.vg4x2.nxv16i8(i32 %slice.4, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm)
				ret void
				}

				; Single x 4

				define void @multi_vector_mul_add_single_signed_long_vg4x4_s8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_signed_long_vg4x4_s8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $z4 killed $z4 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z3 killed $z3 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: sumlall za.s[w8, 0:3, vgx4], { z1.b - z4.b }, z5.b
				; CHECK-NEXT: sumlall za.s[w8, 4:7, vgx4], { z1.b - z4.b }, z5.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.sumla.za32.single.vg4x4.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.sumla.za32.single.vg4x4.nxv16i8(i32 %slice.4, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm)
				ret void
				}

				; USMLALL

				; Single x1

				define void @multi_vector_mul_add_single_unsigned_long_vg4x1_s8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_unsigned_long_vg4x1_s8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: usmlall za.s[w8, 0:3], z1.b, z2.b
				; CHECK-NEXT: usmlall za.s[w8, 12:15], z1.b, z2.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.usmla.za32.single.vg4x1.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm)
				%slice.12 = add i32 %slice, 12
				call void @llvm.aarch64.sme.usmla.za32.single.vg4x1.nxv16i8(i32 %slice.12, <vscale x 16 x i8> %zn, <vscale x 16 x i8> %zm)
				ret void
				}

				; Single x 2

				define void @multi_vector_mul_add_single_unsigned_long_vg4x2_s8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_unsigned_long_vg4x2_s8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2 def $z1_z2
				; CHECK-NEXT: usmlall za.s[w8, 0:3, vgx2], { z1.b, z2.b }, z3.b
				; CHECK-NEXT: usmlall za.s[w8, 4:7, vgx2], { z1.b, z2.b }, z3.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.usmla.za32.single.vg4x2.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.usmla.za32.single.vg4x2.nxv16i8(i32 %slice.4, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zm)
				ret void
				}

				; Single x4

				define void @multi_vector_mul_add_single_unsigned_long_vg4x4_s8(i32 %slice, <vscale x 16 x i8> %dummy, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm) {
				; CHECK-LABEL: multi_vector_mul_add_single_unsigned_long_vg4x4_s8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $z4 killed $z4 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: mov w8, w0
				; CHECK-NEXT: // kill: def $z3 killed $z3 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z2 killed $z2 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: // kill: def $z1 killed $z1 killed $z1_z2_z3_z4 def $z1_z2_z3_z4
				; CHECK-NEXT: usmlall za.s[w8, 0:3, vgx4], { z1.b - z4.b }, z5.b
				; CHECK-NEXT: usmlall za.s[w8, 4:7, vgx4], { z1.b - z4.b }, z5.b
				; CHECK-NEXT: ret
				call void @llvm.aarch64.sme.usmla.za32.single.vg4x4.nxv16i8(i32 %slice, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm)
				%slice.4 = add i32 %slice, 4
				call void @llvm.aarch64.sme.usmla.za32.single.vg4x4.nxv16i8(i32 %slice.4, <vscale x 16 x i8> %zn0, <vscale x 16 x i8> %zn1, <vscale x 16 x i8> %zn2, <vscale x 16 x i8> %zn3, <vscale x 16 x i8> %zm)
				ret void
				}

				declare void @llvm.aarch64.sme.smla.za32.single.vg4x1.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>)
				declare void @llvm.aarch64.sme.smla.za32.single.vg4x2.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				declare void @llvm.aarch64.sme.smla.za32.single.vg4x4.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)

				declare void @llvm.aarch64.sme.smla.za64.single.vg4x1.nxv8i16(i32, <vscale x 8 x i16>, <vscale x 8 x i16>)
				declare void @llvm.aarch64.sme.smla.za64.single.vg4x2.nxv8i16(i32, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				declare void @llvm.aarch64.sme.smla.za64.single.vg4x4.nxv8i16(i32, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>)

				declare void @llvm.aarch64.sme.umla.za32.single.vg4x1.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>)
				declare void @llvm.aarch64.sme.umla.za32.single.vg4x2.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				declare void @llvm.aarch64.sme.umla.za32.single.vg4x4.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)

				declare void @llvm.aarch64.sme.umla.za64.single.vg4x1.nxv8i16(i32, <vscale x 8 x i16>, <vscale x 8 x i16>)
				declare void @llvm.aarch64.sme.umla.za64.single.vg4x2.nxv8i16(i32, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				declare void @llvm.aarch64.sme.umla.za64.single.vg4x4.nxv8i16(i32, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>)

				declare void @llvm.aarch64.sme.smls.za32.single.vg4x1.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>)
				declare void @llvm.aarch64.sme.smls.za32.single.vg4x2.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				declare void @llvm.aarch64.sme.smls.za32.single.vg4x4.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)

				declare void @llvm.aarch64.sme.smls.za64.single.vg4x1.nxv8i16(i32, <vscale x 8 x i16>, <vscale x 8 x i16>)
				declare void @llvm.aarch64.sme.smls.za64.single.vg4x2.nxv8i16(i32, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				declare void @llvm.aarch64.sme.smls.za64.single.vg4x4.nxv8i16(i32, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>)

				declare void @llvm.aarch64.sme.umls.za32.single.vg4x1.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>)
				declare void @llvm.aarch64.sme.umls.za32.single.vg4x2.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				declare void @llvm.aarch64.sme.umls.za32.single.vg4x4.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)

				declare void @llvm.aarch64.sme.umls.za64.single.vg4x1.nxv8i16(i32, <vscale x 8 x i16>, <vscale x 8 x i16>)
				declare void @llvm.aarch64.sme.umls.za64.single.vg4x2.nxv8i16(i32, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				declare void @llvm.aarch64.sme.umls.za64.single.vg4x4.nxv8i16(i32, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>)

				declare void @llvm.aarch64.sme.sumla.za32.single.vg4x2.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				declare void @llvm.aarch64.sme.sumla.za32.single.vg4x4.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)

				declare void @llvm.aarch64.sme.usmla.za32.single.vg4x1.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>)
				declare void @llvm.aarch64.sme.usmla.za32.single.vg4x2.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				declare void @llvm.aarch64.sme.usmla.za32.single.vg4x4.nxv16i8(i32, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)

This is an archive of the discontinued LLVM Phabricator instance.

[SME2][AArch64] Add multi-single multiply-add long long intrinsicsClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 497698

llvm/include/llvm/IR/IntrinsicsAArch64.td

llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td

llvm/lib/Target/AArch64/SMEInstrFormats.td

llvm/test/CodeGen/AArch64/sme2-intrinsics-mlall.ll

[SME2][AArch64] Add multi-single multiply-add long long intrinsics
ClosedPublic