This is an archive of the discontinued LLVM Phabricator instance.

[PATCH] [ARM]: Armv8.6-a Matrix Mul Asm and Intrinsics Support
AbandonedPublic

Authored by LukeGeeson on Apr 6 2020, 4:55 AM.

Download Raw Diff

Details

Reviewers

ostannard
rengolin
t.p.northover
stuij
eli.friedman
fhahn
aemerson

Summary

This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

This patch includes:

Assembly support for AArch32, AArch64, NEON, and SVE
Intrinsics Support for AArch32, AArch64
Command line options to enable these features with +f32mm or f64mm

Note: these extensions are optional in the 8.6a architecture and so have
to be enabled by default

No IR types or C Types are needed for this extension.

This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)

Based on work by:

Luke Geeson
Oliver Stannard
Luke Cheeseman

Diff Detail

Event Timeline

LukeGeeson created this revision.Apr 6 2020, 4:55 AM

Herald added a reviewer: rengolin. · View Herald TranscriptApr 6 2020, 4:55 AM

Herald added a project: Restricted Project. · View Herald Transcript

Herald added subscribers: cfe-commits, danielkiss, hiraditya, kristof.beyls. · View Herald Transcript

Harbormaster failed remote builds in B51919: Diff 255289!Apr 6 2020, 6:27 AM

build failures related to linting, unit tests passing

DavidSpickett added a subscriber: DavidSpickett.Apr 8 2020, 3:00 AM

DavidSpickett added inline comments.

clang/lib/Basic/Targets/AArch64.cpp
284	I don't see specific tests for this #define
clang/lib/Basic/Targets/AArch64.h
39	Why is this feature a number for AArch64, does >1 mean something?

changed hasmatmul type to boolean in clang/lib/basic/target/aarch64.h and related files

clang/lib/Basic/Targets/AArch64.h
39	It seems this was an artifact from how this is handled in ARM.h where they do have special meaning. I'll modify this to be boolean in line with the others in AArch64.h

added __ARM_FEATURE_MATMUL_INT8 check test

LukeGeeson marked 2 inline comments as done.Apr 8 2020, 8:40 AM

LukeGeeson added inline comments.

clang/lib/Basic/Targets/AArch64.cpp
284	test added in clang/test/CodeGen/aarch64-matmul.cpp

LukeGeeson edited the summary of this revision. (Show Details)Apr 8 2020, 8:44 AM

LukeGeeson added reviewers: t.p.northover, stuij.

LukeGeeson marked an inline comment as done.Apr 8 2020, 8:47 AM

LukeGeeson added reviewers: eli.friedman, fhahn, aemerson.Apr 8 2020, 9:10 AM

This patch is quite big and I think it would be easier to review if it would be split up into distinct clang/llvm parts and maybe NEON/SVE parts on the LLVM side.

clang/lib/Driver/ToolChains/Arch/AArch64.cpp
71	Not sure if that is the right place to handle the implication. This function just decodes a given feature string. And shouldn't that already be handled in the backend by the implications in llvm/lib/Target/AArch64/AArch64.td?

In D77540#1969873, @fhahn wrote:

This patch is quite big and I think it would be easier to review if it would be split up into distinct clang/llvm parts and maybe NEON/SVE parts on the LLVM side.

That's a good suggestion, I shouldn't have pushed my luck!

Luckily this patch doesn't introduce any new C types or IR types and so the frontend code is light, relatively speaking. That said, there is so little intrinsics work needed here that it might make sense to split the subsequent patches up to include intrinsics out-of-the-box as AArch64 asm+intrinsics, AArch32 asm+intrinsics, AArch64SVE asm + intrinsics etc... instead.

Does that sound ok?

In D77540#1970034, @LukeGeeson wrote:

In D77540#1969873, @fhahn wrote:

This patch is quite big and I think it would be easier to review if it would be split up into distinct clang/llvm parts and maybe NEON/SVE parts on the LLVM side.

That's a good suggestion, I shouldn't have pushed my luck!

Luckily this patch doesn't introduce any new C types or IR types and so the frontend code is light, relatively speaking. That said, there is so little intrinsics work needed here that it might make sense to split the subsequent patches up to include intrinsics out-of-the-box as AArch64 asm+intrinsics, AArch32 asm+intrinsics, AArch64SVE asm + intrinsics etc... instead.

Does that sound ok?

Yeah that sounds good. In particular separating the clang/llvm changes would be helpful.

LukeGeeson added a child revision: D77871: [AArch64] Armv8.6-a Matrix Mult Assembly + Intrinsics.Apr 10 2020, 6:34 AM

Hi folks,
Thank you for your help, I have split the patches up as follows:

https://reviews.llvm.org/D77871 [AArch64] Armv8.6-a Matrix Mult Assembly + Intrinsics
https://reviews.llvm.org/D77872 [AArch32] Armv8.6-a Matrix Mult Assembly + Intrinsics
https://reviews.llvm.org/D77873 [AArch64] Armv8.6-A Mat Mul SVE Assembly
https://reviews.llvm.org/D77874 [AArch32] Armv8.6a Matrix Mul Assembly
https://reviews.llvm.org/D77875 [ARM] Armv8.6-a Matrix Mul cmd line support

In detail this covers the intrinsics (with minimum llvm backend needed to build and pass tests), SVE Assembly, (non-SVE) Assembly, and command-line support to enable the use of this extension. I mentioned above that there was so little intrinsics work needed and so the natural way to split up this patch was to include the llvm code needed to get each patch to build in the order above.

I'd appreciate any help you can give reviewing these!

LukeGeeson removed a child revision: D77871: [AArch64] Armv8.6-a Matrix Mult Assembly + Intrinsics.Apr 10 2020, 7:23 AM

Sub-issues all closed

Revision Contents

Path

Size

clang/

include/

clang/

Basic/

arm_neon.td

34 lines

lib/

Basic/

Targets/

1 line

6 lines

1 line

6 lines

CodeGen/

CGBuiltin.cpp

28 lines

Driver/

ToolChains/

Arch/

AArch64.cpp

20 lines

test/

CodeGen/

aarch64-matmul.cpp

8 lines

aarch64-v8.6a-neon-intrinsics.c

171 lines

arm-v8.6a-neon-intrinsics.c

100 lines

Driver/

aarch64-cpus.c

28 lines

arm-matrix-multiply.c

14 lines

llvm/

include/

llvm/

IR/

IntrinsicsAArch64.td

11 lines

IntrinsicsARM.td

13 lines

Support/

AArch64TargetParser.h

4 lines

AArch64TargetParser.def

2 lines

ARMTargetParser.h

17 lines

ARMTargetParser.def

4 lines

lib/

Target/

AArch64/

AArch64.td

12 lines

AArch64InstrFormats.td

42 lines

AArch64InstrInfo.td

48 lines

AArch64SVEInstrInfo.td

31 lines

AArch64Subtarget.h

6 lines

AsmParser/

AArch64AsmParser.cpp

3 lines

SVEInstrFormats.td

174 lines

ARM/

6 lines

79 lines

2 lines

5 lines

AsmParser/

ARMAsmParser.cpp

9 lines

test/

CodeGen/

AArch64/

aarch64-matmul.ll

136 lines

ARM/

arm-matmul.ll

83 lines

MC/

AArch64/

SVE/

matrix-multiply-fp-diagnostics.s

86 lines

matrix-multiply-fp32.s

17 lines

matrix-multiply-fp64.s

185 lines

matrix-multiply-int8-diagnostics.s

78 lines

matrix-multiply-int8.s

118 lines

armv8.6a-simd-matmul-error.s

34 lines

armv8.6a-simd-matmul.s

43 lines

ARM/

armv8.6a-matmul-error.s

113 lines

armv8.6a-matmul.s

49 lines

Disassembler/

AArch64/

armv8.6a-simd-matmul.txt

34 lines

ARM/

armv8.6a-matmul-arm.txt

38 lines

armv8.6a-matmul-thumb.txt

38 lines

unittests/

Support/

TargetParserTest.cpp

6 lines

Diff 256032

clang/include/clang/Basic/arm_neon.td

Show First 20 Lines • Show All 215 Lines • ▼ Show 20 Lines	def OP_FMLAL_LN : Op<(call "vfmlal_low", $p0, $p1,
(dup_typed $p1, (call "vget_lane", $p2, $p3)))>;		(dup_typed $p1, (call "vget_lane", $p2, $p3)))>;
def OP_FMLSL_LN : Op<(call "vfmlsl_low", $p0, $p1,		def OP_FMLSL_LN : Op<(call "vfmlsl_low", $p0, $p1,
(dup_typed $p1, (call "vget_lane", $p2, $p3)))>;		(dup_typed $p1, (call "vget_lane", $p2, $p3)))>;
def OP_FMLAL_LN_Hi : Op<(call "vfmlal_high", $p0, $p1,		def OP_FMLAL_LN_Hi : Op<(call "vfmlal_high", $p0, $p1,
(dup_typed $p1, (call "vget_lane", $p2, $p3)))>;		(dup_typed $p1, (call "vget_lane", $p2, $p3)))>;
def OP_FMLSL_LN_Hi : Op<(call "vfmlsl_high", $p0, $p1,		def OP_FMLSL_LN_Hi : Op<(call "vfmlsl_high", $p0, $p1,
(dup_typed $p1, (call "vget_lane", $p2, $p3)))>;		(dup_typed $p1, (call "vget_lane", $p2, $p3)))>;

		def OP_USDOT_LN
		: Op<(call "vusdot", $p0, $p1,
		(cast "8", "S", (call_mangled "splat_lane", (bitcast "int32x2_t", $p2), $p3)))>;
		def OP_USDOT_LNQ
		: Op<(call "vusdot", $p0, $p1,
		(cast "8", "S", (call_mangled "splat_lane", (bitcast "int32x4_t", $p2), $p3)))>;

		// sudot splats the second vector and then calls vusdot
		def OP_SUDOT_LN
		: Op<(call "vusdot", $p0,
		(cast "8", "U", (call_mangled "splat_lane", (bitcast "int32x2_t", $p2), $p3)), $p1)>;
		def OP_SUDOT_LNQ
		: Op<(call "vusdot", $p0,
		(cast "8", "U", (call_mangled "splat_lane", (bitcast "int32x4_t", $p2), $p3)), $p1)>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Auxiliary Instructions		// Auxiliary Instructions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

// Splat operation - performs a range-checked splat over a vector		// Splat operation - performs a range-checked splat over a vector
def SPLAT : WInst<"splat_lane", ".(!q)I",		def SPLAT : WInst<"splat_lane", ".(!q)I",
"UcUsUicsilPcPsfQUcQUsQUiQcQsQiQPcQPsQflUlQlQUlhdQhQdPlQPl">;		"UcUsUicsilPcPsfQUcQUsQUiQcQsQiQPcQPsQflUlQlQUlhdQhQdPlQPl">;
def SPLATQ : WInst<"splat_laneq", ".(!Q)I",		def SPLATQ : WInst<"splat_laneq", ".(!Q)I",
▲ Show 20 Lines • Show All 1,555 Lines • ▼ Show 20 Lines	let ArchGuard = "defined(__ARM_FEATURE_FP16FML) && defined(__aarch64__)" in {
def VFMLAL_LANEQ_HIGH : SOpInst<"vfmlal_laneq_high", "(F>)(F>)F(FQ)I", "hQh", OP_FMLAL_LN_Hi> {		def VFMLAL_LANEQ_HIGH : SOpInst<"vfmlal_laneq_high", "(F>)(F>)F(FQ)I", "hQh", OP_FMLAL_LN_Hi> {
let isLaneQ = 1;		let isLaneQ = 1;
}		}
def VFMLSL_LANEQ_HIGH : SOpInst<"vfmlsl_laneq_high", "(F>)(F>)F(FQ)I", "hQh", OP_FMLSL_LN_Hi> {		def VFMLSL_LANEQ_HIGH : SOpInst<"vfmlsl_laneq_high", "(F>)(F>)F(FQ)I", "hQh", OP_FMLSL_LN_Hi> {
let isLaneQ = 1;		let isLaneQ = 1;
}		}
}		}

		let ArchGuard = "defined(__ARM_FEATURE_MATMUL_INT8)" in {
		def VMMLA : SInst<"vmmla", "..(<<)(<<)", "QUiQi">;
		def VUSMMLA : SInst<"vusmmla", "..(<<U)(<<)", "Qi">;

		def VUSDOT : SInst<"vusdot", "..(<<U)(<<)", "iQi">;

		def VUSDOT_LANE : SOpInst<"vusdot_lane", "..(<<U)(<<q)I", "iQi", OP_USDOT_LN>;
		def VSUDOT_LANE : SOpInst<"vsudot_lane", "..(<<)(<<qU)I", "iQi", OP_SUDOT_LN>;

		let ArchGuard = "defined(__aarch64__)" in {
		let isLaneQ = 1 in {
		def VUSDOT_LANEQ : SOpInst<"vusdot_laneq", "..(<<U)(<<Q)I", "iQi", OP_USDOT_LNQ>;
		def VSUDOT_LANEQ : SOpInst<"vsudot_laneq", "..(<<)(<<QU)I", "iQi", OP_SUDOT_LNQ>;
		}
		}
		}

// v8.3-A Vector complex addition intrinsics		// v8.3-A Vector complex addition intrinsics
let ArchGuard = "defined(__ARM_FEATURE_COMPLEX) && defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)" in {		let ArchGuard = "defined(__ARM_FEATURE_COMPLEX) && defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)" in {
def VCADD_ROT90_FP16 : SInst<"vcadd_rot90", "...", "h">;		def VCADD_ROT90_FP16 : SInst<"vcadd_rot90", "...", "h">;
def VCADD_ROT270_FP16 : SInst<"vcadd_rot270", "...", "h">;		def VCADD_ROT270_FP16 : SInst<"vcadd_rot270", "...", "h">;
def VCADDQ_ROT90_FP16 : SInst<"vcaddq_rot90", "QQQ", "h">;		def VCADDQ_ROT90_FP16 : SInst<"vcaddq_rot90", "QQQ", "h">;
def VCADDQ_ROT270_FP16 : SInst<"vcaddq_rot270", "QQQ", "h">;		def VCADDQ_ROT270_FP16 : SInst<"vcaddq_rot270", "QQQ", "h">;
}		}
let ArchGuard = "defined(__ARM_FEATURE_COMPLEX)" in {		let ArchGuard = "defined(__ARM_FEATURE_COMPLEX)" in {
def VCADD_ROT90 : SInst<"vcadd_rot90", "...", "f">;		def VCADD_ROT90 : SInst<"vcadd_rot90", "...", "f">;
def VCADD_ROT270 : SInst<"vcadd_rot270", "...", "f">;		def VCADD_ROT270 : SInst<"vcadd_rot270", "...", "f">;
def VCADDQ_ROT90 : SInst<"vcaddq_rot90", "QQQ", "f">;		def VCADDQ_ROT90 : SInst<"vcaddq_rot90", "QQQ", "f">;
def VCADDQ_ROT270 : SInst<"vcaddq_rot270", "QQQ", "f">;		def VCADDQ_ROT270 : SInst<"vcaddq_rot270", "QQQ", "f">;
}		}
let ArchGuard = "defined(__ARM_FEATURE_COMPLEX) && defined(__aarch64__)" in {		let ArchGuard = "defined(__ARM_FEATURE_COMPLEX) && defined(__aarch64__)" in {
def VCADDQ_ROT90_FP64 : SInst<"vcaddq_rot90", "QQQ", "d">;		def VCADDQ_ROT90_FP64 : SInst<"vcaddq_rot90", "QQQ", "d">;
def VCADDQ_ROT270_FP64 : SInst<"vcaddq_rot270", "QQQ", "d">;		def VCADDQ_ROT270_FP64 : SInst<"vcaddq_rot270", "QQQ", "d">;
}		}
No newline at end of file

clang/lib/Basic/Targets/AArch64.h

Show All 30 Lines	class LLVM_LIBRARY_VISIBILITY AArch64TargetInfo : public TargetInfo {
bool HasCRC;		bool HasCRC;
bool HasCrypto;		bool HasCrypto;
bool HasUnaligned;		bool HasUnaligned;
bool HasFullFP16;		bool HasFullFP16;
bool HasDotProd;		bool HasDotProd;
bool HasFP16FML;		bool HasFP16FML;
bool HasMTE;		bool HasMTE;
bool HasTME;		bool HasTME;
		bool HasMatMul;
		DavidSpickettUnsubmitted Done Reply Inline Actions Why is this feature a number for AArch64, does >1 mean something? DavidSpickett: Why is this feature a number for AArch64, does >1 mean something?
		LukeGeesonAuthorUnsubmitted Done Reply Inline Actions It seems this was an artifact from how this is handled in ARM.h where they do have special meaning. I'll modify this to be boolean in line with the others in AArch64.h LukeGeeson: It seems this was an artifact from how this is handled in ARM.h where they do have special…

llvm::AArch64::ArchKind ArchKind;		llvm::AArch64::ArchKind ArchKind;

static const Builtin::Info BuiltinInfo[];		static const Builtin::Info BuiltinInfo[];

std::string ABI;		std::string ABI;

public:		public:
▲ Show 20 Lines • Show All 161 Lines • Show Last 20 Lines

clang/lib/Basic/Targets/AArch64.cpp

Show First 20 Lines • Show All 274 Lines • ▼ Show 20 Lines	if (HasDotProd)
Builder.defineMacro("__ARM_FEATURE_DOTPROD", "1");		Builder.defineMacro("__ARM_FEATURE_DOTPROD", "1");

if (HasMTE)		if (HasMTE)
Builder.defineMacro("__ARM_FEATURE_MEMORY_TAGGING", "1");		Builder.defineMacro("__ARM_FEATURE_MEMORY_TAGGING", "1");

if (HasTME)		if (HasTME)
Builder.defineMacro("__ARM_FEATURE_TME", "1");		Builder.defineMacro("__ARM_FEATURE_TME", "1");

		if (HasMatMul)
		Builder.defineMacro("__ARM_FEATURE_MATMUL_INT8", "1");
		DavidSpickettUnsubmitted Done Reply Inline Actions I don't see specific tests for this #define DavidSpickett: I don't see specific tests for this #define
		LukeGeesonAuthorUnsubmitted Done Reply Inline Actions test added in clang/test/CodeGen/aarch64-matmul.cpp LukeGeeson: test added in clang/test/CodeGen/aarch64-matmul.cpp

if ((FPU & NeonMode) && HasFP16FML)		if ((FPU & NeonMode) && HasFP16FML)
Builder.defineMacro("__ARM_FEATURE_FP16FML", "1");		Builder.defineMacro("__ARM_FEATURE_FP16FML", "1");

if (Opts.hasSignReturnAddress()) {		if (Opts.hasSignReturnAddress()) {
// Bitmask:		// Bitmask:
// 0: Protection using the A key		// 0: Protection using the A key
// 1: Protection using the B key		// 1: Protection using the B key
// 2: Protection including leaf functions		// 2: Protection including leaf functions
▲ Show 20 Lines • Show All 60 Lines • ▼ Show 20 Lines	bool AArch64TargetInfo::handleTargetFeatures(std::vector<std::string> &Features,
HasCRC = false;		HasCRC = false;
HasCrypto = false;		HasCrypto = false;
HasUnaligned = true;		HasUnaligned = true;
HasFullFP16 = false;		HasFullFP16 = false;
HasDotProd = false;		HasDotProd = false;
HasFP16FML = false;		HasFP16FML = false;
HasMTE = false;		HasMTE = false;
HasTME = false;		HasTME = false;
		HasMatMul = false;
ArchKind = llvm::AArch64::ArchKind::ARMV8A;		ArchKind = llvm::AArch64::ArchKind::ARMV8A;

for (const auto &Feature : Features) {		for (const auto &Feature : Features) {
if (Feature == "+neon")		if (Feature == "+neon")
FPU \|= NeonMode;		FPU \|= NeonMode;
if (Feature == "+sve")		if (Feature == "+sve")
FPU \|= SveMode;		FPU \|= SveMode;
if (Feature == "+crc")		if (Feature == "+crc")
Show All 19 Lines	for (const auto &Feature : Features) {
if (Feature == "+dotprod")		if (Feature == "+dotprod")
HasDotProd = true;		HasDotProd = true;
if (Feature == "+fp16fml")		if (Feature == "+fp16fml")
HasFP16FML = true;		HasFP16FML = true;
if (Feature == "+mte")		if (Feature == "+mte")
HasMTE = true;		HasMTE = true;
if (Feature == "+tme")		if (Feature == "+tme")
HasTME = true;		HasTME = true;
		if (Feature == "+i8mm")
		HasMatMul = true;
}		}

setDataLayout();		setDataLayout();

return true;		return true;
}		}

TargetInfo::CallingConvCheckResult		TargetInfo::CallingConvCheckResult
▲ Show 20 Lines • Show All 383 Lines • Show Last 20 Lines

clang/lib/Basic/Targets/ARM.h

Show First 20 Lines • Show All 69 Lines • ▼ Show 20 Lines	class LLVM_LIBRARY_VISIBILITY ARMTargetInfo : public TargetInfo {
unsigned SoftFloat : 1;		unsigned SoftFloat : 1;
unsigned SoftFloatABI : 1;		unsigned SoftFloatABI : 1;

unsigned CRC : 1;		unsigned CRC : 1;
unsigned Crypto : 1;		unsigned Crypto : 1;
unsigned DSP : 1;		unsigned DSP : 1;
unsigned Unaligned : 1;		unsigned Unaligned : 1;
unsigned DotProd : 1;		unsigned DotProd : 1;
		unsigned HasMatMul : 1;

enum {		enum {
LDREX_B = (1 << 0), /// byte (8-bit)		LDREX_B = (1 << 0), /// byte (8-bit)
LDREX_H = (1 << 1), /// half (16-bit)		LDREX_H = (1 << 1), /// half (16-bit)
LDREX_W = (1 << 2), /// word (32-bit)		LDREX_W = (1 << 2), /// word (32-bit)
LDREX_D = (1 << 3), /// double (64-bit)		LDREX_D = (1 << 3), /// double (64-bit)
};		};

▲ Show 20 Lines • Show All 194 Lines • Show Last 20 Lines

clang/lib/Basic/Targets/ARM.cpp

Show First 20 Lines • Show All 419 Lines • ▼ Show 20 Lines	bool ARMTargetInfo::handleTargetFeatures(std::vector<std::string> &Features,
CRC = 0;		CRC = 0;
Crypto = 0;		Crypto = 0;
DSP = 0;		DSP = 0;
Unaligned = 1;		Unaligned = 1;
SoftFloat = false;		SoftFloat = false;
// Note that SoftFloatABI is initialized in our constructor.		// Note that SoftFloatABI is initialized in our constructor.
HWDiv = 0;		HWDiv = 0;
DotProd = 0;		DotProd = 0;
		HasMatMul = 0;
HasFloat16 = true;		HasFloat16 = true;
ARMCDECoprocMask = 0;		ARMCDECoprocMask = 0;

// This does not diagnose illegal cases like having both		// This does not diagnose illegal cases like having both
// "+vfpv2" and "+vfpv3" or having "+neon" and "-fp64".		// "+vfpv2" and "+vfpv3" or having "+neon" and "-fp64".
for (const auto &Feature : Features) {		for (const auto &Feature : Features) {
if (Feature == "+soft-float") {		if (Feature == "+soft-float") {
SoftFloat = true;		SoftFloat = true;
▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	if (Feature == "+soft-float") {
DotProd = true;		DotProd = true;
} else if (Feature == "+mve") {		} else if (Feature == "+mve") {
MVE \|= MVE_INT;		MVE \|= MVE_INT;
} else if (Feature == "+mve.fp") {		} else if (Feature == "+mve.fp") {
HasLegalHalfType = true;		HasLegalHalfType = true;
FPU \|= FPARMV8;		FPU \|= FPARMV8;
MVE \|= MVE_INT \| MVE_FP;		MVE \|= MVE_INT \| MVE_FP;
HW_FP \|= HW_FP_SP \| HW_FP_HP;		HW_FP \|= HW_FP_SP \| HW_FP_HP;
		} else if (Feature == "+i8mm") {
		HasMatMul = 1;
} else if (Feature.size() == strlen("+cdecp0") && Feature >= "+cdecp0" &&		} else if (Feature.size() == strlen("+cdecp0") && Feature >= "+cdecp0" &&
Feature <= "+cdecp7") {		Feature <= "+cdecp7") {
unsigned Coproc = Feature.back() - '0';		unsigned Coproc = Feature.back() - '0';
ARMCDECoprocMask \|= (1U << Coproc);		ARMCDECoprocMask \|= (1U << Coproc);
}		}
}		}

switch (ArchVersion) {		switch (ArchVersion) {
▲ Show 20 Lines • Show All 313 Lines • ▼ Show 20 Lines	void ARMTargetInfo::getTargetDefines(const LangOptions &Opts,
// Armv8.2-A FP16 scalar intrinsics		// Armv8.2-A FP16 scalar intrinsics
if (HasLegalHalfType)		if (HasLegalHalfType)
Builder.defineMacro("__ARM_FEATURE_FP16_SCALAR_ARITHMETIC", "1");		Builder.defineMacro("__ARM_FEATURE_FP16_SCALAR_ARITHMETIC", "1");

// Armv8.2-A dot product intrinsics		// Armv8.2-A dot product intrinsics
if (DotProd)		if (DotProd)
Builder.defineMacro("__ARM_FEATURE_DOTPROD", "1");		Builder.defineMacro("__ARM_FEATURE_DOTPROD", "1");

		if (HasMatMul)
		Builder.defineMacro("__ARM_FEATURE_MATMUL_INT8", "1");

switch (ArchKind) {		switch (ArchKind) {
default:		default:
break;		break;
case llvm::ARM::ArchKind::ARMV8_1A:		case llvm::ARM::ArchKind::ARMV8_1A:
getTargetDefinesARMV81A(Opts, Builder);		getTargetDefinesARMV81A(Opts, Builder);
break;		break;
case llvm::ARM::ArchKind::ARMV8_2A:		case llvm::ARM::ArchKind::ARMV8_2A:
getTargetDefinesARMV82A(Opts, Builder);		getTargetDefinesARMV82A(Opts, Builder);
▲ Show 20 Lines • Show All 451 Lines • Show Last 20 Lines

clang/lib/CodeGen/CGBuiltin.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 4,775 Lines • ▼ Show 20 Lines	static const ARMVectorIntrinsicInfo ARMSIMDIntrinsicMap [] = {
NEONMAP2(vmax_v, arm_neon_vmaxu, arm_neon_vmaxs, Add1ArgType \| UnsignedAlts),		NEONMAP2(vmax_v, arm_neon_vmaxu, arm_neon_vmaxs, Add1ArgType \| UnsignedAlts),
NEONMAP1(vmaxnm_v, arm_neon_vmaxnm, Add1ArgType),		NEONMAP1(vmaxnm_v, arm_neon_vmaxnm, Add1ArgType),
NEONMAP1(vmaxnmq_v, arm_neon_vmaxnm, Add1ArgType),		NEONMAP1(vmaxnmq_v, arm_neon_vmaxnm, Add1ArgType),
NEONMAP2(vmaxq_v, arm_neon_vmaxu, arm_neon_vmaxs, Add1ArgType \| UnsignedAlts),		NEONMAP2(vmaxq_v, arm_neon_vmaxu, arm_neon_vmaxs, Add1ArgType \| UnsignedAlts),
NEONMAP2(vmin_v, arm_neon_vminu, arm_neon_vmins, Add1ArgType \| UnsignedAlts),		NEONMAP2(vmin_v, arm_neon_vminu, arm_neon_vmins, Add1ArgType \| UnsignedAlts),
NEONMAP1(vminnm_v, arm_neon_vminnm, Add1ArgType),		NEONMAP1(vminnm_v, arm_neon_vminnm, Add1ArgType),
NEONMAP1(vminnmq_v, arm_neon_vminnm, Add1ArgType),		NEONMAP1(vminnmq_v, arm_neon_vminnm, Add1ArgType),
NEONMAP2(vminq_v, arm_neon_vminu, arm_neon_vmins, Add1ArgType \| UnsignedAlts),		NEONMAP2(vminq_v, arm_neon_vminu, arm_neon_vmins, Add1ArgType \| UnsignedAlts),
		NEONMAP2(vmmlaq_v, arm_neon_ummla, arm_neon_smmla, 0),
NEONMAP0(vmovl_v),		NEONMAP0(vmovl_v),
NEONMAP0(vmovn_v),		NEONMAP0(vmovn_v),
NEONMAP1(vmul_v, arm_neon_vmulp, Add1ArgType),		NEONMAP1(vmul_v, arm_neon_vmulp, Add1ArgType),
NEONMAP0(vmull_v),		NEONMAP0(vmull_v),
NEONMAP1(vmulq_v, arm_neon_vmulp, Add1ArgType),		NEONMAP1(vmulq_v, arm_neon_vmulp, Add1ArgType),
NEONMAP2(vpadal_v, arm_neon_vpadalu, arm_neon_vpadals, UnsignedAlts),		NEONMAP2(vpadal_v, arm_neon_vpadalu, arm_neon_vpadals, UnsignedAlts),
NEONMAP2(vpadalq_v, arm_neon_vpadalu, arm_neon_vpadals, UnsignedAlts),		NEONMAP2(vpadalq_v, arm_neon_vpadalu, arm_neon_vpadals, UnsignedAlts),
NEONMAP1(vpadd_v, arm_neon_vpadd, Add1ArgType),		NEONMAP1(vpadd_v, arm_neon_vpadd, Add1ArgType),
▲ Show 20 Lines • Show All 91 Lines • ▼ Show 20 Lines	static const ARMVectorIntrinsicInfo ARMSIMDIntrinsicMap [] = {
NEONMAP1(vst4_v, arm_neon_vst4, 0),		NEONMAP1(vst4_v, arm_neon_vst4, 0),
NEONMAP1(vst4q_lane_v, arm_neon_vst4lane, 0),		NEONMAP1(vst4q_lane_v, arm_neon_vst4lane, 0),
NEONMAP1(vst4q_v, arm_neon_vst4, 0),		NEONMAP1(vst4q_v, arm_neon_vst4, 0),
NEONMAP0(vsubhn_v),		NEONMAP0(vsubhn_v),
NEONMAP0(vtrn_v),		NEONMAP0(vtrn_v),
NEONMAP0(vtrnq_v),		NEONMAP0(vtrnq_v),
NEONMAP0(vtst_v),		NEONMAP0(vtst_v),
NEONMAP0(vtstq_v),		NEONMAP0(vtstq_v),
		NEONMAP1(vusdot_v, arm_neon_usdot, 0),
		NEONMAP1(vusdotq_v, arm_neon_usdot, 0),
		NEONMAP1(vusmmlaq_v, arm_neon_usmmla, 0),
NEONMAP0(vuzp_v),		NEONMAP0(vuzp_v),
NEONMAP0(vuzpq_v),		NEONMAP0(vuzpq_v),
NEONMAP0(vzip_v),		NEONMAP0(vzip_v),
NEONMAP0(vzipq_v)		NEONMAP0(vzipq_v)
};		};

static const ARMVectorIntrinsicInfo AArch64SIMDIntrinsicMap[] = {		static const ARMVectorIntrinsicInfo AArch64SIMDIntrinsicMap[] = {
NEONMAP0(splat_lane_v),		NEONMAP0(splat_lane_v),
▲ Show 20 Lines • Show All 79 Lines • ▼ Show 20 Lines	static const ARMVectorIntrinsicInfo AArch64SIMDIntrinsicMap[] = {
NEONMAP2(vhsub_v, aarch64_neon_uhsub, aarch64_neon_shsub, Add1ArgType \| UnsignedAlts),		NEONMAP2(vhsub_v, aarch64_neon_uhsub, aarch64_neon_shsub, Add1ArgType \| UnsignedAlts),
NEONMAP2(vhsubq_v, aarch64_neon_uhsub, aarch64_neon_shsub, Add1ArgType \| UnsignedAlts),		NEONMAP2(vhsubq_v, aarch64_neon_uhsub, aarch64_neon_shsub, Add1ArgType \| UnsignedAlts),
NEONMAP1(vld1_x2_v, aarch64_neon_ld1x2, 0),		NEONMAP1(vld1_x2_v, aarch64_neon_ld1x2, 0),
NEONMAP1(vld1_x3_v, aarch64_neon_ld1x3, 0),		NEONMAP1(vld1_x3_v, aarch64_neon_ld1x3, 0),
NEONMAP1(vld1_x4_v, aarch64_neon_ld1x4, 0),		NEONMAP1(vld1_x4_v, aarch64_neon_ld1x4, 0),
NEONMAP1(vld1q_x2_v, aarch64_neon_ld1x2, 0),		NEONMAP1(vld1q_x2_v, aarch64_neon_ld1x2, 0),
NEONMAP1(vld1q_x3_v, aarch64_neon_ld1x3, 0),		NEONMAP1(vld1q_x3_v, aarch64_neon_ld1x3, 0),
NEONMAP1(vld1q_x4_v, aarch64_neon_ld1x4, 0),		NEONMAP1(vld1q_x4_v, aarch64_neon_ld1x4, 0),
		NEONMAP2(vmmlaq_v, aarch64_neon_ummla, aarch64_neon_smmla, 0),
NEONMAP0(vmovl_v),		NEONMAP0(vmovl_v),
NEONMAP0(vmovn_v),		NEONMAP0(vmovn_v),
NEONMAP1(vmul_v, aarch64_neon_pmul, Add1ArgType),		NEONMAP1(vmul_v, aarch64_neon_pmul, Add1ArgType),
NEONMAP1(vmulq_v, aarch64_neon_pmul, Add1ArgType),		NEONMAP1(vmulq_v, aarch64_neon_pmul, Add1ArgType),
NEONMAP1(vpadd_v, aarch64_neon_addp, Add1ArgType),		NEONMAP1(vpadd_v, aarch64_neon_addp, Add1ArgType),
NEONMAP2(vpaddl_v, aarch64_neon_uaddlp, aarch64_neon_saddlp, UnsignedAlts),		NEONMAP2(vpaddl_v, aarch64_neon_uaddlp, aarch64_neon_saddlp, UnsignedAlts),
NEONMAP2(vpaddlq_v, aarch64_neon_uaddlp, aarch64_neon_saddlp, UnsignedAlts),		NEONMAP2(vpaddlq_v, aarch64_neon_uaddlp, aarch64_neon_saddlp, UnsignedAlts),
NEONMAP1(vpaddq_v, aarch64_neon_addp, Add1ArgType),		NEONMAP1(vpaddq_v, aarch64_neon_addp, Add1ArgType),
▲ Show 20 Lines • Show All 66 Lines • ▼ Show 20 Lines	static const ARMVectorIntrinsicInfo AArch64SIMDIntrinsicMap[] = {
NEONMAP1(vst1_x3_v, aarch64_neon_st1x3, 0),		NEONMAP1(vst1_x3_v, aarch64_neon_st1x3, 0),
NEONMAP1(vst1_x4_v, aarch64_neon_st1x4, 0),		NEONMAP1(vst1_x4_v, aarch64_neon_st1x4, 0),
NEONMAP1(vst1q_x2_v, aarch64_neon_st1x2, 0),		NEONMAP1(vst1q_x2_v, aarch64_neon_st1x2, 0),
NEONMAP1(vst1q_x3_v, aarch64_neon_st1x3, 0),		NEONMAP1(vst1q_x3_v, aarch64_neon_st1x3, 0),
NEONMAP1(vst1q_x4_v, aarch64_neon_st1x4, 0),		NEONMAP1(vst1q_x4_v, aarch64_neon_st1x4, 0),
NEONMAP0(vsubhn_v),		NEONMAP0(vsubhn_v),
NEONMAP0(vtst_v),		NEONMAP0(vtst_v),
NEONMAP0(vtstq_v),		NEONMAP0(vtstq_v),
		NEONMAP1(vusdot_v, aarch64_neon_usdot, 0),
		NEONMAP1(vusdotq_v, aarch64_neon_usdot, 0),
		NEONMAP1(vusmmlaq_v, aarch64_neon_usmmla, 0),
};		};

static const ARMVectorIntrinsicInfo AArch64SISDIntrinsicMap[] = {		static const ARMVectorIntrinsicInfo AArch64SISDIntrinsicMap[] = {
NEONMAP1(vabdd_f64, aarch64_sisd_fabd, Add1ArgType),		NEONMAP1(vabdd_f64, aarch64_sisd_fabd, Add1ArgType),
NEONMAP1(vabds_f32, aarch64_sisd_fabd, Add1ArgType),		NEONMAP1(vabds_f32, aarch64_sisd_fabd, Add1ArgType),
NEONMAP1(vabsd_s64, aarch64_neon_abs, Add1ArgType),		NEONMAP1(vabsd_s64, aarch64_neon_abs, Add1ArgType),
NEONMAP1(vaddlv_s32, aarch64_neon_saddlv, AddRetType \| Add1ArgType),		NEONMAP1(vaddlv_s32, aarch64_neon_saddlv, AddRetType \| Add1ArgType),
NEONMAP1(vaddlv_u32, aarch64_neon_uaddlv, AddRetType \| Add1ArgType),		NEONMAP1(vaddlv_u32, aarch64_neon_uaddlv, AddRetType \| Add1ArgType),
▲ Show 20 Lines • Show All 966 Lines • ▼ Show 20 Lines	Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
}		}
case NEON::BI__builtin_neon_vfmlsl_high_v:		case NEON::BI__builtin_neon_vfmlsl_high_v:
case NEON::BI__builtin_neon_vfmlslq_high_v: {		case NEON::BI__builtin_neon_vfmlslq_high_v: {
llvm::Type *InputTy =		llvm::Type *InputTy =
llvm::VectorType::get(HalfTy, Ty->getPrimitiveSizeInBits() / 16);		llvm::VectorType::get(HalfTy, Ty->getPrimitiveSizeInBits() / 16);
llvm::Type *Tys[2] = { Ty, InputTy };		llvm::Type *Tys[2] = { Ty, InputTy };
return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vfmlsl_high");		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vfmlsl_high");
}		}
		case NEON::BI__builtin_neon_vmmlaq_v: {
		llvm::Type *InputTy =
		llvm::VectorType::get(Int8Ty, Ty->getPrimitiveSizeInBits() / 8);
		llvm::Type *Tys[2] = { Ty, InputTy };
		Int = Usgn ? LLVMIntrinsic : AltLLVMIntrinsic;
		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vmmla");
		}
		case NEON::BI__builtin_neon_vusmmlaq_v: {
		llvm::Type *InputTy =
		llvm::VectorType::get(Int8Ty, Ty->getPrimitiveSizeInBits() / 8);
		llvm::Type *Tys[2] = { Ty, InputTy };
		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vusmmla");
		}
		case NEON::BI__builtin_neon_vusdot_v:
		case NEON::BI__builtin_neon_vusdotq_v: {
		llvm::Type *InputTy =
		llvm::VectorType::get(Int8Ty, Ty->getPrimitiveSizeInBits() / 8);
		llvm::Type *Tys[2] = { Ty, InputTy };
		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vusdot");
		}
}		}

assert(Int && "Expected valid intrinsic number");		assert(Int && "Expected valid intrinsic number");

// Determine the type(s) of this overloaded AArch64 intrinsic.		// Determine the type(s) of this overloaded AArch64 intrinsic.
Function *F = LookupNeonLLVMIntrinsic(Int, Modifier, Ty, E);		Function *F = LookupNeonLLVMIntrinsic(Int, Modifier, Ty, E);

Value *Result = EmitNeonCall(F, Ops, NameHint);		Value *Result = EmitNeonCall(F, Ops, NameHint);
▲ Show 20 Lines • Show All 9,487 Lines • Show Last 20 Lines

clang/lib/Driver/ToolChains/Arch/AArch64.cpp

Show First 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	if (Args.getLastArg(options::OPT_arch) \|\| Triple.isOSDarwin())
return Triple.getArch() == llvm::Triple::aarch64_32 ? "apple-s4"		return Triple.getArch() == llvm::Triple::aarch64_32 ? "apple-s4"
: "apple-a7";		: "apple-a7";

return "generic";		return "generic";
}		}

// Decode AArch64 features from string like +[no]featureA+[no]featureB+...		// Decode AArch64 features from string like +[no]featureA+[no]featureB+...
static bool DecodeAArch64Features(const Driver &D, StringRef text,		static bool DecodeAArch64Features(const Driver &D, StringRef text,
std::vector<StringRef> &Features) {		std::vector<StringRef> &Features,
		llvm::AArch64::ArchKind ArchKind) {
SmallVector<StringRef, 8> Split;		SmallVector<StringRef, 8> Split;
text.split(Split, StringRef("+"), -1, false);		text.split(Split, StringRef("+"), -1, false);

for (StringRef Feature : Split) {		for (StringRef Feature : Split) {
StringRef FeatureName = llvm::AArch64::getArchExtFeature(Feature);		StringRef FeatureName = llvm::AArch64::getArchExtFeature(Feature);
if (!FeatureName.empty())		if (!FeatureName.empty())
Features.push_back(FeatureName);		Features.push_back(FeatureName);
else if (Feature == "neon" \|\| Feature == "noneon")		else if (Feature == "neon" \|\| Feature == "noneon")
D.Diag(clang::diag::err_drv_no_neon_modifier);		D.Diag(clang::diag::err_drv_no_neon_modifier);
else		else
return false;		return false;

		// +sve implies +f32mm if the base architecture is v8.6A
		fhahnUnsubmitted Not Done Reply Inline Actions Not sure if that is the right place to handle the implication. This function just decodes a given feature string. And shouldn't that already be handled in the backend by the implications in llvm/lib/Target/AArch64/AArch64.td? fhahn: Not sure if that is the right place to handle the implication. This function just decodes a…
		if ((ArchKind == llvm::AArch64::ArchKind::ARMV8_6A) && Feature == "sve")
		Features.push_back("+f32mm");
}		}
return true;		return true;
}		}

// Check if the CPU name and feature modifiers in -mcpu are legal. If yes,		// Check if the CPU name and feature modifiers in -mcpu are legal. If yes,
// decode CPU and feature.		// decode CPU and feature.
static bool DecodeAArch64Mcpu(const Driver &D, StringRef Mcpu, StringRef &CPU,		static bool DecodeAArch64Mcpu(const Driver &D, StringRef Mcpu, StringRef &CPU,
std::vector<StringRef> &Features) {		std::vector<StringRef> &Features) {
std::pair<StringRef, StringRef> Split = Mcpu.split("+");		std::pair<StringRef, StringRef> Split = Mcpu.split("+");
CPU = Split.first;		CPU = Split.first;
		llvm::AArch64::ArchKind ArchKind = llvm::AArch64::ArchKind::ARMV8A;

if (CPU == "native")		if (CPU == "native")
CPU = llvm::sys::getHostCPUName();		CPU = llvm::sys::getHostCPUName();

if (CPU == "generic") {		if (CPU == "generic") {
Features.push_back("+neon");		Features.push_back("+neon");
} else {		} else {
llvm::AArch64::ArchKind ArchKind = llvm::AArch64::parseCPUArch(CPU);		ArchKind = llvm::AArch64::parseCPUArch(CPU);
if (!llvm::AArch64::getArchFeatures(ArchKind, Features))		if (!llvm::AArch64::getArchFeatures(ArchKind, Features))
return false;		return false;

unsigned Extension = llvm::AArch64::getDefaultExtensions(CPU, ArchKind);		unsigned Extension = llvm::AArch64::getDefaultExtensions(CPU, ArchKind);
if (!llvm::AArch64::getExtensionFeatures(Extension, Features))		if (!llvm::AArch64::getExtensionFeatures(Extension, Features))
return false;		return false;
}		}

if (Split.second.size() && !DecodeAArch64Features(D, Split.second, Features))		if (Split.second.size() &&
		!DecodeAArch64Features(D, Split.second, Features, ArchKind))
return false;		return false;

return true;		return true;
}		}

static bool		static bool
getAArch64ArchFeaturesFromMarch(const Driver &D, StringRef March,		getAArch64ArchFeaturesFromMarch(const Driver &D, StringRef March,
const ArgList &Args,		const ArgList &Args,
std::vector<StringRef> &Features) {		std::vector<StringRef> &Features) {
std::string MarchLowerCase = March.lower();		std::string MarchLowerCase = March.lower();
std::pair<StringRef, StringRef> Split = StringRef(MarchLowerCase).split("+");		std::pair<StringRef, StringRef> Split = StringRef(MarchLowerCase).split("+");

llvm::AArch64::ArchKind ArchKind = llvm::AArch64::parseArch(Split.first);		llvm::AArch64::ArchKind ArchKind = llvm::AArch64::parseArch(Split.first);
if (ArchKind == llvm::AArch64::ArchKind::INVALID \|\|		if (ArchKind == llvm::AArch64::ArchKind::INVALID \|\|
!llvm::AArch64::getArchFeatures(ArchKind, Features) \|\|		!llvm::AArch64::getArchFeatures(ArchKind, Features) \|\|
(Split.second.size() && !DecodeAArch64Features(D, Split.second, Features)))		(Split.second.size() &&
		!DecodeAArch64Features(D, Split.second, Features, ArchKind)))
return false;		return false;

return true;		return true;
}		}

static bool		static bool
getAArch64ArchFeaturesFromMcpu(const Driver &D, StringRef Mcpu,		getAArch64ArchFeaturesFromMcpu(const Driver &D, StringRef Mcpu,
const ArgList &Args,		const ArgList &Args,
▲ Show 20 Lines • Show All 313 Lines • Show Last 20 Lines

clang/test/CodeGen/aarch64-matmul.cpp

This file was added.

				// RUN: %clang_cc1 -triple aarch64-eabi -target-feature +neon -target-feature +i8mm -S -emit-llvm %s -o - \| FileCheck %s

				#ifdef __ARM_FEATURE_MATMUL_INT8
				extern "C" void arm_feature_matmulint8_defined() {}
				#endif
				// CHECK: define void @arm_feature_matmulint8_defined()

clang/test/CodeGen/aarch64-v8.6a-neon-intrinsics.c

This file was added.

				// RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +neon -target-feature +fullfp16 -target-feature +v8.6a -target-feature +i8mm \
				// RUN: -fallow-half-arguments-and-returns -S -disable-O0-optnone -emit-llvm -o - %s \
				// RUN: \| opt -S -mem2reg \
				// RUN: \| FileCheck %s

				// REQUIRES: aarch64-registered-target

				#include <arm_neon.h>

				// CHECK-LABEL: test_vmmlaq_s32
				// CHECK: [[VAL:%.*]] = call <4 x i32> @llvm.aarch64.neon.smmla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b)
				// CHECK: ret <4 x i32> [[VAL]]
				int32x4_t test_vmmlaq_s32(int32x4_t r, int8x16_t a, int8x16_t b) {
				return vmmlaq_s32(r, a, b);
				}

				// CHECK-LABEL: test_vmmlaq_u32
				// CHECK: [[VAL:%.*]] = call <4 x i32> @llvm.aarch64.neon.ummla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b)
				// CHECK: ret <4 x i32> [[VAL]]
				uint32x4_t test_vmmlaq_u32(uint32x4_t r, uint8x16_t a, uint8x16_t b) {
				return vmmlaq_u32(r, a, b);
				}

				// CHECK-LABEL: test_vusmmlaq_s32
				// CHECK: [[VAL:%.*]] = call <4 x i32> @llvm.aarch64.neon.usmmla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b)
				// CHECK: ret <4 x i32> [[VAL]]
				int32x4_t test_vusmmlaq_s32(int32x4_t r, uint8x16_t a, int8x16_t b) {
				return vusmmlaq_s32(r, a, b);
				}

				// CHECK-LABEL: test_vusdot_s32
				// CHECK: [[VAL:%.*]] = call <2 x i32> @llvm.aarch64.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> %b)
				// CHECK: ret <2 x i32> [[VAL]]
				int32x2_t test_vusdot_s32(int32x2_t r, uint8x8_t a, int8x8_t b) {
				return vusdot_s32(r, a, b);
				}

				// CHECK-LABEL: test_vusdot_lane_s32
				// CHECK: [[REINT:%.*]] = alloca <8 x i8>
				// CHECK: store <8 x i8> %b, <8 x i8>* [[REINT]]
				// CHECK: [[TMP0:%.]] = bitcast <8 x i8> [[REINT]] to <2 x i32>*
				// CHECK: [[TMP1:%.]] = load <2 x i32>, <2 x i32> [[TMP0]]
				// CHECK: [[TMP2:%.*]] = bitcast <2 x i32> [[TMP1]] to <8 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <2 x i32>
				// CHECK: [[LANE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP3]], <2 x i32> zeroinitializer
				// CHECK: [[TMP4:%.*]] = bitcast <2 x i32> [[LANE]] to <8 x i8>
				// CHECK: [[TMP5:%.*]] = bitcast <2 x i32> %r to <8 x i8>
				// CHECK: [[OP:%.*]] = call <2 x i32> @llvm.aarch64.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> [[TMP4]])
				// CHECK: ret <2 x i32> [[OP]]
				int32x2_t test_vusdot_lane_s32(int32x2_t r, uint8x8_t a, int8x8_t b) {
				return vusdot_lane_s32(r, a, b, 0);
				}

				// CHECK-LABEL: test_vsudot_lane_s32
				// CHECK: [[REINT:%.*]] = alloca <8 x i8>
				// CHECK: store <8 x i8> %b, <8 x i8>* [[REINT]]
				// CHECK: [[TMP0:%.]] = bitcast <8 x i8> [[REINT]] to <2 x i32>*
				// CHECK: [[TMP1:%.]] = load <2 x i32>, <2 x i32> [[TMP0]]
				// CHECK: [[TMP2:%.*]] = bitcast <2 x i32> %1 to <8 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> %2 to <2 x i32>
				// CHECK: [[LANE:%.*]] = shufflevector <2 x i32> %3, <2 x i32> %3, <2 x i32> zeroinitializer
				// CHECK: [[TMP4:%.*]] = bitcast <2 x i32> [[LANE]] to <8 x i8>
				// CHECK: [[TMP5:%.*]] = bitcast <2 x i32> %r to <8 x i8>
				// CHECK: [[OP:%.*]] = call <2 x i32> @llvm.aarch64.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> [[TMP4]], <8 x i8> %a)
				// CHECK: ret <2 x i32> [[OP]]
				int32x2_t test_vsudot_lane_s32(int32x2_t r, int8x8_t a, uint8x8_t b) {
				return vsudot_lane_s32(r, a, b, 0);
				}

				// CHECK-LABEL: test_vusdot_laneq_s32
				// CHECK: [[REINT:%.*]] = alloca <16 x i8>
				// CHECK: store <16 x i8> %b, <16 x i8>* [[REINT]]
				// CHECK: [[TMP0:%.]] = bitcast <16 x i8> [[REINT]] to <4 x i32>*
				// CHECK: [[TMP1:%.]] = load <4 x i32>, <4 x i32> [[TMP0]]
				// CHECK: [[TMP2:%.*]] = bitcast <4 x i32> [[TMP1]] to <16 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <4 x i32>
				// CHECK: [[LANE:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP3]], <2 x i32> zeroinitializer
				// CHECK: [[TMP4:%.*]] = bitcast <2 x i32> [[LANE]] to <8 x i8>
				// CHECK: [[TMP5:%.*]] = bitcast <2 x i32> %r to <8 x i8>
				// CHECK: [[OP:%.*]] = call <2 x i32> @llvm.aarch64.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> [[TMP4]])
				// CHECK: ret <2 x i32> [[OP]]
				int32x2_t test_vusdot_laneq_s32(int32x2_t r, uint8x8_t a, int8x16_t b) {
				return vusdot_laneq_s32(r, a, b, 0);
				}

				// CHECK-LABEL: test_vsudot_laneq_s32
				// CHECK: [[REINT:%.*]] = alloca <16 x i8>
				// CHECK: store <16 x i8> %b, <16 x i8>* [[REINT]]
				// CHECK: [[TMP0:%.]] = bitcast <16 x i8> [[REINT]] to <4 x i32>*
				// CHECK: [[TMP1:%.]] = load <4 x i32>, <4 x i32> [[TMP0]]
				// CHECK: [[TMP2:%.*]] = bitcast <4 x i32> [[TMP1]] to <16 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <4 x i32>
				// CHECK: [[LANE:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP3]], <2 x i32> zeroinitializer
				// CHECK: [[TMP4:%.*]] = bitcast <2 x i32> [[LANE]] to <8 x i8>
				// CHECK: [[TMP5:%.*]] = bitcast <2 x i32> %r to <8 x i8>
				// CHECK: [[OP:%.*]] = call <2 x i32> @llvm.aarch64.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> [[TMP4]], <8 x i8> %a)
				// CHECK: ret <2 x i32> [[OP]]
				int32x2_t test_vsudot_laneq_s32(int32x2_t r, int8x8_t a, uint8x16_t b) {
				return vsudot_laneq_s32(r, a, b, 0);
				}

				// CHECK-LABEL: test_vusdotq_s32
				// CHECK: [[VAL:%.*]] = call <4 x i32> @llvm.aarch64.neon.usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b)
				// CHECK: ret <4 x i32> [[VAL]]
				int32x4_t test_vusdotq_s32(int32x4_t r, uint8x16_t a, int8x16_t b) {
				return vusdotq_s32(r, a, b);
				}

				// CHECK-LABEL: test_vusdotq_lane_s32
				// CHECK: [[REINT:%.*]] = alloca <8 x i8>
				// CHECK: store <8 x i8> %b, <8 x i8>* [[REINT]]
				// CHECK: [[TMP0:%.]] = bitcast <8 x i8> [[REINT]] to <2 x i32>*
				// CHECK: [[TMP1:%.]] = load <2 x i32>, <2 x i32> [[TMP0]]
				// CHECK: [[TMP2:%.*]] = bitcast <2 x i32> [[TMP1]] to <8 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <2 x i32>
				// CHECK: [[LANE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP3]], <4 x i32> zeroinitializer
				// CHECK: [[TMP4:%.*]] = bitcast <4 x i32> [[LANE]] to <16 x i8>
				// CHECK: [[TMP5:%.*]] = bitcast <4 x i32> %r to <16 x i8>
				// CHECK: [[OP:%.*]] = call <4 x i32> @llvm.aarch64.neon.usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> [[TMP4]])
				// CHECK: ret <4 x i32> [[OP]]
				int32x4_t test_vusdotq_lane_s32(int32x4_t r, uint8x16_t a, int8x8_t b) {
				return vusdotq_lane_s32(r, a, b, 0);
				}

				// CHECK-LABEL: test_vsudotq_lane_s32
				// CHECK: [[REINT:%.*]] = alloca <8 x i8>
				// CHECK: store <8 x i8> %b, <8 x i8>* [[REINT]]
				// CHECK: [[TMP0:%.]] = bitcast <8 x i8> [[REINT]] to <2 x i32>*
				// CHECK: [[TMP1:%.]] = load <2 x i32>, <2 x i32> [[TMP0]]
				// CHECK: [[TMP2:%.*]] = bitcast <2 x i32> [[TMP1]] to <8 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <2 x i32>
				// CHECK: [[LANE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP3]], <4 x i32> zeroinitializer
				// CHECK: [[TMP4:%.*]] = bitcast <4 x i32> [[LANE]] to <16 x i8>
				// CHECK: [[TMP5:%.*]] = bitcast <4 x i32> %r to <16 x i8>
				// CHECK: [[OP:%.*]] = call <4 x i32> @llvm.aarch64.neon.usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> [[TMP4]], <16 x i8> %a)
				// CHECK: ret <4 x i32> [[OP]]
				int32x4_t test_vsudotq_lane_s32(int32x4_t r, int8x16_t a, uint8x8_t b) {
				return vsudotq_lane_s32(r, a, b, 0);
				}

				// CHECK-LABEL: test_vusdotq_laneq_s32
				// CHECK: [[REINT:%.*]] = alloca <16 x i8>
				// CHECK: store <16 x i8> %b, <16 x i8>* [[REINT]]
				// CHECK: [[TMP0:%.]] = bitcast <16 x i8> [[REINT]] to <4 x i32>*
				// CHECK: [[TMP1:%.]] = load <4 x i32>, <4 x i32> [[TMP0]]
				// CHECK: [[TMP2:%.*]] = bitcast <4 x i32> [[TMP1]] to <16 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <4 x i32>
				// CHECK: [[LANE:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP3]], <4 x i32> zeroinitializer
				// CHECK: [[TMP4:%.*]] = bitcast <4 x i32> [[LANE]] to <16 x i8>
				// CHECK: [[TMP5:%.*]] = bitcast <4 x i32> %r to <16 x i8>
				// CHECK: [[OP:%.*]] = call <4 x i32> @llvm.aarch64.neon.usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> [[TMP4]])
				// CHECK: ret <4 x i32> [[OP]]
				int32x4_t test_vusdotq_laneq_s32(int32x4_t r, uint8x16_t a, int8x16_t b) {
				return vusdotq_laneq_s32(r, a, b, 0);
				}

				// CHECK-LABEL: test_vsudotq_laneq_s32
				// CHECK: [[REINT:%.*]] = alloca <16 x i8>
				// CHECK: store <16 x i8> %b, <16 x i8>* [[REINT]]
				// CHECK: [[TMP0:%.]] = bitcast <16 x i8> [[REINT]] to <4 x i32>*
				// CHECK: [[TMP1:%.]] = load <4 x i32>, <4 x i32> [[TMP0]]
				// CHECK: [[TMP2:%.*]] = bitcast <4 x i32> [[TMP1]] to <16 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP2]] to <4 x i32>
				// CHECK: [[LANE:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP3]], <4 x i32> zeroinitializer
				// CHECK: [[TMP4:%.*]] = bitcast <4 x i32> [[LANE]] to <16 x i8>
				// CHECK: [[TMP5:%.*]] = bitcast <4 x i32> %r to <16 x i8>
				// CHECK: [[OP:%.*]] = call <4 x i32> @llvm.aarch64.neon.usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> [[TMP4]], <16 x i8> %a)
				// CHECK: ret <4 x i32> [[OP]]
				int32x4_t test_vsudotq_laneq_s32(int32x4_t r, int8x16_t a, uint8x16_t b) {
				return vsudotq_laneq_s32(r, a, b, 0);
				}

clang/test/CodeGen/arm-v8.6a-neon-intrinsics.c

This file was added.

				// RUN: %clang_cc1 -triple armv8.6a-arm-none-eabi -target-feature +neon -target-feature +fullfp16 -target-feature +i8mm \
				// RUN: -fallow-half-arguments-and-returns -S -disable-O0-optnone -emit-llvm -o - %s \
				// RUN: \| opt -S -mem2reg \
				// RUN: \| FileCheck %s

				// REQUIRES: arm-registered-target

				#include <arm_neon.h>

				// CHECK-LABEL: test_vmmlaq_s32
				// CHECK: [[VAL:%.*]] = call <4 x i32> @llvm.arm.neon.smmla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b)
				// CHECK: ret <4 x i32> [[VAL]]
				int32x4_t test_vmmlaq_s32(int32x4_t r, int8x16_t a, int8x16_t b) {
				return vmmlaq_s32(r, a, b);
				}

				// CHECK-LABEL: test_vmmlaq_u32
				// CHECK: [[VAL:%.*]] = call <4 x i32> @llvm.arm.neon.ummla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b)
				// CHECK: ret <4 x i32> [[VAL]]
				uint32x4_t test_vmmlaq_u32(uint32x4_t r, uint8x16_t a, uint8x16_t b) {
				return vmmlaq_u32(r, a, b);
				}

				// CHECK-LABEL: test_vusmmlaq_s32
				// CHECK: [[VAL:%.*]] = call <4 x i32> @llvm.arm.neon.usmmla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b)
				// CHECK: ret <4 x i32> [[VAL]]
				int32x4_t test_vusmmlaq_s32(int32x4_t r, uint8x16_t a, int8x16_t b) {
				return vusmmlaq_s32(r, a, b);
				}

				// CHECK-LABEL: test_vusdot_s32
				// CHECK: [[VAL:%.*]] = call <2 x i32> @llvm.arm.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> %b)
				// CHECK: ret <2 x i32> [[VAL]]
				int32x2_t test_vusdot_s32(int32x2_t r, uint8x8_t a, int8x8_t b) {
				return vusdot_s32(r, a, b);
				}

				// CHECK-LABEL: test_vusdot_lane_s32
				// CHECK: [[REINT:%.*]] = alloca <8 x i8>
				// CHECK: store <8 x i8> %b, <8 x i8>* [[REINT]]
				// CHECK: [[TMP0:%.]] = bitcast <8 x i8> [[REINT]] to <2 x i32>*
				// CHECK: [[TMP1:%.]] = load <2 x i32>, <2 x i32> [[TMP0]]
				// CHECK: [[TMP2:%.*]] = bitcast <2 x i32> [[TMP1]] to <8 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <2 x i32>
				// CHECK: [[LANE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP3]], <2 x i32> zeroinitializer
				// CHECK: [[TMP4:%.*]] = bitcast <2 x i32> [[LANE]] to <8 x i8>
				// CHECK: [[TMP5:%.*]] = bitcast <2 x i32> %r to <8 x i8>
				// CHECK: [[OP:%.*]] = call <2 x i32> @llvm.arm.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> [[TMP4]])
				// CHECK: ret <2 x i32> [[OP]]
				int32x2_t test_vusdot_lane_s32(int32x2_t r, uint8x8_t a, int8x8_t b) {
				return vusdot_lane_s32(r, a, b, 0);
				}

				// CHECK-LABEL: test_vsudot_lane_s32
				// CHECK: [[REINT:%.*]] = alloca <8 x i8>
				// CHECK: store <8 x i8> %b, <8 x i8>* [[REINT]]
				// CHECK: [[TMP0:%.]] = bitcast <8 x i8> [[REINT]] to <2 x i32>*
				// CHECK: [[TMP1:%.]] = load <2 x i32>, <2 x i32> [[TMP0]]
				// CHECK: [[TMP2:%.*]] = bitcast <2 x i32> [[TMP1]] to <8 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <2 x i32>
				// CHECK: [[LANE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP3]], <2 x i32> zeroinitializer
				// CHECK: [[TMP4:%.*]] = bitcast <2 x i32> [[LANE]] to <8 x i8>
				// CHECK: [[TMP5:%.*]] = bitcast <2 x i32> %r to <8 x i8>
				// CHECK: [[OP:%.*]] = call <2 x i32> @llvm.arm.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> [[TMP4]], <8 x i8> %a)
				// CHECK: ret <2 x i32> [[OP]]
				int32x2_t test_vsudot_lane_s32(int32x2_t r, int8x8_t a, uint8x8_t b) {
				return vsudot_lane_s32(r, a, b, 0);
				}

				// CHECK-LABEL: test_vusdotq_lane_s32
				// CHECK: [[REINT:%.*]] = alloca <8 x i8>
				// CHECK: store <8 x i8> %b, <8 x i8>* [[REINT]]
				// CHECK: [[TMP0:%.]] = bitcast <8 x i8> [[REINT]] to <2 x i32>*
				// CHECK: [[TMP1:%.]] = load <2 x i32>, <2 x i32> [[TMP0]]
				// CHECK: [[TMP2:%.*]] = bitcast <2 x i32> [[TMP1]] to <8 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <2 x i32>
				// CHECK: [[LANE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP3]], <4 x i32> zeroinitializer
				// CHECK: [[TMP4:%.*]] = bitcast <4 x i32> [[LANE]] to <16 x i8>
				// CHECK: [[TMP5:%.*]] = bitcast <4 x i32> %r to <16 x i8>
				// CHECK: [[OP:%.*]] = call <4 x i32> @llvm.arm.neon.usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> [[TMP4]])
				// CHECK: ret <4 x i32> [[OP]]
				int32x4_t test_vusdotq_lane_s32(int32x4_t r, uint8x16_t a, int8x8_t b) {
				return vusdotq_lane_s32(r, a, b, 0);
				}

				// CHECK-LABEL: test_vsudotq_lane_s32
				// CHECK: [[REINT:%.*]] = alloca <8 x i8>
				// CHECK: store <8 x i8> %b, <8 x i8>* [[REINT]]
				// CHECK: [[TMP0:%.]] = bitcast <8 x i8> [[REINT]] to <2 x i32>*
				// CHECK: [[TMP1:%.]] = load <2 x i32>, <2 x i32> [[TMP0]]
				// CHECK: [[TMP2:%.*]] = bitcast <2 x i32> %1 to <8 x i8>
				// CHECK: [[TMP3:%.*]] = bitcast <8 x i8> %2 to <2 x i32>
				// CHECK: [[LANE:%.*]] = shufflevector <2 x i32> %3, <2 x i32> %3, <4 x i32> zeroinitializer
				// CHECK: [[TMP4:%.*]] = bitcast <4 x i32> [[LANE]] to <16 x i8>
				// CHECK: [[TMP5:%.*]] = bitcast <4 x i32> %r to <16 x i8>
				// CHECK: [[OP:%.*]] = call <4 x i32> @llvm.arm.neon.usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> [[TMP4]], <16 x i8> %a)
				// CHECK: ret <4 x i32> [[OP]]
				int32x4_t test_vsudotq_lane_s32(int32x4_t r, int8x16_t a, uint8x8_t b) {
				return vsudotq_lane_s32(r, a, b, 0);
				}

clang/test/Driver/aarch64-cpus.c

	Show First 20 Lines • Show All 630 Lines • ▼ Show 20 Lines
	// optional feature for any implementation of Armv8.2-A to Armv8.5-A (inclusive)			// optional feature for any implementation of Armv8.2-A to Armv8.5-A (inclusive)
	// RUN: %clang -target aarch64 -march=armv8.5a+bf16 -### -c %s 2>&1 \| FileCheck -check-prefix=GENERICV85A-BF16 %s			// RUN: %clang -target aarch64 -march=armv8.5a+bf16 -### -c %s 2>&1 \| FileCheck -check-prefix=GENERICV85A-BF16 %s
	// GENERICV85A-BF16: "-target-feature" "+bf16"			// GENERICV85A-BF16: "-target-feature" "+bf16"
	// RUN: %clang -target aarch64 -march=armv8.5a+bf16+nobf16 -### -c %s 2>&1 \| FileCheck -check-prefix=GENERICV85A-BF16-NO-BF16 %s			// RUN: %clang -target aarch64 -march=armv8.5a+bf16+nobf16 -### -c %s 2>&1 \| FileCheck -check-prefix=GENERICV85A-BF16-NO-BF16 %s
	// GENERICV85A-BF16-NO-BF16: "-target-feature" "-bf16"			// GENERICV85A-BF16-NO-BF16: "-target-feature" "-bf16"
	// RUN: %clang -target aarch64 -march=armv8.5a+bf16+sve -### -c %s 2>&1 \| FileCheck -check-prefixes=GENERICV85A-BF16-SVE %s			// RUN: %clang -target aarch64 -march=armv8.5a+bf16+sve -### -c %s 2>&1 \| FileCheck -check-prefixes=GENERICV85A-BF16-SVE %s
	// GENERICV85A-BF16-SVE: "-target-feature" "+bf16" "-target-feature" "+sve"			// GENERICV85A-BF16-SVE: "-target-feature" "+bf16" "-target-feature" "+sve"

				// The 8-bit integer matrix multiply extension is a mandatory component of the
				// Armv8.6-A extensions, but is permitted as an optional feature for any
				// implementation of Armv8.2-A to Armv8.5-A (inclusive)
				// RUN: %clang -target aarch64 -march=armv8.5a -### -c %s 2>&1 \| FileCheck -check-prefix=NO-I8MM %s
				// RUN: %clang -target aarch64 -march=armv8.5a+i8mm -### -c %s 2>&1 \| FileCheck -check-prefix=I8MM %s
				// NO-I8MM-NOT: "-target-feature" "+i8mm"
				// I8MM: "-target-feature" "+i8mm"

				// The 32-bit floating point matrix multiply extension is enabled by default
				// for armv8.6-a targets (or later) with SVE, and can optionally be enabled for
				// any target from armv8.2a onwards (we don't enforce not using it with earlier
				// targets).
				// RUN: %clang -target aarch64 -march=armv8.6a -### -c %s 2>&1 \| FileCheck -check-prefix=NO-F32MM %s
				// RUN: %clang -target aarch64 -march=armv8.6a+sve -### -c %s 2>&1 \| FileCheck -check-prefix=F32MM %s
				// RUN: %clang -target aarch64 -march=armv8.5a+f32mm -### -c %s 2>&1 \| FileCheck -check-prefix=F32MM %s
				// NO-F32MM-NOT: "-target-feature" "+f32mm"
				// F32MM: "-target-feature" "+f32mm"

				// The 64-bit floating point matrix multiply extension is not currently enabled
				// by default for any targets, because it requires an SVE vector length >= 256
				// bits. When we add a CPU which has that, then it can be enabled by default,
				// but for now it can only be used by adding the +f64mm feature.
				// RUN: %clang -target aarch64 -march=armv8.6a -### -c %s 2>&1 \| FileCheck -check-prefix=NO-F64MM %s
				// RUN: %clang -target aarch64 -march=armv8.6a+sve -### -c %s 2>&1 \| FileCheck -check-prefix=NO-F64MM %s
				// RUN: %clang -target aarch64 -march=armv8.6a+f64mm -### -c %s 2>&1 \| FileCheck -check-prefix=F64MM %s
				// NO-F64MM-NOT: "-target-feature" "+f64mm"
				// F64MM: "-target-feature" "+f64mm"

	// fullfp16 is off by default for v8a, feature must not be mentioned			// fullfp16 is off by default for v8a, feature must not be mentioned
	// RUN: %clang -target aarch64 -march=armv8a -### -c %s 2>&1 \| FileCheck -check-prefix=V82ANOFP16 -check-prefix=GENERIC %s			// RUN: %clang -target aarch64 -march=armv8a -### -c %s 2>&1 \| FileCheck -check-prefix=V82ANOFP16 -check-prefix=GENERIC %s
	// RUN: %clang -target aarch64 -march=armv8-a -### -c %s 2>&1 \| FileCheck -check-prefix=V82ANOFP16 -check-prefix=GENERIC %s			// RUN: %clang -target aarch64 -march=armv8-a -### -c %s 2>&1 \| FileCheck -check-prefix=V82ANOFP16 -check-prefix=GENERIC %s
	// V82ANOFP16-NOT: "-target-feature" "{{[+-]}}fp16fml"			// V82ANOFP16-NOT: "-target-feature" "{{[+-]}}fp16fml"
	// V82ANOFP16-NOT: "-target-feature" "{{[+-]}}fullfp16"			// V82ANOFP16-NOT: "-target-feature" "{{[+-]}}fullfp16"

	// RAS is on by default for v8.2a, but can be disabled by +noras			// RAS is on by default for v8.2a, but can be disabled by +noras
	// RUN: %clang -target aarch64 -march=armv8.2a -### -c %s 2>&1 \| FileCheck -check-prefix=V82ARAS -check-prefix=GENERICV82A %s			// RUN: %clang -target aarch64 -march=armv8.2a -### -c %s 2>&1 \| FileCheck -check-prefix=V82ARAS -check-prefix=GENERICV82A %s
	▲ Show 20 Lines • Show All 41 Lines • Show Last 20 Lines

clang/test/Driver/arm-matrix-multiply.c

This file was added.

				// RUN: %clang -### -target arm-none-none-eabi -march=armv8.5a+i8mm %s 2>&1 \| FileCheck %s
				// RUN: %clang -### -target aarch64-none-none-eabi -march=armv8.5a+i8mm %s 2>&1 \| FileCheck %s
				// CHECK: "-target-feature" "+i8mm"
				// CHECK-NOT: "-target-feature" "-i8mm"

				// RUN: %clang -### -target arm-none-none-eabi -march=armv8.6a+noi8mm %s 2>&1 \| FileCheck %s --check-prefix=NOI8MM
				// RUN: %clang -### -target aarch64-none-none-eabi -march=armv8.6a+noi8mm %s 2>&1 \| FileCheck %s --check-prefix=NOI8MM
				// NOI8MM: "-target-feature" "-i8mm"
				// NOI8MM-NOT: "-target-feature" "+i8mm"

				// RUN: %clang -### -target arm-none-none-eabi %s 2>&1 \| FileCheck %s --check-prefix=ABSENT
				// RUN: %clang -### -target aarch64-none-none-eabi %s 2>&1 \| FileCheck %s --check-prefix=ABSENT
				// ABSENT-NOT: "-target-feature" "+i8mm"
				// ABSENT-NOT: "-target-feature" "-i8mm"

llvm/include/llvm/IR/IntrinsicsAArch64.td

Show First 20 Lines • Show All 167 Lines • ▼ Show 20 Lines	class AdvSIMD_Dot_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>, llvm_anyvector_ty, LLVMMatchType<1>],		[LLVMMatchType<0>, llvm_anyvector_ty, LLVMMatchType<1>],
[IntrNoMem]>;		[IntrNoMem]>;

class AdvSIMD_FP16FML_Intrinsic		class AdvSIMD_FP16FML_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>, llvm_anyvector_ty, LLVMMatchType<1>],		[LLVMMatchType<0>, llvm_anyvector_ty, LLVMMatchType<1>],
[IntrNoMem]>;		[IntrNoMem]>;

		class AdvSIMD_MatMul_Intrinsic
		: Intrinsic<[llvm_anyvector_ty],
		[LLVMMatchType<0>, llvm_anyvector_ty, LLVMMatchType<1>],
		[IntrNoMem]>;
}		}

// Arithmetic ops		// Arithmetic ops

let TargetPrefix = "aarch64", IntrProperties = [IntrNoMem] in {		let TargetPrefix = "aarch64", IntrProperties = [IntrNoMem] in {
// Vector Add Across Lanes		// Vector Add Across Lanes
def int_aarch64_neon_saddv : AdvSIMD_1VectorArg_Int_Across_Intrinsic;		def int_aarch64_neon_saddv : AdvSIMD_1VectorArg_Int_Across_Intrinsic;
def int_aarch64_neon_uaddv : AdvSIMD_1VectorArg_Int_Across_Intrinsic;		def int_aarch64_neon_uaddv : AdvSIMD_1VectorArg_Int_Across_Intrinsic;
▲ Show 20 Lines • Show All 260 Lines • ▼ Show 20 Lines	let TargetPrefix = "aarch64", IntrProperties = [IntrNoMem] in {
// Scalar FP Inexact Narrowing		// Scalar FP Inexact Narrowing
def int_aarch64_sisd_fcvtxn : Intrinsic<[llvm_float_ty], [llvm_double_ty],		def int_aarch64_sisd_fcvtxn : Intrinsic<[llvm_float_ty], [llvm_double_ty],
[IntrNoMem]>;		[IntrNoMem]>;

// v8.2-A Dot Product		// v8.2-A Dot Product
def int_aarch64_neon_udot : AdvSIMD_Dot_Intrinsic;		def int_aarch64_neon_udot : AdvSIMD_Dot_Intrinsic;
def int_aarch64_neon_sdot : AdvSIMD_Dot_Intrinsic;		def int_aarch64_neon_sdot : AdvSIMD_Dot_Intrinsic;

		// v8.6-A Matrix Multiply Intrinsics
		def int_aarch64_neon_ummla : AdvSIMD_MatMul_Intrinsic;
		def int_aarch64_neon_smmla : AdvSIMD_MatMul_Intrinsic;
		def int_aarch64_neon_usmmla : AdvSIMD_MatMul_Intrinsic;
		def int_aarch64_neon_usdot : AdvSIMD_Dot_Intrinsic;

// v8.2-A FP16 Fused Multiply-Add Long		// v8.2-A FP16 Fused Multiply-Add Long
def int_aarch64_neon_fmlal : AdvSIMD_FP16FML_Intrinsic;		def int_aarch64_neon_fmlal : AdvSIMD_FP16FML_Intrinsic;
def int_aarch64_neon_fmlsl : AdvSIMD_FP16FML_Intrinsic;		def int_aarch64_neon_fmlsl : AdvSIMD_FP16FML_Intrinsic;
def int_aarch64_neon_fmlal2 : AdvSIMD_FP16FML_Intrinsic;		def int_aarch64_neon_fmlal2 : AdvSIMD_FP16FML_Intrinsic;
def int_aarch64_neon_fmlsl2 : AdvSIMD_FP16FML_Intrinsic;		def int_aarch64_neon_fmlsl2 : AdvSIMD_FP16FML_Intrinsic;

// v8.3-A Floating-point complex add		// v8.3-A Floating-point complex add
def int_aarch64_neon_vcadd_rot90 : AdvSIMD_2VectorArg_Intrinsic;		def int_aarch64_neon_vcadd_rot90 : AdvSIMD_2VectorArg_Intrinsic;
▲ Show 20 Lines • Show All 1,812 Lines • Show Last 20 Lines

llvm/include/llvm/IR/IntrinsicsARM.td

	Show First 20 Lines • Show All 767 Lines • ▼ Show 20 Lines
	class Neon_Dot_Intrinsic			class Neon_Dot_Intrinsic
	: Intrinsic<[llvm_anyvector_ty],			: Intrinsic<[llvm_anyvector_ty],
	[LLVMMatchType<0>, llvm_anyvector_ty,			[LLVMMatchType<0>, llvm_anyvector_ty,
	LLVMMatchType<1>],			LLVMMatchType<1>],
	[IntrNoMem]>;			[IntrNoMem]>;
	def int_arm_neon_udot : Neon_Dot_Intrinsic;			def int_arm_neon_udot : Neon_Dot_Intrinsic;
	def int_arm_neon_sdot : Neon_Dot_Intrinsic;			def int_arm_neon_sdot : Neon_Dot_Intrinsic;

				// v8.6-A Matrix Multiply Intrinsics
				class Neon_MatMul_Intrinsic
				: Intrinsic<[llvm_anyvector_ty],
				[LLVMMatchType<0>, llvm_anyvector_ty,
				LLVMMatchType<1>],
				[IntrNoMem]>;
				def int_arm_neon_ummla : Neon_MatMul_Intrinsic;
				def int_arm_neon_smmla : Neon_MatMul_Intrinsic;
				def int_arm_neon_usmmla : Neon_MatMul_Intrinsic;
				def int_arm_neon_usdot : Neon_Dot_Intrinsic;

				// v8.6-A Bfloat Intrinsics

	def int_arm_cls: Intrinsic<[llvm_i32_ty], [llvm_i32_ty], [IntrNoMem]>;			def int_arm_cls: Intrinsic<[llvm_i32_ty], [llvm_i32_ty], [IntrNoMem]>;
	def int_arm_cls64: Intrinsic<[llvm_i32_ty], [llvm_i64_ty], [IntrNoMem]>;			def int_arm_cls64: Intrinsic<[llvm_i32_ty], [llvm_i64_ty], [IntrNoMem]>;

	def int_arm_mve_vctp8 : Intrinsic<[llvm_v16i1_ty], [llvm_i32_ty], [IntrNoMem]>;			def int_arm_mve_vctp8 : Intrinsic<[llvm_v16i1_ty], [llvm_i32_ty], [IntrNoMem]>;
	def int_arm_mve_vctp16 : Intrinsic<[llvm_v8i1_ty], [llvm_i32_ty], [IntrNoMem]>;			def int_arm_mve_vctp16 : Intrinsic<[llvm_v8i1_ty], [llvm_i32_ty], [IntrNoMem]>;
	def int_arm_mve_vctp32 : Intrinsic<[llvm_v4i1_ty], [llvm_i32_ty], [IntrNoMem]>;			def int_arm_mve_vctp32 : Intrinsic<[llvm_v4i1_ty], [llvm_i32_ty], [IntrNoMem]>;
	// vctp64 takes v4i1, to work around v2i1 not being a legal MVE type			// vctp64 takes v4i1, to work around v2i1 not being a legal MVE type
	def int_arm_mve_vctp64 : Intrinsic<[llvm_v4i1_ty], [llvm_i32_ty], [IntrNoMem]>;			def int_arm_mve_vctp64 : Intrinsic<[llvm_v4i1_ty], [llvm_i32_ty], [IntrNoMem]>;
	▲ Show 20 Lines • Show All 569 Lines • Show Last 20 Lines

llvm/include/llvm/Support/AArch64TargetParser.h

Show All 18 Lines
#include "llvm/Support/ARMTargetParser.h"		#include "llvm/Support/ARMTargetParser.h"
#include <vector>		#include <vector>

// FIXME:This should be made into class design,to avoid dupplication.		// FIXME:This should be made into class design,to avoid dupplication.
namespace llvm {		namespace llvm {
namespace AArch64 {		namespace AArch64 {

// Arch extension modifiers for CPUs.		// Arch extension modifiers for CPUs.
enum ArchExtKind : unsigned {		enum ArchExtKind : uint64_t {
AEK_INVALID = 0,		AEK_INVALID = 0,
AEK_NONE = 1,		AEK_NONE = 1,
AEK_CRC = 1 << 1,		AEK_CRC = 1 << 1,
AEK_CRYPTO = 1 << 2,		AEK_CRYPTO = 1 << 2,
AEK_FP = 1 << 3,		AEK_FP = 1 << 3,
AEK_SIMD = 1 << 4,		AEK_SIMD = 1 << 4,
AEK_FP16 = 1 << 5,		AEK_FP16 = 1 << 5,
AEK_PROFILE = 1 << 6,		AEK_PROFILE = 1 << 6,
Show All 16 Lines	enum ArchExtKind : uint64_t {
AEK_SVE2 = 1 << 23,		AEK_SVE2 = 1 << 23,
AEK_SVE2AES = 1 << 24,		AEK_SVE2AES = 1 << 24,
AEK_SVE2SM4 = 1 << 25,		AEK_SVE2SM4 = 1 << 25,
AEK_SVE2SHA3 = 1 << 26,		AEK_SVE2SHA3 = 1 << 26,
AEK_SVE2BITPERM = 1 << 27,		AEK_SVE2BITPERM = 1 << 27,
AEK_TME = 1 << 28,		AEK_TME = 1 << 28,
AEK_BF16 = 1 << 29,		AEK_BF16 = 1 << 29,
AEK_I8MM = 1 << 30,		AEK_I8MM = 1 << 30,
		AEK_F32MM = 1ULL << 31,
		AEK_F64MM = 1ULL << 32,
};		};

enum class ArchKind {		enum class ArchKind {
#define AARCH64_ARCH(NAME, ID, CPU_ATTR, SUB_ARCH, ARCH_ATTR, ARCH_FPU, ARCH_BASE_EXT) ID,		#define AARCH64_ARCH(NAME, ID, CPU_ATTR, SUB_ARCH, ARCH_ATTR, ARCH_FPU, ARCH_BASE_EXT) ID,
#include "AArch64TargetParser.def"		#include "AArch64TargetParser.def"
};		};

const ARM::ArchNames<ArchKind> AArch64ARCHNames[] = {		const ARM::ArchNames<ArchKind> AArch64ARCHNames[] = {
▲ Show 20 Lines • Show All 73 Lines • Show Last 20 Lines

llvm/include/llvm/Support/AArch64TargetParser.def

	Show First 20 Lines • Show All 82 Lines • ▼ Show 20 Lines
	AARCH64_ARCH_EXT_NAME("rcpc", AArch64::AEK_RCPC, "+rcpc", "-rcpc")			AARCH64_ARCH_EXT_NAME("rcpc", AArch64::AEK_RCPC, "+rcpc", "-rcpc")
	AARCH64_ARCH_EXT_NAME("rng", AArch64::AEK_RAND, "+rand", "-rand")			AARCH64_ARCH_EXT_NAME("rng", AArch64::AEK_RAND, "+rand", "-rand")
	AARCH64_ARCH_EXT_NAME("memtag", AArch64::AEK_MTE, "+mte", "-mte")			AARCH64_ARCH_EXT_NAME("memtag", AArch64::AEK_MTE, "+mte", "-mte")
	AARCH64_ARCH_EXT_NAME("ssbs", AArch64::AEK_SSBS, "+ssbs", "-ssbs")			AARCH64_ARCH_EXT_NAME("ssbs", AArch64::AEK_SSBS, "+ssbs", "-ssbs")
	AARCH64_ARCH_EXT_NAME("sb", AArch64::AEK_SB, "+sb", "-sb")			AARCH64_ARCH_EXT_NAME("sb", AArch64::AEK_SB, "+sb", "-sb")
	AARCH64_ARCH_EXT_NAME("predres", AArch64::AEK_PREDRES, "+predres", "-predres")			AARCH64_ARCH_EXT_NAME("predres", AArch64::AEK_PREDRES, "+predres", "-predres")
	AARCH64_ARCH_EXT_NAME("bf16", AArch64::AEK_BF16, "+bf16", "-bf16")			AARCH64_ARCH_EXT_NAME("bf16", AArch64::AEK_BF16, "+bf16", "-bf16")
	AARCH64_ARCH_EXT_NAME("i8mm", AArch64::AEK_I8MM, "+i8mm", "-i8mm")			AARCH64_ARCH_EXT_NAME("i8mm", AArch64::AEK_I8MM, "+i8mm", "-i8mm")
				AARCH64_ARCH_EXT_NAME("f32mm", AArch64::AEK_F32MM, "+f32mm", "-f32mm")
				AARCH64_ARCH_EXT_NAME("f64mm", AArch64::AEK_F64MM, "+f64mm", "-f64mm")
	AARCH64_ARCH_EXT_NAME("tme", AArch64::AEK_TME, "+tme", "-tme")			AARCH64_ARCH_EXT_NAME("tme", AArch64::AEK_TME, "+tme", "-tme")
	#undef AARCH64_ARCH_EXT_NAME			#undef AARCH64_ARCH_EXT_NAME

	#ifndef AARCH64_CPU_NAME			#ifndef AARCH64_CPU_NAME
	#define AARCH64_CPU_NAME(NAME, ID, DEFAULT_FPU, IS_DEFAULT, DEFAULT_EXT)			#define AARCH64_CPU_NAME(NAME, ID, DEFAULT_FPU, IS_DEFAULT, DEFAULT_EXT)
	#endif			#endif
	AARCH64_CPU_NAME("cortex-a34", ARMV8A, FK_CRYPTO_NEON_FP_ARMV8, false,			AARCH64_CPU_NAME("cortex-a34", ARMV8A, FK_CRYPTO_NEON_FP_ARMV8, false,
	(AArch64::AEK_CRC))			(AArch64::AEK_CRC))
	▲ Show 20 Lines • Show All 84 Lines • Show Last 20 Lines

llvm/include/llvm/Support/ARMTargetParser.h

Show First 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	enum ArchExtKind : uint64_t {
AEK_DOTPROD = 1 << 13,		AEK_DOTPROD = 1 << 13,
AEK_SHA2 = 1 << 14,		AEK_SHA2 = 1 << 14,
AEK_AES = 1 << 15,		AEK_AES = 1 << 15,
AEK_FP16FML = 1 << 16,		AEK_FP16FML = 1 << 16,
AEK_SB = 1 << 17,		AEK_SB = 1 << 17,
AEK_FP_DP = 1 << 18,		AEK_FP_DP = 1 << 18,
AEK_LOB = 1 << 19,		AEK_LOB = 1 << 19,
AEK_BF16 = 1 << 20,		AEK_BF16 = 1 << 20,
AEK_CDECP0 = 1 << 21,		AEK_I8MM = 1 << 21,
AEK_CDECP1 = 1 << 22,		AEK_CDECP0 = 1 << 22,
AEK_CDECP2 = 1 << 23,		AEK_CDECP1 = 1 << 23,
AEK_CDECP3 = 1 << 24,		AEK_CDECP2 = 1 << 24,
AEK_CDECP4 = 1 << 25,		AEK_CDECP3 = 1 << 25,
AEK_CDECP5 = 1 << 26,		AEK_CDECP4 = 1 << 26,
AEK_CDECP6 = 1 << 27,		AEK_CDECP5 = 1 << 27,
AEK_CDECP7 = 1 << 28,		AEK_CDECP6 = 1 << 28,
		AEK_CDECP7 = 1 << 29,

// Unsupported extensions.		// Unsupported extensions.
AEK_OS = 1ULL << 59,		AEK_OS = 1ULL << 59,
AEK_IWMMXT = 1ULL << 60,		AEK_IWMMXT = 1ULL << 60,
AEK_IWMMXT2 = 1ULL << 61,		AEK_IWMMXT2 = 1ULL << 61,
AEK_MAVERICK = 1ULL << 62,		AEK_MAVERICK = 1ULL << 62,
AEK_XSCALE = 1ULL << 63,		AEK_XSCALE = 1ULL << 63,
};		};
▲ Show 20 Lines • Show All 212 Lines • Show Last 20 Lines

llvm/include/llvm/Support/ARMTargetParser.def

Show First 20 Lines • Show All 110 Lines • ▼ Show 20 Lines	ARM_ARCH("armv8.5-a", ARMV8_5A, "8.5-A", "v8.5a",
ARMBuildAttrs::CPUArch::v8_A, FK_CRYPTO_NEON_FP_ARMV8,		ARMBuildAttrs::CPUArch::v8_A, FK_CRYPTO_NEON_FP_ARMV8,
(ARM::AEK_SEC \| ARM::AEK_MP \| ARM::AEK_VIRT \| ARM::AEK_HWDIVARM \|		(ARM::AEK_SEC \| ARM::AEK_MP \| ARM::AEK_VIRT \| ARM::AEK_HWDIVARM \|
ARM::AEK_HWDIVTHUMB \| ARM::AEK_DSP \| ARM::AEK_CRC \| ARM::AEK_RAS \|		ARM::AEK_HWDIVTHUMB \| ARM::AEK_DSP \| ARM::AEK_CRC \| ARM::AEK_RAS \|
ARM::AEK_DOTPROD))		ARM::AEK_DOTPROD))
ARM_ARCH("armv8.6-a", ARMV8_6A, "8.6-A", "v8.6a",		ARM_ARCH("armv8.6-a", ARMV8_6A, "8.6-A", "v8.6a",
ARMBuildAttrs::CPUArch::v8_A, FK_CRYPTO_NEON_FP_ARMV8,		ARMBuildAttrs::CPUArch::v8_A, FK_CRYPTO_NEON_FP_ARMV8,
(ARM::AEK_SEC \| ARM::AEK_MP \| ARM::AEK_VIRT \| ARM::AEK_HWDIVARM \|		(ARM::AEK_SEC \| ARM::AEK_MP \| ARM::AEK_VIRT \| ARM::AEK_HWDIVARM \|
ARM::AEK_HWDIVTHUMB \| ARM::AEK_DSP \| ARM::AEK_CRC \| ARM::AEK_RAS \|		ARM::AEK_HWDIVTHUMB \| ARM::AEK_DSP \| ARM::AEK_CRC \| ARM::AEK_RAS \|
ARM::AEK_DOTPROD \| ARM::AEK_BF16 \| ARM::AEK_SHA2 \| ARM::AEK_AES))		ARM::AEK_DOTPROD \| ARM::AEK_BF16 \| ARM::AEK_SHA2 \| ARM::AEK_AES \|
		ARM::AEK_I8MM))
ARM_ARCH("armv8-r", ARMV8R, "8-R", "v8r", ARMBuildAttrs::CPUArch::v8_R,		ARM_ARCH("armv8-r", ARMV8R, "8-R", "v8r", ARMBuildAttrs::CPUArch::v8_R,
FK_NEON_FP_ARMV8,		FK_NEON_FP_ARMV8,
(ARM::AEK_MP \| ARM::AEK_VIRT \| ARM::AEK_HWDIVARM \| ARM::AEK_HWDIVTHUMB \|		(ARM::AEK_MP \| ARM::AEK_VIRT \| ARM::AEK_HWDIVARM \| ARM::AEK_HWDIVTHUMB \|
ARM::AEK_DSP \| ARM::AEK_CRC))		ARM::AEK_DSP \| ARM::AEK_CRC))
ARM_ARCH("armv8-m.base", ARMV8MBaseline, "8-M.Baseline", "v8m.base",		ARM_ARCH("armv8-m.base", ARMV8MBaseline, "8-M.Baseline", "v8m.base",
ARMBuildAttrs::CPUArch::v8_M_Base, FK_NONE, ARM::AEK_HWDIVTHUMB)		ARMBuildAttrs::CPUArch::v8_M_Base, FK_NONE, ARM::AEK_HWDIVTHUMB)
ARM_ARCH("armv8-m.main", ARMV8MMainline, "8-M.Mainline", "v8m.main",		ARM_ARCH("armv8-m.main", ARMV8MMainline, "8-M.Mainline", "v8m.main",
ARMBuildAttrs::CPUArch::v8_M_Main, FK_FPV5_D16, ARM::AEK_HWDIVTHUMB)		ARMBuildAttrs::CPUArch::v8_M_Main, FK_FPV5_D16, ARM::AEK_HWDIVTHUMB)
Show All 38 Lines
ARM_ARCH_EXT_NAME("os", ARM::AEK_OS, nullptr, nullptr)		ARM_ARCH_EXT_NAME("os", ARM::AEK_OS, nullptr, nullptr)
ARM_ARCH_EXT_NAME("iwmmxt", ARM::AEK_IWMMXT, nullptr, nullptr)		ARM_ARCH_EXT_NAME("iwmmxt", ARM::AEK_IWMMXT, nullptr, nullptr)
ARM_ARCH_EXT_NAME("iwmmxt2", ARM::AEK_IWMMXT2, nullptr, nullptr)		ARM_ARCH_EXT_NAME("iwmmxt2", ARM::AEK_IWMMXT2, nullptr, nullptr)
ARM_ARCH_EXT_NAME("maverick", ARM::AEK_MAVERICK, nullptr, nullptr)		ARM_ARCH_EXT_NAME("maverick", ARM::AEK_MAVERICK, nullptr, nullptr)
ARM_ARCH_EXT_NAME("xscale", ARM::AEK_XSCALE, nullptr, nullptr)		ARM_ARCH_EXT_NAME("xscale", ARM::AEK_XSCALE, nullptr, nullptr)
ARM_ARCH_EXT_NAME("fp16fml", ARM::AEK_FP16FML, "+fp16fml", "-fp16fml")		ARM_ARCH_EXT_NAME("fp16fml", ARM::AEK_FP16FML, "+fp16fml", "-fp16fml")
ARM_ARCH_EXT_NAME("bf16", ARM::AEK_BF16, "+bf16", "-bf16")		ARM_ARCH_EXT_NAME("bf16", ARM::AEK_BF16, "+bf16", "-bf16")
ARM_ARCH_EXT_NAME("sb", ARM::AEK_SB, "+sb", "-sb")		ARM_ARCH_EXT_NAME("sb", ARM::AEK_SB, "+sb", "-sb")
		ARM_ARCH_EXT_NAME("i8mm", ARM::AEK_I8MM, "+i8mm", "-i8mm")
ARM_ARCH_EXT_NAME("lob", ARM::AEK_LOB, "+lob", "-lob")		ARM_ARCH_EXT_NAME("lob", ARM::AEK_LOB, "+lob", "-lob")
ARM_ARCH_EXT_NAME("cdecp0", ARM::AEK_CDECP0, "+cdecp0", "-cdecp0")		ARM_ARCH_EXT_NAME("cdecp0", ARM::AEK_CDECP0, "+cdecp0", "-cdecp0")
ARM_ARCH_EXT_NAME("cdecp1", ARM::AEK_CDECP1, "+cdecp1", "-cdecp1")		ARM_ARCH_EXT_NAME("cdecp1", ARM::AEK_CDECP1, "+cdecp1", "-cdecp1")
ARM_ARCH_EXT_NAME("cdecp2", ARM::AEK_CDECP2, "+cdecp2", "-cdecp2")		ARM_ARCH_EXT_NAME("cdecp2", ARM::AEK_CDECP2, "+cdecp2", "-cdecp2")
ARM_ARCH_EXT_NAME("cdecp3", ARM::AEK_CDECP3, "+cdecp3", "-cdecp3")		ARM_ARCH_EXT_NAME("cdecp3", ARM::AEK_CDECP3, "+cdecp3", "-cdecp3")
ARM_ARCH_EXT_NAME("cdecp4", ARM::AEK_CDECP4, "+cdecp4", "-cdecp4")		ARM_ARCH_EXT_NAME("cdecp4", ARM::AEK_CDECP4, "+cdecp4", "-cdecp4")
ARM_ARCH_EXT_NAME("cdecp5", ARM::AEK_CDECP5, "+cdecp5", "-cdecp5")		ARM_ARCH_EXT_NAME("cdecp5", ARM::AEK_CDECP5, "+cdecp5", "-cdecp5")
ARM_ARCH_EXT_NAME("cdecp6", ARM::AEK_CDECP6, "+cdecp6", "-cdecp6")		ARM_ARCH_EXT_NAME("cdecp6", ARM::AEK_CDECP6, "+cdecp6", "-cdecp6")
▲ Show 20 Lines • Show All 128 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64.td

Show First 20 Lines • Show All 367 Lines • ▼ Show 20 Lines
def FeatureTaggedGlobals : SubtargetFeature<"tagged-globals",		def FeatureTaggedGlobals : SubtargetFeature<"tagged-globals",
"AllowTaggedGlobals",		"AllowTaggedGlobals",
"true", "Use an instruction sequence for taking the address of a global "		"true", "Use an instruction sequence for taking the address of a global "
"that allows a memory tag in the upper address bits">;		"that allows a memory tag in the upper address bits">;

def FeatureBF16 : SubtargetFeature<"bf16", "HasBF16",		def FeatureBF16 : SubtargetFeature<"bf16", "HasBF16",
"true", "Enable BFloat16 Extension" >;		"true", "Enable BFloat16 Extension" >;

		def FeatureMatMulInt8 : SubtargetFeature<"i8mm", "HasMatMulInt8",
		"true", "Enable Matrix Multiply Int8 Extension">;

		def FeatureMatMulFP32 : SubtargetFeature<"f32mm", "HasMatMulFP32",
		"true", "Enable Matrix Multiply FP32 Extension", [FeatureSVE]>;

		def FeatureMatMulFP64 : SubtargetFeature<"f64mm", "HasMatMulFP64",
		"true", "Enable Matrix Multiply FP64 Extension", [FeatureSVE]>;

def FeatureFineGrainedTraps : SubtargetFeature<"fgt", "HasFineGrainedTraps",		def FeatureFineGrainedTraps : SubtargetFeature<"fgt", "HasFineGrainedTraps",
"true", "Enable fine grained virtualization traps extension">;		"true", "Enable fine grained virtualization traps extension">;

def FeatureEnhancedCounterVirtualization :		def FeatureEnhancedCounterVirtualization :
SubtargetFeature<"ecv", "HasEnhancedCounterVirtualization",		SubtargetFeature<"ecv", "HasEnhancedCounterVirtualization",
"true", "Enable enhanced counter virtualization extension">;		"true", "Enable enhanced counter virtualization extension">;


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Architectures.		// Architectures.
//		//

def HasV8_1aOps : SubtargetFeature<"v8.1a", "HasV8_1aOps", "true",		def HasV8_1aOps : SubtargetFeature<"v8.1a", "HasV8_1aOps", "true",
"Support ARM v8.1a instructions", [FeatureCRC, FeatureLSE, FeatureRDM,		"Support ARM v8.1a instructions", [FeatureCRC, FeatureLSE, FeatureRDM,
FeaturePAN, FeatureLOR, FeatureVH]>;		FeaturePAN, FeatureLOR, FeatureVH]>;

Show All 16 Lines	def HasV8_5aOps : SubtargetFeature<
[HasV8_4aOps, FeatureAltFPCmp, FeatureFRInt3264, FeatureSpecRestrict,		[HasV8_4aOps, FeatureAltFPCmp, FeatureFRInt3264, FeatureSpecRestrict,
FeatureSSBS, FeatureSB, FeaturePredRes, FeatureCacheDeepPersist,		FeatureSSBS, FeatureSB, FeaturePredRes, FeatureCacheDeepPersist,
FeatureBranchTargetId]>;		FeatureBranchTargetId]>;

def HasV8_6aOps : SubtargetFeature<		def HasV8_6aOps : SubtargetFeature<
"v8.6a", "HasV8_6aOps", "true", "Support ARM v8.6a instructions",		"v8.6a", "HasV8_6aOps", "true", "Support ARM v8.6a instructions",

[HasV8_5aOps, FeatureAMVS, FeatureBF16, FeatureFineGrainedTraps,		[HasV8_5aOps, FeatureAMVS, FeatureBF16, FeatureFineGrainedTraps,
FeatureEnhancedCounterVirtualization]>;		FeatureEnhancedCounterVirtualization, FeatureMatMulInt8]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Register File Description		// Register File Description
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

include "AArch64RegisterInfo.td"		include "AArch64RegisterInfo.td"
include "AArch64RegisterBanks.td"		include "AArch64RegisterBanks.td"
include "AArch64CallingConvention.td"		include "AArch64CallingConvention.td"
▲ Show 20 Lines • Show All 570 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64InstrFormats.td

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 492 Lines • ▼ Show 20 Lines

// simm4sN predicate - True if the immediate is a multiple of N in the range		// simm4sN predicate - True if the immediate is a multiple of N in the range
// [ -8* N, 7 * N].		// [ -8* N, 7 * N].
def SImm4s1Operand : SImmScaledMemoryIndexed<4, 1>;		def SImm4s1Operand : SImmScaledMemoryIndexed<4, 1>;
def SImm4s2Operand : SImmScaledMemoryIndexed<4, 2>;		def SImm4s2Operand : SImmScaledMemoryIndexed<4, 2>;
def SImm4s3Operand : SImmScaledMemoryIndexed<4, 3>;		def SImm4s3Operand : SImmScaledMemoryIndexed<4, 3>;
def SImm4s4Operand : SImmScaledMemoryIndexed<4, 4>;		def SImm4s4Operand : SImmScaledMemoryIndexed<4, 4>;
def SImm4s16Operand : SImmScaledMemoryIndexed<4, 16>;		def SImm4s16Operand : SImmScaledMemoryIndexed<4, 16>;
		def SImm4s32Operand : SImmScaledMemoryIndexed<4, 32>;

def simm4s1 : Operand<i64>, ImmLeaf<i64,		def simm4s1 : Operand<i64>, ImmLeaf<i64,
[{ return Imm >=-8 && Imm <= 7; }]> {		[{ return Imm >=-8 && Imm <= 7; }]> {
let ParserMatchClass = SImm4s1Operand;		let ParserMatchClass = SImm4s1Operand;
let DecoderMethod = "DecodeSImm<4>";		let DecoderMethod = "DecodeSImm<4>";
}		}

def simm4s2 : Operand<i64>, ImmLeaf<i64,		def simm4s2 : Operand<i64>, ImmLeaf<i64,
Show All 17 Lines	[{ return Imm >=-32 && Imm <= 28 && (Imm % 4) == 0x0; }]> {
let DecoderMethod = "DecodeSImm<4>";		let DecoderMethod = "DecodeSImm<4>";
}		}
def simm4s16 : Operand<i64>, ImmLeaf<i64,		def simm4s16 : Operand<i64>, ImmLeaf<i64,
[{ return Imm >=-128 && Imm <= 112 && (Imm % 16) == 0x0; }]> {		[{ return Imm >=-128 && Imm <= 112 && (Imm % 16) == 0x0; }]> {
let PrintMethod = "printImmScale<16>";		let PrintMethod = "printImmScale<16>";
let ParserMatchClass = SImm4s16Operand;		let ParserMatchClass = SImm4s16Operand;
let DecoderMethod = "DecodeSImm<4>";		let DecoderMethod = "DecodeSImm<4>";
}		}
		def simm4s32 : Operand<i64>, ImmLeaf<i64,
		[{ return Imm >=-256 && Imm <= 224 && (Imm % 32) == 0x0; }]> {
		let PrintMethod = "printImmScale<32>";
		let ParserMatchClass = SImm4s32Operand;
		let DecoderMethod = "DecodeSImm<4>";
		}

def Imm1_8Operand : AsmImmRange<1, 8>;		def Imm1_8Operand : AsmImmRange<1, 8>;
def Imm1_16Operand : AsmImmRange<1, 16>;		def Imm1_16Operand : AsmImmRange<1, 16>;
def Imm1_32Operand : AsmImmRange<1, 32>;		def Imm1_32Operand : AsmImmRange<1, 32>;
def Imm1_64Operand : AsmImmRange<1, 64>;		def Imm1_64Operand : AsmImmRange<1, 64>;

class BranchTarget<int N> : AsmOperandClass {		class BranchTarget<int N> : AsmOperandClass {
let Name = "BranchTarget" # N;		let Name = "BranchTarget" # N;
▲ Show 20 Lines • Show All 4,990 Lines • ▼ Show 20 Lines	multiclass SIMDLogicalThreeVectorTied<bit U, bits<2> size,
def : Pat<(v2i64 (OpNode (v2i64 V128:$LHS), (v2i64 V128:$MHS),		def : Pat<(v2i64 (OpNode (v2i64 V128:$LHS), (v2i64 V128:$MHS),
(v2i64 V128:$RHS))),		(v2i64 V128:$RHS))),
(!cast<Instruction>(NAME#"v16i8")		(!cast<Instruction>(NAME#"v16i8")
V128:$LHS, V128:$MHS, V128:$RHS)>;		V128:$LHS, V128:$MHS, V128:$RHS)>;
}		}

// ARMv8.2-A Dot Product Instructions (Vector): These instructions extract		// ARMv8.2-A Dot Product Instructions (Vector): These instructions extract
// bytes from S-sized elements.		// bytes from S-sized elements.
class BaseSIMDThreeSameVectorDot<bit Q, bit U, string asm, string kind1,		class BaseSIMDThreeSameVectorDot<bit Q, bit U, bit Mixed, string asm, string kind1,
string kind2, RegisterOperand RegType,		string kind2, RegisterOperand RegType,
ValueType AccumType, ValueType InputType,		ValueType AccumType, ValueType InputType,
SDPatternOperator OpNode> :		SDPatternOperator OpNode> :
BaseSIMDThreeSameVectorTied<Q, U, 0b100, 0b10010, RegType, asm, kind1,		BaseSIMDThreeSameVectorTied<Q, U, 0b100, {0b1001, Mixed}, RegType, asm, kind1,
[(set (AccumType RegType:$dst),		[(set (AccumType RegType:$dst),
(OpNode (AccumType RegType:$Rd),		(OpNode (AccumType RegType:$Rd),
(InputType RegType:$Rn),		(InputType RegType:$Rn),
(InputType RegType:$Rm)))]> {		(InputType RegType:$Rm)))]> {
let AsmString = !strconcat(asm, "{\t$Rd" # kind1 # ", $Rn" # kind2 # ", $Rm" # kind2 # "}");		let AsmString = !strconcat(asm, "{\t$Rd" # kind1 # ", $Rn" # kind2 # ", $Rm" # kind2 # "}");
}		}

multiclass SIMDThreeSameVectorDot<bit U, string asm, SDPatternOperator OpNode> {		multiclass SIMDThreeSameVectorDot<bit U, bit Mixed, string asm, SDPatternOperator OpNode> {
def v8i8 : BaseSIMDThreeSameVectorDot<0, U, asm, ".2s", ".8b", V64,		def v8i8 : BaseSIMDThreeSameVectorDot<0, U, Mixed, asm, ".2s", ".8b", V64,
v2i32, v8i8, OpNode>;		v2i32, v8i8, OpNode>;
def v16i8 : BaseSIMDThreeSameVectorDot<1, U, asm, ".4s", ".16b", V128,		def v16i8 : BaseSIMDThreeSameVectorDot<1, U, Mixed, asm, ".4s", ".16b", V128,
v4i32, v16i8, OpNode>;		v4i32, v16i8, OpNode>;
}		}

// ARMv8.2-A Fused Multiply Add-Long Instructions (Vector): These instructions		// ARMv8.2-A Fused Multiply Add-Long Instructions (Vector): These instructions
// select inputs from 4H vectors and accumulate outputs to a 2S vector (or from		// select inputs from 4H vectors and accumulate outputs to a 2S vector (or from
// 8H to 4S, when Q=1).		// 8H to 4S, when Q=1).
class BaseSIMDThreeSameVectorFML<bit Q, bit U, bit b13, bits<3> size, string asm, string kind1,		class BaseSIMDThreeSameVectorFML<bit Q, bit U, bit b13, bits<3> size, string asm, string kind1,
string kind2, RegisterOperand RegType,		string kind2, RegisterOperand RegType,
▲ Show 20 Lines • Show All 2,321 Lines • ▼ Show 20 Lines	class BF16ToSinglePrecision<string asm>
bits<5> Rd;		bits<5> Rd;
bits<5> Rn;		bits<5> Rn;
let Inst{31-10} = 0b0001111001100011010000;		let Inst{31-10} = 0b0001111001100011010000;
let Inst{9-5} = Rn;		let Inst{9-5} = Rn;
let Inst{4-0} = Rd;		let Inst{4-0} = Rd;
}		}
} // End of let mayStore = 0, mayLoad = 0, hasSideEffects = 0		} // End of let mayStore = 0, mayLoad = 0, hasSideEffects = 0

		//----------------------------------------------------------------------------
		// Armv8.6 Matrix Multiply Extension
		//----------------------------------------------------------------------------

		class SIMDThreeSameVectorMatMul<bit B, bit U, string asm, SDPatternOperator OpNode>
		: BaseSIMDThreeSameVectorTied<1, U, 0b100, {0b1010, B}, V128, asm, ".4s",
		[(set (v4i32 V128:$dst), (OpNode (v4i32 V128:$Rd),
		(v16i8 V128:$Rn),
		(v16i8 V128:$Rm)))]> {
		let AsmString = asm # "{\t$Rd.4s, $Rn.16b, $Rm.16b}";
		}

		//----------------------------------------------------------------------------
// ARMv8.2-A Dot Product Instructions (Indexed)		// ARMv8.2-A Dot Product Instructions (Indexed)
class BaseSIMDThreeSameVectorDotIndex<bit Q, bit U, string asm, string dst_kind,		class BaseSIMDThreeSameVectorDotIndex<bit Q, bit U, bit Mixed, bits<2> size, string asm,
string lhs_kind, string rhs_kind,		string dst_kind, string lhs_kind, string rhs_kind,
RegisterOperand RegType,		RegisterOperand RegType,
ValueType AccumType, ValueType InputType,		ValueType AccumType, ValueType InputType,
SDPatternOperator OpNode> :		SDPatternOperator OpNode> :
BaseSIMDIndexedTied<Q, U, 0b0, 0b10, 0b1110, RegType, RegType, V128,		BaseSIMDIndexedTied<Q, U, 0b0, size, {0b111, Mixed}, RegType, RegType, V128,
VectorIndexS, asm, "", dst_kind, lhs_kind, rhs_kind,		VectorIndexS, asm, "", dst_kind, lhs_kind, rhs_kind,
[(set (AccumType RegType:$dst),		[(set (AccumType RegType:$dst),
(AccumType (OpNode (AccumType RegType:$Rd),		(AccumType (OpNode (AccumType RegType:$Rd),
(InputType RegType:$Rn),		(InputType RegType:$Rn),
(InputType (bitconvert (AccumType		(InputType (bitconvert (AccumType
(AArch64duplane32 (v4i32 V128:$Rm),		(AArch64duplane32 (v4i32 V128:$Rm),
VectorIndexS:$idx)))))))]> {		VectorIndexS:$idx)))))))]> {
bits<2> idx;		bits<2> idx;
let Inst{21} = idx{0}; // L		let Inst{21} = idx{0}; // L
let Inst{11} = idx{1}; // H		let Inst{11} = idx{1}; // H
}		}

multiclass SIMDThreeSameVectorDotIndex<bit U, string asm,		multiclass SIMDThreeSameVectorDotIndex<bit U, bit Mixed, bits<2> size, string asm,
SDPatternOperator OpNode> {		SDPatternOperator OpNode> {
def v8i8 : BaseSIMDThreeSameVectorDotIndex<0, U, asm, ".2s", ".8b", ".4b",		def v8i8 : BaseSIMDThreeSameVectorDotIndex<0, U, Mixed, size, asm, ".2s", ".8b", ".4b",
V64, v2i32, v8i8, OpNode>;		V64, v2i32, v8i8, OpNode>;
def v16i8 : BaseSIMDThreeSameVectorDotIndex<1, U, asm, ".4s", ".16b", ".4b",		def v16i8 : BaseSIMDThreeSameVectorDotIndex<1, U, Mixed, size, asm, ".4s", ".16b", ".4b",
V128, v4i32, v16i8, OpNode>;		V128, v4i32, v16i8, OpNode>;
}		}

// ARMv8.2-A Fused Multiply Add-Long Instructions (Indexed)		// ARMv8.2-A Fused Multiply Add-Long Instructions (Indexed)
class BaseSIMDThreeSameVectorFMLIndex<bit Q, bit U, bits<4> opc, string asm,		class BaseSIMDThreeSameVectorFMLIndex<bit Q, bit U, bits<4> opc, string asm,
string dst_kind, string lhs_kind,		string dst_kind, string lhs_kind,
string rhs_kind, RegisterOperand RegType,		string rhs_kind, RegisterOperand RegType,
ValueType AccumType, ValueType InputType,		ValueType AccumType, ValueType InputType,
▲ Show 20 Lines • Show All 3,228 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64InstrInfo.td

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 140 Lines • ▼ Show 20 Lines
def HasTME : Predicate<"Subtarget->hasTME()">,		def HasTME : Predicate<"Subtarget->hasTME()">,
AssemblerPredicate<(all_of FeatureTME), "tme">;		AssemblerPredicate<(all_of FeatureTME), "tme">;
def HasETE : Predicate<"Subtarget->hasETE()">,		def HasETE : Predicate<"Subtarget->hasETE()">,
AssemblerPredicate<(all_of FeatureETE), "ete">;		AssemblerPredicate<(all_of FeatureETE), "ete">;
def HasTRBE : Predicate<"Subtarget->hasTRBE()">,		def HasTRBE : Predicate<"Subtarget->hasTRBE()">,
AssemblerPredicate<(all_of FeatureTRBE), "trbe">;		AssemblerPredicate<(all_of FeatureTRBE), "trbe">;
def HasBF16 : Predicate<"Subtarget->hasBF16()">,		def HasBF16 : Predicate<"Subtarget->hasBF16()">,
AssemblerPredicate<(all_of FeatureBF16), "bf16">;		AssemblerPredicate<(all_of FeatureBF16), "bf16">;
		def HasMatMulInt8 : Predicate<"Subtarget->hasMatMulInt8()">,
		AssemblerPredicate<(all_of FeatureMatMulInt8), "i8mm">;
		def HasMatMulFP32 : Predicate<"Subtarget->hasMatMulFP32()">,
		AssemblerPredicate<(all_of FeatureMatMulFP32), "f32mm">;
		def HasMatMulFP64 : Predicate<"Subtarget->hasMatMulFP64()">,
		AssemblerPredicate<(all_of FeatureMatMulFP64), "f64mm">;
def IsLE : Predicate<"Subtarget->isLittleEndian()">;		def IsLE : Predicate<"Subtarget->isLittleEndian()">;
def IsBE : Predicate<"!Subtarget->isLittleEndian()">;		def IsBE : Predicate<"!Subtarget->isLittleEndian()">;
def IsWindows : Predicate<"Subtarget->isTargetWindows()">;		def IsWindows : Predicate<"Subtarget->isTargetWindows()">;
def UseAlternateSExtLoadCVTF32		def UseAlternateSExtLoadCVTF32
: Predicate<"Subtarget->useAlternateSExtLoadCVTF32Pattern()">;		: Predicate<"Subtarget->useAlternateSExtLoadCVTF32Pattern()">;

def UseNegativeImmediates		def UseNegativeImmediates
: Predicate<"false">, AssemblerPredicate<(all_of (not FeatureNoNegativeImmediates)),		: Predicate<"false">, AssemblerPredicate<(all_of (not FeatureNoNegativeImmediates)),
▲ Show 20 Lines • Show All 583 Lines • ▼ Show 20 Lines	def TSB : CRmSystemI<barrier_op, 0b010, "tsb", []> {
let CRm = 0b0010;		let CRm = 0b0010;
let Inst{12} = 0;		let Inst{12} = 0;
let Predicates = [HasTRACEV8_4];		let Predicates = [HasTRACEV8_4];
}		}
}		}

// ARMv8.2-A Dot Product		// ARMv8.2-A Dot Product
let Predicates = [HasDotProd] in {		let Predicates = [HasDotProd] in {
defm SDOT : SIMDThreeSameVectorDot<0, "sdot", int_aarch64_neon_sdot>;		defm SDOT : SIMDThreeSameVectorDot<0, 0, "sdot", int_aarch64_neon_sdot>;
defm UDOT : SIMDThreeSameVectorDot<1, "udot", int_aarch64_neon_udot>;		defm UDOT : SIMDThreeSameVectorDot<1, 0, "udot", int_aarch64_neon_udot>;
defm SDOTlane : SIMDThreeSameVectorDotIndex<0, "sdot", int_aarch64_neon_sdot>;		defm SDOTlane : SIMDThreeSameVectorDotIndex<0, 0, 0b10, "sdot", int_aarch64_neon_sdot>;
defm UDOTlane : SIMDThreeSameVectorDotIndex<1, "udot", int_aarch64_neon_udot>;		defm UDOTlane : SIMDThreeSameVectorDotIndex<1, 0, 0b10, "udot", int_aarch64_neon_udot>;
}		}

// ARMv8.6-A BFloat		// ARMv8.6-A BFloat
let Predicates = [HasBF16] in {		let Predicates = [HasBF16] in {
defm BFDOT : SIMDThreeSameVectorBFDot<1, "bfdot">;		defm BFDOT : SIMDThreeSameVectorBFDot<1, "bfdot">;
defm BF16DOTlane : SIMDThreeSameVectorBF16DotI<0, "bfdot">;		defm BF16DOTlane : SIMDThreeSameVectorBF16DotI<0, "bfdot">;
def BFMMLA : SIMDThreeSameVectorBF16MatrixMul<"bfmmla">;		def BFMMLA : SIMDThreeSameVectorBF16MatrixMul<"bfmmla">;
def BFMLALB : SIMDBF16MLAL<0, "bfmlalb">;		def BFMLALB : SIMDBF16MLAL<0, "bfmlalb">;
def BFMLALT : SIMDBF16MLAL<1, "bfmlalt">;		def BFMLALT : SIMDBF16MLAL<1, "bfmlalt">;
def BFMLALBIdx : SIMDBF16MLALIndex<0, "bfmlalb">;		def BFMLALBIdx : SIMDBF16MLALIndex<0, "bfmlalb">;
def BFMLALTIdx : SIMDBF16MLALIndex<1, "bfmlalt">;		def BFMLALTIdx : SIMDBF16MLALIndex<1, "bfmlalt">;
def BFCVTN : SIMD_BFCVTN;		def BFCVTN : SIMD_BFCVTN;
def BFCVTN2 : SIMD_BFCVTN2;		def BFCVTN2 : SIMD_BFCVTN2;
def BFCVT : BF16ToSinglePrecision<"bfcvt">;		def BFCVT : BF16ToSinglePrecision<"bfcvt">;
}		}

		// ARMv8.6A AArch64 matrix multiplication
		let Predicates = [HasMatMulInt8] in {
		def SMMLA : SIMDThreeSameVectorMatMul<0, 0, "smmla", int_aarch64_neon_smmla>;
		def UMMLA : SIMDThreeSameVectorMatMul<0, 1, "ummla", int_aarch64_neon_ummla>;
		def USMMLA : SIMDThreeSameVectorMatMul<1, 0, "usmmla", int_aarch64_neon_usmmla>;
		defm USDOT : SIMDThreeSameVectorDot<0, 1, "usdot", int_aarch64_neon_usdot>;
		defm USDOTlane : SIMDThreeSameVectorDotIndex<0, 1, 0b10, "usdot", int_aarch64_neon_usdot>;

		// sudot lane has a pattern where usdot is expected (there is no sudot).
		// The second operand is used in the dup operation to repeat the indexed
		// element.
		class BaseSIMDSUDOTIndex<bit Q, string dst_kind, string lhs_kind,
		string rhs_kind, RegisterOperand RegType,
		ValueType AccumType, ValueType InputType>
		: BaseSIMDThreeSameVectorDotIndex<Q, 0, 1, 0b00, "sudot", dst_kind,
		lhs_kind, rhs_kind, RegType, AccumType,
		InputType, null_frag> {
		let Pattern = [(set (AccumType RegType:$dst),
		(AccumType (int_aarch64_neon_usdot (AccumType RegType:$Rd),
		(InputType (bitconvert (AccumType
		(AArch64duplane32 (v4i32 V128:$Rm),
		VectorIndexS:$idx)))),
		(InputType RegType:$Rn))))];
		}

		multiclass SIMDSUDOTIndex {
		def v8i8 : BaseSIMDSUDOTIndex<0, ".2s", ".8b", ".4b", V64, v2i32, v8i8>;
		def v16i8 : BaseSIMDSUDOTIndex<1, ".4s", ".16b", ".4b", V128, v4i32, v16i8>;
		}

		defm SUDOTlane : SIMDSUDOTIndex;

		}

// ARMv8.2-A FP16 Fused Multiply-Add Long		// ARMv8.2-A FP16 Fused Multiply-Add Long
let Predicates = [HasNEON, HasFP16FML] in {		let Predicates = [HasNEON, HasFP16FML] in {
defm FMLAL : SIMDThreeSameVectorFML<0, 1, 0b001, "fmlal", int_aarch64_neon_fmlal>;		defm FMLAL : SIMDThreeSameVectorFML<0, 1, 0b001, "fmlal", int_aarch64_neon_fmlal>;
defm FMLSL : SIMDThreeSameVectorFML<0, 1, 0b101, "fmlsl", int_aarch64_neon_fmlsl>;		defm FMLSL : SIMDThreeSameVectorFML<0, 1, 0b101, "fmlsl", int_aarch64_neon_fmlsl>;
defm FMLAL2 : SIMDThreeSameVectorFML<1, 0, 0b001, "fmlal2", int_aarch64_neon_fmlal2>;		defm FMLAL2 : SIMDThreeSameVectorFML<1, 0, 0b001, "fmlal2", int_aarch64_neon_fmlal2>;
defm FMLSL2 : SIMDThreeSameVectorFML<1, 0, 0b101, "fmlsl2", int_aarch64_neon_fmlsl2>;		defm FMLSL2 : SIMDThreeSameVectorFML<1, 0, 0b101, "fmlsl2", int_aarch64_neon_fmlsl2>;
defm FMLALlane : SIMDThreeSameVectorFMLIndex<0, 0b0000, "fmlal", int_aarch64_neon_fmlal>;		defm FMLALlane : SIMDThreeSameVectorFMLIndex<0, 0b0000, "fmlal", int_aarch64_neon_fmlal>;
defm FMLSLlane : SIMDThreeSameVectorFMLIndex<0, 0b0100, "fmlsl", int_aarch64_neon_fmlsl>;		defm FMLSLlane : SIMDThreeSameVectorFMLIndex<0, 0b0100, "fmlsl", int_aarch64_neon_fmlsl>;
▲ Show 20 Lines • Show All 6,592 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

Show First 20 Lines • Show All 1,628 Lines • ▼ Show 20 Lines	multiclass sve_prefetch<SDPatternOperator prefetch, ValueType PredTy, Instruction RegImmInst, Instruction RegRegInst, int scale, ComplexPattern AddrCP> {
defm : ldff1<LDFF1SB_H, nxv8i16, AArch64ldff1s, nxv8i1, nxv8i8, am_sve_regreg_lsl0>;		defm : ldff1<LDFF1SB_H, nxv8i16, AArch64ldff1s, nxv8i1, nxv8i8, am_sve_regreg_lsl0>;
defm : ldff1<LDFF1H, nxv8i16, AArch64ldff1, nxv8i1, nxv8i16, am_sve_regreg_lsl1>;		defm : ldff1<LDFF1H, nxv8i16, AArch64ldff1, nxv8i1, nxv8i16, am_sve_regreg_lsl1>;
defm : ldff1<LDFF1H, nxv8f16, AArch64ldff1, nxv8i1, nxv8f16, am_sve_regreg_lsl1>;		defm : ldff1<LDFF1H, nxv8f16, AArch64ldff1, nxv8i1, nxv8f16, am_sve_regreg_lsl1>;

// 16-element contiguous first faulting loads		// 16-element contiguous first faulting loads
defm : ldff1<LDFF1B, nxv16i8, AArch64ldff1, nxv16i1, nxv16i8, am_sve_regreg_lsl0>;		defm : ldff1<LDFF1B, nxv16i8, AArch64ldff1, nxv16i1, nxv16i8, am_sve_regreg_lsl0>;
}		}

		let Predicates = [HasSVE, HasMatMulInt8] in {
		def SMMLA_ZZZ : sve_int_matmul<0b00, "smmla">;
		def UMMLA_ZZZ : sve_int_matmul<0b11, "ummla">;
		def USMMLA_ZZZ : sve_int_matmul<0b10, "usmmla">;
		def USDOT_ZZZ : sve_int_dot_mixed<"usdot">;
		def USDOT_ZZZI : sve_int_dot_mixed_indexed<0, "usdot">;
		def SUDOT_ZZZI : sve_int_dot_mixed_indexed<1, "sudot">;
		}

		let Predicates = [HasSVE, HasMatMulFP32] in {
		def FMMLA_ZZZ_S : sve_fp_matrix_mla<0, "fmmla", ZPR32>;
		}

		let Predicates = [HasSVE, HasMatMulFP64] in {
		def FMMLA_ZZZ_D : sve_fp_matrix_mla<1, "fmmla", ZPR64>;
		defm LD1RO_B_IMM : sve_mem_ldor_si<0b00, "ld1rob", Z_b, ZPR8>;
		defm LD1RO_H_IMM : sve_mem_ldor_si<0b01, "ld1roh", Z_h, ZPR16>;
		defm LD1RO_W_IMM : sve_mem_ldor_si<0b10, "ld1row", Z_s, ZPR32>;
		defm LD1RO_D_IMM : sve_mem_ldor_si<0b11, "ld1rod", Z_d, ZPR64>;
		defm LD1RO_B : sve_mem_ldor_ss<0b00, "ld1rob", Z_b, ZPR8, GPR64NoXZRshifted8>;
		defm LD1RO_H : sve_mem_ldor_ss<0b01, "ld1roh", Z_h, ZPR16, GPR64NoXZRshifted16>;
		defm LD1RO_W : sve_mem_ldor_ss<0b10, "ld1row", Z_s, ZPR32, GPR64NoXZRshifted32>;
		defm LD1RO_D : sve_mem_ldor_ss<0b11, "ld1rod", Z_d, ZPR64, GPR64NoXZRshifted64>;
		def ZIP1_ZZZ_128 : sve_int_perm_bin_perm_128_zz<0b00, 0, "zip1">;
		def ZIP2_ZZZ_128 : sve_int_perm_bin_perm_128_zz<0b00, 1, "zip2">;
		def UZP1_ZZZ_128 : sve_int_perm_bin_perm_128_zz<0b01, 0, "uzp1">;
		def UZP2_ZZZ_128 : sve_int_perm_bin_perm_128_zz<0b01, 1, "uzp2">;
		def TRN1_ZZZ_128 : sve_int_perm_bin_perm_128_zz<0b11, 0, "trn1">;
		def TRN2_ZZZ_128 : sve_int_perm_bin_perm_128_zz<0b11, 1, "trn2">;
		}

let Predicates = [HasSVE2] in {		let Predicates = [HasSVE2] in {
// SVE2 integer multiply-add (indexed)		// SVE2 integer multiply-add (indexed)
defm MLA_ZZZI : sve2_int_mla_by_indexed_elem<0b01, 0b0, "mla", int_aarch64_sve_mla_lane>;		defm MLA_ZZZI : sve2_int_mla_by_indexed_elem<0b01, 0b0, "mla", int_aarch64_sve_mla_lane>;
defm MLS_ZZZI : sve2_int_mla_by_indexed_elem<0b01, 0b1, "mls", int_aarch64_sve_mls_lane>;		defm MLS_ZZZI : sve2_int_mla_by_indexed_elem<0b01, 0b1, "mls", int_aarch64_sve_mls_lane>;

// SVE2 saturating multiply-add high (indexed)		// SVE2 saturating multiply-add high (indexed)
defm SQRDMLAH_ZZZI : sve2_int_mla_by_indexed_elem<0b10, 0b0, "sqrdmlah", int_aarch64_sve_sqrdmlah_lane>;		defm SQRDMLAH_ZZZI : sve2_int_mla_by_indexed_elem<0b10, 0b0, "sqrdmlah", int_aarch64_sve_sqrdmlah_lane>;
defm SQRDMLSH_ZZZI : sve2_int_mla_by_indexed_elem<0b10, 0b1, "sqrdmlsh", int_aarch64_sve_sqrdmlsh_lane>;		defm SQRDMLSH_ZZZI : sve2_int_mla_by_indexed_elem<0b10, 0b1, "sqrdmlsh", int_aarch64_sve_sqrdmlsh_lane>;
▲ Show 20 Lines • Show All 413 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64Subtarget.h

Show First 20 Lines • Show All 141 Lines • ▼ Show 20 Lines	protected:
bool HasCCDP = false;		bool HasCCDP = false;
bool HasBTI = false;		bool HasBTI = false;
bool HasRandGen = false;		bool HasRandGen = false;
bool HasMTE = false;		bool HasMTE = false;
bool HasTME = false;		bool HasTME = false;

// Armv8.6-A Extensions		// Armv8.6-A Extensions
bool HasBF16 = false;		bool HasBF16 = false;
		bool HasMatMulInt8 = false;
		bool HasMatMulFP32 = false;
		bool HasMatMulFP64 = false;
bool HasAMVS = false;		bool HasAMVS = false;
bool HasFineGrainedTraps = false;		bool HasFineGrainedTraps = false;
bool HasEnhancedCounterVirtualization = false;		bool HasEnhancedCounterVirtualization = false;

// Arm SVE2 extensions		// Arm SVE2 extensions
bool HasSVE2AES = false;		bool HasSVE2AES = false;
bool HasSVE2SM4 = false;		bool HasSVE2SM4 = false;
bool HasSVE2SHA3 = false;		bool HasSVE2SHA3 = false;
▲ Show 20 Lines • Show All 251 Lines • ▼ Show 20 Lines	public:
bool hasRandGen() const { return HasRandGen; }		bool hasRandGen() const { return HasRandGen; }
bool hasMTE() const { return HasMTE; }		bool hasMTE() const { return HasMTE; }
bool hasTME() const { return HasTME; }		bool hasTME() const { return HasTME; }
// Arm SVE2 extensions		// Arm SVE2 extensions
bool hasSVE2AES() const { return HasSVE2AES; }		bool hasSVE2AES() const { return HasSVE2AES; }
bool hasSVE2SM4() const { return HasSVE2SM4; }		bool hasSVE2SM4() const { return HasSVE2SM4; }
bool hasSVE2SHA3() const { return HasSVE2SHA3; }		bool hasSVE2SHA3() const { return HasSVE2SHA3; }
bool hasSVE2BitPerm() const { return HasSVE2BitPerm; }		bool hasSVE2BitPerm() const { return HasSVE2BitPerm; }
		bool hasMatMulInt8() const { return HasMatMulInt8; }
		bool hasMatMulFP32() const { return HasMatMulFP32; }
		bool hasMatMulFP64() const { return HasMatMulFP64; }

// Armv8.6-A Extensions		// Armv8.6-A Extensions
bool hasBF16() const { return HasBF16; }		bool hasBF16() const { return HasBF16; }
bool hasFineGrainedTraps() const { return HasFineGrainedTraps; }		bool hasFineGrainedTraps() const { return HasFineGrainedTraps; }
bool hasEnhancedCounterVirtualization() const {		bool hasEnhancedCounterVirtualization() const {
return HasEnhancedCounterVirtualization;		return HasEnhancedCounterVirtualization;
}		}

▲ Show 20 Lines • Show All 100 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp

Show First 20 Lines • Show All 4,249 Lines • ▼ Show 20 Lines	bool AArch64AsmParser::showMatchError(SMLoc Loc, unsigned ErrCode,
case Match_InvalidMemoryIndexed2SImm4:		case Match_InvalidMemoryIndexed2SImm4:
return Error(Loc, "index must be a multiple of 2 in range [-16, 14].");		return Error(Loc, "index must be a multiple of 2 in range [-16, 14].");
case Match_InvalidMemoryIndexed3SImm4:		case Match_InvalidMemoryIndexed3SImm4:
return Error(Loc, "index must be a multiple of 3 in range [-24, 21].");		return Error(Loc, "index must be a multiple of 3 in range [-24, 21].");
case Match_InvalidMemoryIndexed4SImm4:		case Match_InvalidMemoryIndexed4SImm4:
return Error(Loc, "index must be a multiple of 4 in range [-32, 28].");		return Error(Loc, "index must be a multiple of 4 in range [-32, 28].");
case Match_InvalidMemoryIndexed16SImm4:		case Match_InvalidMemoryIndexed16SImm4:
return Error(Loc, "index must be a multiple of 16 in range [-128, 112].");		return Error(Loc, "index must be a multiple of 16 in range [-128, 112].");
		case Match_InvalidMemoryIndexed32SImm4:
		return Error(Loc, "index must be a multiple of 32 in range [-256, 224].");
case Match_InvalidMemoryIndexed1SImm6:		case Match_InvalidMemoryIndexed1SImm6:
return Error(Loc, "index must be an integer in range [-32, 31].");		return Error(Loc, "index must be an integer in range [-32, 31].");
case Match_InvalidMemoryIndexedSImm8:		case Match_InvalidMemoryIndexedSImm8:
return Error(Loc, "index must be an integer in range [-128, 127].");		return Error(Loc, "index must be an integer in range [-128, 127].");
case Match_InvalidMemoryIndexedSImm9:		case Match_InvalidMemoryIndexedSImm9:
return Error(Loc, "index must be an integer in range [-256, 255].");		return Error(Loc, "index must be an integer in range [-256, 255].");
case Match_InvalidMemoryIndexed16SImm9:		case Match_InvalidMemoryIndexed16SImm9:
return Error(Loc, "index must be a multiple of 16 in range [-4096, 4080].");		return Error(Loc, "index must be a multiple of 16 in range [-4096, 4080].");
▲ Show 20 Lines • Show All 643 Lines • ▼ Show 20 Lines	bool AArch64AsmParser::MatchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
case Match_InvalidMemoryXExtend64:		case Match_InvalidMemoryXExtend64:
case Match_InvalidMemoryXExtend128:		case Match_InvalidMemoryXExtend128:
case Match_InvalidMemoryIndexed1SImm4:		case Match_InvalidMemoryIndexed1SImm4:
case Match_InvalidMemoryIndexed2SImm4:		case Match_InvalidMemoryIndexed2SImm4:
case Match_InvalidMemoryIndexed3SImm4:		case Match_InvalidMemoryIndexed3SImm4:
case Match_InvalidMemoryIndexed4SImm4:		case Match_InvalidMemoryIndexed4SImm4:
case Match_InvalidMemoryIndexed1SImm6:		case Match_InvalidMemoryIndexed1SImm6:
case Match_InvalidMemoryIndexed16SImm4:		case Match_InvalidMemoryIndexed16SImm4:
		case Match_InvalidMemoryIndexed32SImm4:
case Match_InvalidMemoryIndexed4SImm7:		case Match_InvalidMemoryIndexed4SImm7:
case Match_InvalidMemoryIndexed8SImm7:		case Match_InvalidMemoryIndexed8SImm7:
case Match_InvalidMemoryIndexed16SImm7:		case Match_InvalidMemoryIndexed16SImm7:
case Match_InvalidMemoryIndexed8UImm5:		case Match_InvalidMemoryIndexed8UImm5:
case Match_InvalidMemoryIndexed4UImm5:		case Match_InvalidMemoryIndexed4UImm5:
case Match_InvalidMemoryIndexed2UImm5:		case Match_InvalidMemoryIndexed2UImm5:
case Match_InvalidMemoryIndexed1UImm6:		case Match_InvalidMemoryIndexed1UImm6:
case Match_InvalidMemoryIndexed2UImm6:		case Match_InvalidMemoryIndexed2UImm6:
▲ Show 20 Lines • Show All 859 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/SVEInstrFormats.td

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 7,522 Lines • ▼ Show 20 Lines	: I<(outs ZPR16:$Zd), (ins ZPR16:$_Zd, PPR3bAny:$Pg, ZPR32:$Zn),
let Inst{4-0} = Zd;		let Inst{4-0} = Zd;

let Constraints = "$Zd = $_Zd";		let Constraints = "$Zd = $_Zd";
let DestructiveInstType = DestructiveOther;		let DestructiveInstType = DestructiveOther;
let hasSideEffects = 1;		let hasSideEffects = 1;
let ElementSize = ElementSizeS;		let ElementSize = ElementSizeS;
}		}

		//===----------------------------------------------------------------------===//
		// SVE Integer Matrix Multiply Group
		//===----------------------------------------------------------------------===//

		class sve_int_matmul<bits<2> uns, string asm>
		: I<(outs ZPR32:$Zda), (ins ZPR32:$_Zda, ZPR8:$Zn, ZPR8:$Zm), asm,
		"\t$Zda, $Zn, $Zm", "", []>, Sched<[]> {
		bits<5> Zda;
		bits<5> Zn;
		bits<5> Zm;
		let Inst{31-24} = 0b01000101;
		let Inst{23-22} = uns;
		let Inst{21} = 0;
		let Inst{20-16} = Zm;
		let Inst{15-10} = 0b100110;
		let Inst{9-5} = Zn;
		let Inst{4-0} = Zda;

		let Constraints = "$Zda = $_Zda";
		let DestructiveInstType = DestructiveOther;
		let ElementSize = ZPR32.ElementSize;
		}

		//===----------------------------------------------------------------------===//
		// SVE Integer Dot Product Mixed Sign Group
		//===----------------------------------------------------------------------===//

		class sve_int_dot_mixed<string asm>
		: I<(outs ZPR32:$Zda), (ins ZPR32:$_Zda, ZPR8:$Zn, ZPR8:$Zm), asm,
		"\t$Zda, $Zn, $Zm", "", []>, Sched<[]> {
		bits<5> Zda;
		bits<5> Zn;
		bits<5> Zm;
		let Inst{31-21} = 0b01000100100;
		let Inst{20-16} = Zm;
		let Inst{15-10} = 0b011110;
		let Inst{9-5} = Zn;
		let Inst{4-0} = Zda;

		let Constraints = "$Zda = $_Zda";
		let DestructiveInstType = DestructiveOther;
		let ElementSize = ZPR32.ElementSize;
		}

		//===----------------------------------------------------------------------===//
		// SVE Integer Dot Product Mixed Sign - Indexed Group
		//===----------------------------------------------------------------------===//

		class sve_int_dot_mixed_indexed<bit U, string asm>
		: I<(outs ZPR32:$Zda), (ins ZPR32:$_Zda, ZPR8:$Zn, ZPR3b8:$Zm, VectorIndexS:$idx),
		asm, "\t$Zda, $Zn, $Zm$idx", "", []>, Sched<[]> {
		bits<5> Zda;
		bits<5> Zn;
		bits<3> Zm;
		bits<2> idx;
		let Inst{31-21} = 0b01000100101;
		let Inst{20-19} = idx;
		let Inst{18-16} = Zm;
		let Inst{15-11} = 0b00011;
		let Inst{10} = U;
		let Inst{9-5} = Zn;
		let Inst{4-0} = Zda;

		let Constraints = "$Zda = $_Zda";
		let DestructiveInstType = DestructiveOther;
		let ElementSize = ZPR32.ElementSize;
		}

		//===----------------------------------------------------------------------===//
		// SVE Floating Point Matrix Multiply Accumulate Group
		//===----------------------------------------------------------------------===//

		class sve_fp_matrix_mla<bit sz, string asm, ZPRRegOp zprty>
		: I<(outs zprty:$Zda), (ins zprty:$_Zda, zprty:$Zn, zprty:$Zm),
		asm, "\t$Zda, $Zn, $Zm", "", []>, Sched<[]> {
		bits<5> Zda;
		bits<5> Zn;
		bits<5> Zm;
		let Inst{31-23} = 0b011001001;
		let Inst{22} = sz;
		let Inst{21} = 1;
		let Inst{20-16} = Zm;
		let Inst{15-10} = 0b111001;
		let Inst{9-5} = Zn;
		let Inst{4-0} = Zda;

		let Constraints = "$Zda = $_Zda";
		let DestructiveInstType = DestructiveOther;
		let ElementSize = zprty.ElementSize;
		}

		//===----------------------------------------------------------------------===//
		// SVE Memory - Contiguous Load And Replicate 256-bit Group
		//===----------------------------------------------------------------------===//
		class sve_mem_ldor_si<bits<2> sz, string asm, RegisterOperand VecList>
		: I<(outs VecList:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, simm4s32:$imm4),
		asm, "\t$Zt, $Pg/z, [$Rn, $imm4]", "", []>, Sched<[]> {
		bits<5> Zt;
		bits<5> Rn;
		bits<3> Pg;
		bits<4> imm4;
		let Inst{31-25} = 0b1010010;
		let Inst{24-23} = sz;
		let Inst{22-20} = 0b010;
		let Inst{19-16} = imm4;
		let Inst{15-13} = 0b001;
		let Inst{12-10} = Pg;
		let Inst{9-5} = Rn;
		let Inst{4-0} = Zt;

		let mayLoad = 1;
		}

		multiclass sve_mem_ldor_si<bits<2> sz, string asm, RegisterOperand listty,
		ZPRRegOp zprty> {
		def NAME : sve_mem_ldor_si<sz, asm, listty>;
		def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn]",
		(!cast<Instruction>(NAME) listty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, 0), 1>;
		def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn]",
		(!cast<Instruction>(NAME) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, 0), 0>;
		def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $imm4]",
		(!cast<Instruction>(NAME) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, simm4s32:$imm4), 0>;
		}

		class sve_mem_ldor_ss<bits<2> sz, string asm, RegisterOperand VecList,
		RegisterOperand gprty>
		: I<(outs VecList:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, gprty:$Rm),
		asm, "\t$Zt, $Pg/z, [$Rn, $Rm]", "", []>, Sched<[]> {
		bits<5> Zt;
		bits<3> Pg;
		bits<5> Rn;
		bits<5> Rm;
		let Inst{31-25} = 0b1010010;
		let Inst{24-23} = sz;
		let Inst{22-21} = 0b01;
		let Inst{20-16} = Rm;
		let Inst{15-13} = 0;
		let Inst{12-10} = Pg;
		let Inst{9-5} = Rn;
		let Inst{4-0} = Zt;

		let mayLoad = 1;
		}

		multiclass sve_mem_ldor_ss<bits<2> sz, string asm, RegisterOperand listty,
		ZPRRegOp zprty, RegisterOperand gprty> {
		def NAME : sve_mem_ldor_ss<sz, asm, listty, gprty>;

		def : InstAlias<asm # "\t$Zt, $Pg/z, [$Rn, $Rm]",
		(!cast<Instruction>(NAME) zprty:$Zt, PPR3bAny:$Pg, GPR64sp:$Rn, gprty:$Rm), 0>;
		}

		//===----------------------------------------------------------------------===//
		// SVE Interleave 128-bit Elements Group
		//===----------------------------------------------------------------------===//

		class sve_int_perm_bin_perm_128_zz<bits<2> opc, bit P, string asm>
		: I<(outs ZPR128:$Zd), (ins ZPR128:$Zn, ZPR128:$Zm),
		asm, "\t$Zd, $Zn, $Zm",
		"",
		[]>, Sched<[]> {
		bits<5> Zd;
		bits<5> Zm;
		bits<5> Zn;
		let Inst{31-21} = 0b00000101101;
		let Inst{20-16} = Zm;
		let Inst{15-13} = 0b000;
		let Inst{12-11} = opc;
		let Inst{10} = P;
		let Inst{9-5} = Zn;
		let Inst{4-0} = Zd;
		}


/// Addressing modes		/// Addressing modes
def am_sve_indexed_s4 :ComplexPattern<i64, 2, "SelectAddrModeIndexedSVE<-8,7>", [], [SDNPWantRoot]>;		def am_sve_indexed_s4 :ComplexPattern<i64, 2, "SelectAddrModeIndexedSVE<-8,7>", [], [SDNPWantRoot]>;
def am_sve_indexed_s6 :ComplexPattern<i64, 2, "SelectAddrModeIndexedSVE<-32,31>", [], [SDNPWantRoot]>;		def am_sve_indexed_s6 :ComplexPattern<i64, 2, "SelectAddrModeIndexedSVE<-32,31>", [], [SDNPWantRoot]>;

def am_sve_regreg_lsl0 : ComplexPattern<i64, 2, "SelectSVERegRegAddrMode<0>", []>;		def am_sve_regreg_lsl0 : ComplexPattern<i64, 2, "SelectSVERegRegAddrMode<0>", []>;
def am_sve_regreg_lsl1 : ComplexPattern<i64, 2, "SelectSVERegRegAddrMode<1>", []>;		def am_sve_regreg_lsl1 : ComplexPattern<i64, 2, "SelectSVERegRegAddrMode<1>", []>;
def am_sve_regreg_lsl2 : ComplexPattern<i64, 2, "SelectSVERegRegAddrMode<2>", []>;		def am_sve_regreg_lsl2 : ComplexPattern<i64, 2, "SelectSVERegRegAddrMode<2>", []>;
def am_sve_regreg_lsl3 : ComplexPattern<i64, 2, "SelectSVERegRegAddrMode<3>", []>;		def am_sve_regreg_lsl3 : ComplexPattern<i64, 2, "SelectSVERegRegAddrMode<3>", []>;

llvm/lib/Target/ARM/ARM.td

Show First 20 Lines • Show All 422 Lines • ▼ Show 20 Lines

def FeatureSB : SubtargetFeature<"sb", "HasSB", "true",		def FeatureSB : SubtargetFeature<"sb", "HasSB", "true",
"Enable v8.5a Speculation Barrier" >;		"Enable v8.5a Speculation Barrier" >;

// Armv8.6-A extensions		// Armv8.6-A extensions
def FeatureBF16 : SubtargetFeature<"bf16", "HasBF16", "true",		def FeatureBF16 : SubtargetFeature<"bf16", "HasBF16", "true",
"Enable support for BFloat16 instructions", [FeatureNEON]>;		"Enable support for BFloat16 instructions", [FeatureNEON]>;

		def FeatureMatMulInt8 : SubtargetFeature<"i8mm", "HasMatMulInt8",
		"true", "Enable Matrix Multiply Int8 Extension", [FeatureNEON]>;

// Armv8.1-M extensions		// Armv8.1-M extensions

def FeatureLOB : SubtargetFeature<"lob", "HasLOB", "true",		def FeatureLOB : SubtargetFeature<"lob", "HasLOB", "true",
"Enable Low Overhead Branch "		"Enable Low Overhead Branch "
"extensions">;		"extensions">;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// ARM architecture class		// ARM architecture class
▲ Show 20 Lines • Show All 85 Lines • ▼ Show 20 Lines	def HasV8_4aOps : SubtargetFeature<"v8.4a", "HasV8_4aOps", "true",
[HasV8_3aOps, FeatureDotProd]>;		[HasV8_3aOps, FeatureDotProd]>;

def HasV8_5aOps : SubtargetFeature<"v8.5a", "HasV8_5aOps", "true",		def HasV8_5aOps : SubtargetFeature<"v8.5a", "HasV8_5aOps", "true",
"Support ARM v8.5a instructions",		"Support ARM v8.5a instructions",
[HasV8_4aOps, FeatureSB]>;		[HasV8_4aOps, FeatureSB]>;

def HasV8_6aOps : SubtargetFeature<"v8.6a", "HasV8_6aOps", "true",		def HasV8_6aOps : SubtargetFeature<"v8.6a", "HasV8_6aOps", "true",
"Support ARM v8.6a instructions",		"Support ARM v8.6a instructions",
[HasV8_5aOps, FeatureBF16]>;		[HasV8_5aOps, FeatureBF16,
		FeatureMatMulInt8]>;

def HasV8_1MMainlineOps : SubtargetFeature<		def HasV8_1MMainlineOps : SubtargetFeature<
"v8.1m.main", "HasV8_1MMainlineOps", "true",		"v8.1m.main", "HasV8_1MMainlineOps", "true",
"Support ARM v8-1M Mainline instructions",		"Support ARM v8-1M Mainline instructions",
[HasV8MMainlineOps]>;		[HasV8MMainlineOps]>;
def HasMVEIntegerOps : SubtargetFeature<		def HasMVEIntegerOps : SubtargetFeature<
"mve", "HasMVEIntegerOps", "true",		"mve", "HasMVEIntegerOps", "true",
"Support M-Class Vector Extension with integer ops",		"Support M-Class Vector Extension with integer ops",
▲ Show 20 Lines • Show All 767 Lines • Show Last 20 Lines

llvm/lib/Target/ARM/ARMInstrNEON.td

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 4,817 Lines • ▼ Show 20 Lines
def : Pat<(v4f32 (fma (fneg QPR:$Vn), QPR:$Vm, QPR:$src1)),		def : Pat<(v4f32 (fma (fneg QPR:$Vn), QPR:$Vm, QPR:$src1)),
(VFMSfq QPR:$src1, QPR:$Vn, QPR:$Vm)>,		(VFMSfq QPR:$src1, QPR:$Vn, QPR:$Vm)>,
Requires<[HasNEON,HasVFP4]>;		Requires<[HasNEON,HasVFP4]>;

// ARMv8.2a dot product instructions.		// ARMv8.2a dot product instructions.
// We put them in the VFPV8 decoder namespace because the ARM and Thumb		// We put them in the VFPV8 decoder namespace because the ARM and Thumb
// encodings are the same and thus no further bit twiddling is necessary		// encodings are the same and thus no further bit twiddling is necessary
// in the disassembler.		// in the disassembler.
class VDOT<bit op6, bit op4, RegisterClass RegTy, string Asm, string AsmTy,		class VDOT<bit op6, bit op4, bit op23, RegisterClass RegTy, string Asm,
ValueType AccumTy, ValueType InputTy,		string AsmTy, ValueType AccumTy, ValueType InputTy,
SDPatternOperator OpNode> :		SDPatternOperator OpNode> :
N3Vnp<0b11000, 0b10, 0b1101, op6, op4, (outs RegTy:$dst),		N3Vnp<{0b1100, op23}, 0b10, 0b1101, op6, op4, (outs RegTy:$dst),
(ins RegTy:$Vd, RegTy:$Vn, RegTy:$Vm), N3RegFrm, IIC_VDOTPROD,		(ins RegTy:$Vd, RegTy:$Vn, RegTy:$Vm), N3RegFrm, IIC_VDOTPROD,
Asm, AsmTy,		Asm, AsmTy,
[(set (AccumTy RegTy:$dst),		[(set (AccumTy RegTy:$dst),
(OpNode (AccumTy RegTy:$Vd),		(OpNode (AccumTy RegTy:$Vd),
(InputTy RegTy:$Vn),		(InputTy RegTy:$Vn),
(InputTy RegTy:$Vm)))]> {		(InputTy RegTy:$Vm)))]> {
let Predicates = [HasDotProd];		let Predicates = [HasDotProd];
let DecoderNamespace = "VFPV8";		let DecoderNamespace = "VFPV8";
let Constraints = "$dst = $Vd";		let Constraints = "$dst = $Vd";
		let hasNoSchedulingInfo = 1;
}		}

def VUDOTD : VDOT<0, 1, DPR, "vudot", "u8", v2i32, v8i8, int_arm_neon_udot>;		def VUDOTD : VDOT<0, 1, 0, DPR, "vudot", "u8", v2i32, v8i8, int_arm_neon_udot>;
def VSDOTD : VDOT<0, 0, DPR, "vsdot", "s8", v2i32, v8i8, int_arm_neon_sdot>;		def VSDOTD : VDOT<0, 0, 0, DPR, "vsdot", "s8", v2i32, v8i8, int_arm_neon_sdot>;
def VUDOTQ : VDOT<1, 1, QPR, "vudot", "u8", v4i32, v16i8, int_arm_neon_udot>;		def VUDOTQ : VDOT<1, 1, 0, QPR, "vudot", "u8", v4i32, v16i8, int_arm_neon_udot>;
def VSDOTQ : VDOT<1, 0, QPR, "vsdot", "s8", v4i32, v16i8, int_arm_neon_sdot>;		def VSDOTQ : VDOT<1, 0, 0, QPR, "vsdot", "s8", v4i32, v16i8, int_arm_neon_sdot>;

// Indexed dot product instructions:		// Indexed dot product instructions:
multiclass DOTI<string opc, string dt, bit Q, bit U, RegisterClass Ty,		multiclass DOTI<string opc, string dt, bit Q, bit U, RegisterClass Ty,
ValueType AccumType, ValueType InputType, SDPatternOperator OpNode,		ValueType AccumType, ValueType InputType, SDPatternOperator OpNode,
dag RHS> {		dag RHS> {
def "" : N3Vnp<0b11100, 0b10, 0b1101, Q, U, (outs Ty:$dst),		def "" : N3Vnp<0b11100, 0b10, 0b1101, Q, U, (outs Ty:$dst),
(ins Ty:$Vd, Ty:$Vn, DPR_VFP2:$Vm, VectorIndex32:$lane),		(ins Ty:$Vd, Ty:$Vn, DPR_VFP2:$Vm, VectorIndex32:$lane),
N3RegFrm, IIC_VDOTPROD, opc, dt, []> {		N3RegFrm, IIC_VDOTPROD, opc, dt, []> {
Show All 18 Lines	defm VUDOTDI : DOTI<"vudot", "u8", 0b0, 0b1, DPR, v2i32, v8i8,
int_arm_neon_udot, (v2i32 DPR_VFP2:$Vm)>;		int_arm_neon_udot, (v2i32 DPR_VFP2:$Vm)>;
defm VSDOTDI : DOTI<"vsdot", "s8", 0b0, 0b0, DPR, v2i32, v8i8,		defm VSDOTDI : DOTI<"vsdot", "s8", 0b0, 0b0, DPR, v2i32, v8i8,
int_arm_neon_sdot, (v2i32 DPR_VFP2:$Vm)>;		int_arm_neon_sdot, (v2i32 DPR_VFP2:$Vm)>;
defm VUDOTQI : DOTI<"vudot", "u8", 0b1, 0b1, QPR, v4i32, v16i8,		defm VUDOTQI : DOTI<"vudot", "u8", 0b1, 0b1, QPR, v4i32, v16i8,
int_arm_neon_udot, (EXTRACT_SUBREG QPR:$Vm, dsub_0)>;		int_arm_neon_udot, (EXTRACT_SUBREG QPR:$Vm, dsub_0)>;
defm VSDOTQI : DOTI<"vsdot", "s8", 0b1, 0b0, QPR, v4i32, v16i8,		defm VSDOTQI : DOTI<"vsdot", "s8", 0b1, 0b0, QPR, v4i32, v16i8,
int_arm_neon_sdot, (EXTRACT_SUBREG QPR:$Vm, dsub_0)>;		int_arm_neon_sdot, (EXTRACT_SUBREG QPR:$Vm, dsub_0)>;

		// v8.6A matrix multiplication extension
		let Predicates = [HasMatMulInt8] in {
		class N3VMatMul<bit B, bit U, string Asm, string AsmTy,
		SDPatternOperator OpNode>
		: N3Vnp<{0b1100, B}, 0b10, 0b1100, 1, U, (outs QPR:$dst),
		(ins QPR:$Vd, QPR:$Vn, QPR:$Vm), N3RegFrm, NoItinerary,
		Asm, AsmTy,
		[(set (v4i32 QPR:$dst), (OpNode (v4i32 QPR:$Vd),
		(v16i8 QPR:$Vn),
		(v16i8 QPR:$Vm)))]> {
		let DecoderNamespace = "VFPV8";
		let Constraints = "$dst = $Vd";
		let hasNoSchedulingInfo = 1;
		}

		multiclass N3VMixedDotLane<bit Q, bit U, string Asm, string AsmTy, RegisterClass RegTy,
		ValueType AccumTy, ValueType InputTy, SDPatternOperator OpNode,
		dag RHS> {

		def "" : N3Vnp<0b11101, 0b00, 0b1101, Q, U, (outs RegTy:$dst),
		(ins RegTy:$Vd, RegTy:$Vn, DPR_VFP2:$Vm, VectorIndex32:$lane), N3RegFrm,
		NoItinerary, Asm, AsmTy, []> {
		bit lane;
		let hasNoSchedulingInfo = 1;
		let Inst{5} = lane;
		let AsmString = !strconcat(Asm, ".", AsmTy, "\t$Vd, $Vn, $Vm$lane");
		let DecoderNamespace = "VFPV8";
		let Constraints = "$dst = $Vd";
		}

		def : Pat<
		(AccumTy (OpNode (AccumTy RegTy:$Vd),
		(InputTy RegTy:$Vn),
		(InputTy (bitconvert (AccumTy
		(ARMvduplane (AccumTy RegTy:$Vm),
		VectorIndex32:$lane)))))),
		(!cast<Instruction>(NAME) RegTy:$Vd, RegTy:$Vn, RHS, VectorIndex32:$lane)>;

		}

		multiclass SUDOTLane<bit Q, RegisterClass RegTy, ValueType AccumTy, ValueType InputTy, dag RHS>
		: N3VMixedDotLane<Q, 1, "vsudot", "u8", RegTy, AccumTy, InputTy, null_frag, null_frag> {
		def : Pat<
		(AccumTy (int_arm_neon_usdot (AccumTy RegTy:$Vd),
		(InputTy (bitconvert (AccumTy
		(ARMvduplane (AccumTy RegTy:$Vm),
		VectorIndex32:$lane)))),
		(InputTy RegTy:$Vn))),
		(!cast<Instruction>(NAME) RegTy:$Vd, RegTy:$Vn, RHS, VectorIndex32:$lane)>;
		}

		def VSMMLA : N3VMatMul<0, 0, "vsmmla", "s8", int_arm_neon_smmla>;
		def VUMMLA : N3VMatMul<0, 1, "vummla", "u8", int_arm_neon_ummla>;
		def VUSMMLA : N3VMatMul<1, 0, "vusmmla", "s8", int_arm_neon_usmmla>;
		def VUSDOTD : VDOT<0, 0, 1, DPR, "vusdot", "s8", v2i32, v8i8, int_arm_neon_usdot>;
		def VUSDOTQ : VDOT<1, 0, 1, QPR, "vusdot", "s8", v4i32, v16i8, int_arm_neon_usdot>;

		defm VUSDOTDI : N3VMixedDotLane<0, 0, "vusdot", "s8", DPR, v2i32, v8i8,
		int_arm_neon_usdot, (v2i32 DPR_VFP2:$Vm)>;
		defm VUSDOTQI : N3VMixedDotLane<1, 0, "vusdot", "s8", QPR, v4i32, v16i8,
		int_arm_neon_usdot, (EXTRACT_SUBREG QPR:$Vm, dsub_0)>;
		defm VSUDOTDI : SUDOTLane<0, DPR, v2i32, v8i8, (v2i32 DPR_VFP2:$Vm)>;
		defm VSUDOTQI : SUDOTLane<1, QPR, v4i32, v16i8, (EXTRACT_SUBREG QPR:$Vm, dsub_0)>;
		}

// ARMv8.3 complex operations		// ARMv8.3 complex operations
class BaseN3VCP8ComplexTied<bit op21, bit op4, bit s, bit q,		class BaseN3VCP8ComplexTied<bit op21, bit op4, bit s, bit q,
InstrItinClass itin, dag oops, dag iops,		InstrItinClass itin, dag oops, dag iops,
string opc, string dt, list<dag> pattern>		string opc, string dt, list<dag> pattern>
: N3VCP8<{?,?}, {op21,s}, q, op4, oops,		: N3VCP8<{?,?}, {op21,s}, q, op4, oops,
iops, itin, opc, dt, "$Vd, $Vn, $Vm, $rot", "$src1 = $Vd", pattern>{		iops, itin, opc, dt, "$Vd, $Vn, $Vm, $rot", "$src1 = $Vd", pattern>{
bits<2> rot;		bits<2> rot;
▲ Show 20 Lines • Show All 4,127 Lines • Show Last 20 Lines

llvm/lib/Target/ARM/ARMPredicates.td

	Show First 20 Lines • Show All 104 Lines • ▼ Show 20 Lines
	def HasFP16 : Predicate<"Subtarget->hasFP16()">,			def HasFP16 : Predicate<"Subtarget->hasFP16()">,
	AssemblerPredicate<(all_of FeatureFP16),"half-float conversions">;			AssemblerPredicate<(all_of FeatureFP16),"half-float conversions">;
	def HasFullFP16 : Predicate<"Subtarget->hasFullFP16()">,			def HasFullFP16 : Predicate<"Subtarget->hasFullFP16()">,
	AssemblerPredicate<(all_of FeatureFullFP16),"full half-float">;			AssemblerPredicate<(all_of FeatureFullFP16),"full half-float">;
	def HasFP16FML : Predicate<"Subtarget->hasFP16FML()">,			def HasFP16FML : Predicate<"Subtarget->hasFP16FML()">,
	AssemblerPredicate<(all_of FeatureFP16FML),"full half-float fml">;			AssemblerPredicate<(all_of FeatureFP16FML),"full half-float fml">;
	def HasBF16 : Predicate<"Subtarget->hasBF16()">,			def HasBF16 : Predicate<"Subtarget->hasBF16()">,
	AssemblerPredicate<(all_of FeatureBF16),"BFloat16 floating point extension">;			AssemblerPredicate<(all_of FeatureBF16),"BFloat16 floating point extension">;
				def HasMatMulInt8 : Predicate<"Subtarget->hasMatMulInt8()">,
				AssemblerPredicate<(all_of FeatureMatMulInt8),"8-bit integer matrix multiply">;
	def HasDivideInThumb : Predicate<"Subtarget->hasDivideInThumbMode()">,			def HasDivideInThumb : Predicate<"Subtarget->hasDivideInThumbMode()">,
	AssemblerPredicate<(all_of FeatureHWDivThumb), "divide in THUMB">;			AssemblerPredicate<(all_of FeatureHWDivThumb), "divide in THUMB">;
	def HasDivideInARM : Predicate<"Subtarget->hasDivideInARMMode()">,			def HasDivideInARM : Predicate<"Subtarget->hasDivideInARMMode()">,
	AssemblerPredicate<(all_of FeatureHWDivARM), "divide in ARM">;			AssemblerPredicate<(all_of FeatureHWDivARM), "divide in ARM">;
	def HasDSP : Predicate<"Subtarget->hasDSP()">,			def HasDSP : Predicate<"Subtarget->hasDSP()">,
	AssemblerPredicate<(all_of FeatureDSP), "dsp">;			AssemblerPredicate<(all_of FeatureDSP), "dsp">;
	def HasDB : Predicate<"Subtarget->hasDataBarrier()">,			def HasDB : Predicate<"Subtarget->hasDataBarrier()">,
	AssemblerPredicate<(all_of FeatureDB),			AssemblerPredicate<(all_of FeatureDB),
	▲ Show 20 Lines • Show All 96 Lines • Show Last 20 Lines

llvm/lib/Target/ARM/ARMSubtarget.h

Show First 20 Lines • Show All 254 Lines • ▼ Show 20 Lines	protected:
bool HasFullFP16 = false;		bool HasFullFP16 = false;

/// HasFP16FML - True if subtarget supports half-precision FP fml operations		/// HasFP16FML - True if subtarget supports half-precision FP fml operations
bool HasFP16FML = false;		bool HasFP16FML = false;

/// HasBF16 - True if subtarget supports BFloat16 floating point operations		/// HasBF16 - True if subtarget supports BFloat16 floating point operations
bool HasBF16 = false;		bool HasBF16 = false;

		/// HasMatMulInt8 - True if subtarget supports 8-bit integer matrix multiply
		bool HasMatMulInt8 = false;

/// HasD32 - True if subtarget has the full 32 double precision		/// HasD32 - True if subtarget has the full 32 double precision
/// FP registers for VFPv3.		/// FP registers for VFPv3.
bool HasD32 = false;		bool HasD32 = false;

/// HasHardwareDivide - True if subtarget supports [su]div in Thumb mode		/// HasHardwareDivide - True if subtarget supports [su]div in Thumb mode
bool HasHardwareDivideInThumb = false;		bool HasHardwareDivideInThumb = false;

/// HasHardwareDivideInARM - True if subtarget supports [su]div in ARM mode		/// HasHardwareDivideInARM - True if subtarget supports [su]div in ARM mode
▲ Show 20 Lines • Show All 428 Lines • ▼ Show 20 Lines	public:
bool hasFullFP16() const { return HasFullFP16; }		bool hasFullFP16() const { return HasFullFP16; }
bool hasFP16FML() const { return HasFP16FML; }		bool hasFP16FML() const { return HasFP16FML; }

bool hasFuseAES() const { return HasFuseAES; }		bool hasFuseAES() const { return HasFuseAES; }
bool hasFuseLiterals() const { return HasFuseLiterals; }		bool hasFuseLiterals() const { return HasFuseLiterals; }
/// Return true if the CPU supports any kind of instruction fusion.		/// Return true if the CPU supports any kind of instruction fusion.
bool hasFusion() const { return hasFuseAES() \|\| hasFuseLiterals(); }		bool hasFusion() const { return hasFuseAES() \|\| hasFuseLiterals(); }

		bool hasMatMulInt8() const { return HasMatMulInt8; }

const Triple &getTargetTriple() const { return TargetTriple; }		const Triple &getTargetTriple() const { return TargetTriple; }

bool isTargetDarwin() const { return TargetTriple.isOSDarwin(); }		bool isTargetDarwin() const { return TargetTriple.isOSDarwin(); }
bool isTargetIOS() const { return TargetTriple.isiOS(); }		bool isTargetIOS() const { return TargetTriple.isiOS(); }
bool isTargetWatchOS() const { return TargetTriple.isWatchOS(); }		bool isTargetWatchOS() const { return TargetTriple.isWatchOS(); }
bool isTargetWatchABI() const { return TargetTriple.isWatchABI(); }		bool isTargetWatchABI() const { return TargetTriple.isWatchABI(); }
bool isTargetLinux() const { return TargetTriple.isOSLinux(); }		bool isTargetLinux() const { return TargetTriple.isOSLinux(); }
bool isTargetNaCl() const { return TargetTriple.isOSNaCl(); }		bool isTargetNaCl() const { return TargetTriple.isOSNaCl(); }
▲ Show 20 Lines • Show All 184 Lines • Show Last 20 Lines

llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 6,324 Lines • ▼ Show 20 Lines	if ((Mnemonic == "movs" && isThumb()) \|\|
Mnemonic == "vdot" \|\| Mnemonic == "vmmla" \|\|		Mnemonic == "vdot" \|\| Mnemonic == "vmmla" \|\|
Mnemonic == "vudot" \|\| Mnemonic == "vsdot" \|\|		Mnemonic == "vudot" \|\| Mnemonic == "vsdot" \|\|
Mnemonic == "vcmla" \|\| Mnemonic == "vcadd" \|\|		Mnemonic == "vcmla" \|\| Mnemonic == "vcadd" \|\|
Mnemonic == "vfmal" \|\| Mnemonic == "vfmsl" \|\|		Mnemonic == "vfmal" \|\| Mnemonic == "vfmsl" \|\|
Mnemonic == "wls" \|\| Mnemonic == "le" \|\| Mnemonic == "dls" \|\|		Mnemonic == "wls" \|\| Mnemonic == "le" \|\| Mnemonic == "dls" \|\|
Mnemonic == "csel" \|\| Mnemonic == "csinc" \|\|		Mnemonic == "csel" \|\| Mnemonic == "csinc" \|\|
Mnemonic == "csinv" \|\| Mnemonic == "csneg" \|\| Mnemonic == "cinc" \|\|		Mnemonic == "csinv" \|\| Mnemonic == "csneg" \|\| Mnemonic == "cinc" \|\|
Mnemonic == "cinv" \|\| Mnemonic == "cneg" \|\| Mnemonic == "cset" \|\|		Mnemonic == "cinv" \|\| Mnemonic == "cneg" \|\| Mnemonic == "cset" \|\|
Mnemonic == "csetm")		Mnemonic == "csetm" \|\|
		Mnemonic == "vsmmla" \|\| Mnemonic == "vummla" \|\|
		Mnemonic == "vusmmla" \|\| Mnemonic == "vsudot" \|\|
		Mnemonic == "vusdot")
return Mnemonic;		return Mnemonic;

// First, split out any predication code. Ignore mnemonics we know aren't		// First, split out any predication code. Ignore mnemonics we know aren't
// predicated but do have a carry-set and so weren't caught above.		// predicated but do have a carry-set and so weren't caught above.
if (Mnemonic != "adcs" && Mnemonic != "bics" && Mnemonic != "movs" &&		if (Mnemonic != "adcs" && Mnemonic != "bics" && Mnemonic != "movs" &&
Mnemonic != "muls" && Mnemonic != "smlals" && Mnemonic != "smulls" &&		Mnemonic != "muls" && Mnemonic != "smlals" && Mnemonic != "smulls" &&
Mnemonic != "umlals" && Mnemonic != "umulls" && Mnemonic != "lsls" &&		Mnemonic != "umlals" && Mnemonic != "umulls" && Mnemonic != "lsls" &&
Mnemonic != "sbcs" && Mnemonic != "rscs" &&		Mnemonic != "sbcs" && Mnemonic != "rscs" &&
▲ Show 20 Lines • Show All 119 Lines • ▼ Show 20 Lines	if (Mnemonic == "bkpt" \|\| Mnemonic == "cbnz" \|\| Mnemonic == "setend" \|\|
(FullInst.startswith("vmull") && FullInst.endswith(".p64")) \|\|		(FullInst.startswith("vmull") && FullInst.endswith(".p64")) \|\|
Mnemonic == "vmovx" \|\| Mnemonic == "vins" \|\|		Mnemonic == "vmovx" \|\| Mnemonic == "vins" \|\|
Mnemonic == "vudot" \|\| Mnemonic == "vsdot" \|\|		Mnemonic == "vudot" \|\| Mnemonic == "vsdot" \|\|
Mnemonic == "vcmla" \|\| Mnemonic == "vcadd" \|\|		Mnemonic == "vcmla" \|\| Mnemonic == "vcadd" \|\|
Mnemonic == "vfmal" \|\| Mnemonic == "vfmsl" \|\|		Mnemonic == "vfmal" \|\| Mnemonic == "vfmsl" \|\|
Mnemonic == "vfmat" \|\| Mnemonic == "vfmab" \|\|		Mnemonic == "vfmat" \|\| Mnemonic == "vfmab" \|\|
Mnemonic == "vdot" \|\| Mnemonic == "vmmla" \|\|		Mnemonic == "vdot" \|\| Mnemonic == "vmmla" \|\|
Mnemonic == "sb" \|\| Mnemonic == "ssbb" \|\|		Mnemonic == "sb" \|\| Mnemonic == "ssbb" \|\|
Mnemonic == "pssbb" \|\|		Mnemonic == "pssbb" \|\| Mnemonic == "vsmmla" \|\|
		Mnemonic == "vummla" \|\| Mnemonic == "vusmmla" \|\|
		Mnemonic == "vusdot" \|\| Mnemonic == "vsudot" \|\|
Mnemonic == "bfcsel" \|\| Mnemonic == "wls" \|\|		Mnemonic == "bfcsel" \|\| Mnemonic == "wls" \|\|
Mnemonic == "dls" \|\| Mnemonic == "le" \|\| Mnemonic == "csel" \|\|		Mnemonic == "dls" \|\| Mnemonic == "le" \|\| Mnemonic == "csel" \|\|
Mnemonic == "csinc" \|\| Mnemonic == "csinv" \|\| Mnemonic == "csneg" \|\|		Mnemonic == "csinc" \|\| Mnemonic == "csinv" \|\| Mnemonic == "csneg" \|\|
Mnemonic == "cinc" \|\| Mnemonic == "cinv" \|\| Mnemonic == "cneg" \|\|		Mnemonic == "cinc" \|\| Mnemonic == "cinv" \|\| Mnemonic == "cneg" \|\|
Mnemonic == "cset" \|\| Mnemonic == "csetm" \|\|		Mnemonic == "cset" \|\| Mnemonic == "csetm" \|\|
Mnemonic.startswith("vpt") \|\| Mnemonic.startswith("vpst") \|\|		Mnemonic.startswith("vpt") \|\| Mnemonic.startswith("vpst") \|\|
(hasCDE() && MS.isCDEInstr(Mnemonic) &&		(hasCDE() && MS.isCDEInstr(Mnemonic) &&
!MS.isITPredicableCDEInstr(Mnemonic)) \|\|		!MS.isITPredicableCDEInstr(Mnemonic)) \|\|
▲ Show 20 Lines • Show All 5,787 Lines • Show Last 20 Lines

llvm/test/CodeGen/AArch64/aarch64-matmul.ll

This file was added.

				; RUN: llc -mtriple=aarch64-none-linux-gnu -mattr=+neon,+i8mm < %s -o -\| FileCheck %s

				define <4 x i32> @smmla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b) {
				entry:
				; CHECK-LABEL: smmla.v4i32.v16i8
				; CHECK: smmla v0.4s, v1.16b, v2.16b
				%vmmla1.i = tail call <4 x i32> @llvm.aarch64.neon.smmla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b)
				ret <4 x i32> %vmmla1.i
				}

				define <4 x i32> @ummla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b) {
				entry:
				; CHECK-LABEL: ummla.v4i32.v16i8
				; CHECK: ummla v0.4s, v1.16b, v2.16b
				%vmmla1.i = tail call <4 x i32> @llvm.aarch64.neon.ummla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b)
				ret <4 x i32> %vmmla1.i
				}

				define <4 x i32> @usmmla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b) {
				entry:
				; CHECK-LABEL: usmmla.v4i32.v16i8
				; CHECK: usmmla v0.4s, v1.16b, v2.16b
				%vusmmla1.i = tail call <4 x i32> @llvm.aarch64.neon.usmmla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b) #3
				ret <4 x i32> %vusmmla1.i
				}

				define <2 x i32> @usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> %b) {
				entry:
				; CHECK-LABEL: usdot.v2i32.v8i8
				; CHECK: usdot v0.2s, v1.8b, v2.8b
				%vusdot1.i = tail call <2 x i32> @llvm.aarch64.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> %b)
				ret <2 x i32> %vusdot1.i
				}

				define <2 x i32> @usdot_lane.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> %b) {
				entry:
				; CHECK-LABEL: usdot_lane.v2i32.v8i8
				; CHECK: usdot v0.2s, v1.8b, v2.4b[0]
				%0 = bitcast <8 x i8> %b to <2 x i32>
				%shuffle = shufflevector <2 x i32> %0, <2 x i32> undef, <2 x i32> zeroinitializer
				%1 = bitcast <2 x i32> %shuffle to <8 x i8>
				%vusdot1.i = tail call <2 x i32> @llvm.aarch64.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> %1)
				ret <2 x i32> %vusdot1.i
				}

				define <2 x i32> @sudot_lane.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> %b) {
				entry:
				; CHECK-LABEL: sudot_lane.v2i32.v8i8
				; CHECK: sudot v0.2s, v1.8b, v2.4b[0]
				%0 = bitcast <8 x i8> %b to <2 x i32>
				%shuffle = shufflevector <2 x i32> %0, <2 x i32> undef, <2 x i32> zeroinitializer
				%1 = bitcast <2 x i32> %shuffle to <8 x i8>
				%vusdot1.i = tail call <2 x i32> @llvm.aarch64.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %1, <8 x i8> %a)
				ret <2 x i32> %vusdot1.i
				}

				define <2 x i32> @usdot_lane.v2i32.v16i8(<2 x i32> %r, <8 x i8> %a, <16 x i8> %b) {
				entry:
				; CHECK-LABEL: usdot_lane.v2i32.v16i8
				; CHECK: usdot v0.2s, v1.8b, v2.4b[0]
				%0 = bitcast <16 x i8> %b to <4 x i32>
				%shuffle = shufflevector <4 x i32> %0, <4 x i32> undef, <2 x i32> zeroinitializer
				%1 = bitcast <2 x i32> %shuffle to <8 x i8>
				%vusdot1.i = tail call <2 x i32> @llvm.aarch64.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> %1)
				ret <2 x i32> %vusdot1.i
				}

				define <2 x i32> @sudot_lane.v2i32.v16i8(<2 x i32> %r, <8 x i8> %a, <16 x i8> %b) {
				entry:
				; CHECK-LABEL: sudot_lane.v2i32.v16i8
				; CHECK: sudot v0.2s, v1.8b, v2.4b[0]
				%0 = bitcast <16 x i8> %b to <4 x i32>
				%shuffle = shufflevector <4 x i32> %0, <4 x i32> undef, <2 x i32> zeroinitializer
				%1 = bitcast <2 x i32> %shuffle to <8 x i8>
				%vusdot1.i = tail call <2 x i32> @llvm.aarch64.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %1, <8 x i8> %a) #3
				ret <2 x i32> %vusdot1.i
				}

				define <4 x i32> @usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b) {
				entry:
				; CHECK-LABEL: usdot.v4i32.v16i8
				; CHECK: usdot v0.4s, v1.16b, v2.16b
				%vusdot1.i = tail call <4 x i32> @llvm.aarch64.neon.usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b) #3
				ret <4 x i32> %vusdot1.i
				}

				define <4 x i32> @usdot_lane.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <8 x i8> %b) {
				entry:
				; CHECK-LABEL: usdot_lane.v4i32.v16i8
				; CHECK: usdot v0.4s, v1.16b, v2.4b[0]
				%0 = bitcast <8 x i8> %b to <2 x i32>
				%shuffle = shufflevector <2 x i32> %0, <2 x i32> undef, <4 x i32> zeroinitializer
				%1 = bitcast <4 x i32> %shuffle to <16 x i8>
				%vusdot1.i = tail call <4 x i32> @llvm.aarch64.neon.usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %1) #3
				ret <4 x i32> %vusdot1.i
				}

				define <4 x i32> @sudot_lane.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <8 x i8> %b) {
				entry:
				; CHECK-LABEL: sudot_lane.v4i32.v16i8
				; CHECK: sudot v0.4s, v1.16b, v2.4b[0]
				%0 = bitcast <8 x i8> %b to <2 x i32>
				%shuffle = shufflevector <2 x i32> %0, <2 x i32> undef, <4 x i32> zeroinitializer
				%1 = bitcast <4 x i32> %shuffle to <16 x i8>
				%vusdot1.i = tail call <4 x i32> @llvm.aarch64.neon.usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> %1, <16 x i8> %a) #3
				ret <4 x i32> %vusdot1.i
				}

				define <4 x i32> @usdot_laneq.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b) {
				entry:
				; CHECK-LABEL: usdot_laneq.v4i32.v16i8
				; CHECK: usdot v0.4s, v1.16b, v2.4b[0]
				%0 = bitcast <16 x i8> %b to <4 x i32>
				%shuffle = shufflevector <4 x i32> %0, <4 x i32> undef, <4 x i32> zeroinitializer
				%1 = bitcast <4 x i32> %shuffle to <16 x i8>
				%vusdot1.i = tail call <4 x i32> @llvm.aarch64.neon.usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %1) #3
				ret <4 x i32> %vusdot1.i
				}

				define <4 x i32> @sudot_laneq.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b) {
				entry:
				; CHECK-LABEL: sudot_laneq.v4i32.v16i8
				; CHECK: sudot v0.4s, v1.16b, v2.4b[0]
				%0 = bitcast <16 x i8> %b to <4 x i32>
				%shuffle = shufflevector <4 x i32> %0, <4 x i32> undef, <4 x i32> zeroinitializer
				%1 = bitcast <4 x i32> %shuffle to <16 x i8>
				%vusdot1.i = tail call <4 x i32> @llvm.aarch64.neon.usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> %1, <16 x i8> %a) #3
				ret <4 x i32> %vusdot1.i
				}

				declare <4 x i32> @llvm.aarch64.neon.smmla.v4i32.v16i8(<4 x i32>, <16 x i8>, <16 x i8>) #2
				declare <4 x i32> @llvm.aarch64.neon.ummla.v4i32.v16i8(<4 x i32>, <16 x i8>, <16 x i8>) #2
				declare <4 x i32> @llvm.aarch64.neon.usmmla.v4i32.v16i8(<4 x i32>, <16 x i8>, <16 x i8>) #2
				declare <2 x i32> @llvm.aarch64.neon.usdot.v2i32.v8i8(<2 x i32>, <8 x i8>, <8 x i8>) #2
				declare <4 x i32> @llvm.aarch64.neon.usdot.v4i32.v16i8(<4 x i32>, <16 x i8>, <16 x i8>) #2

llvm/test/CodeGen/ARM/arm-matmul.ll

This file was added.

				; RUN: llc -mtriple=arm-none-linux-gnu -mattr=+neon,+i8mm -float-abi=hard < %s -o -\| FileCheck %s

				define <4 x i32> @smmla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b) {
				entry:
				; CHECK-LABEL: smmla.v4i32.v16i8
				; CHECK: vsmmla.s8 q0, q1, q2
				%vmmla1.i = tail call <4 x i32> @llvm.arm.neon.smmla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b) #3
				ret <4 x i32> %vmmla1.i
				}

				define <4 x i32> @ummla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b) {
				entry:
				; CHECK-LABEL: ummla.v4i32.v16i8
				; CHECK: vummla.u8 q0, q1, q2
				%vmmla1.i = tail call <4 x i32> @llvm.arm.neon.ummla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b) #3
				ret <4 x i32> %vmmla1.i
				}

				define <4 x i32> @usmmla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b) {
				entry:
				; CHECK-LABEL: usmmla.v4i32.v16i8
				; CHECK: vusmmla.s8 q0, q1, q2
				%vusmmla1.i = tail call <4 x i32> @llvm.arm.neon.usmmla.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %b) #3
				ret <4 x i32> %vusmmla1.i
				}

				define <2 x i32> @usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> %b) {
				entry:
				; CHECK-LABEL: usdot.v2i32.v8i8
				; CHECK: vusdot.s8 d0, d1, d2
				%vusdot1.i = tail call <2 x i32> @llvm.arm.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> %b) #3
				ret <2 x i32> %vusdot1.i
				}

				define <2 x i32> @usdot_lane.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> %b) {
				entry:
				; CHECK-LABEL: usdot_lane.v2i32.v8i8
				; CHECK: vusdot.s8 d0, d1, d2[0]
				%0 = bitcast <8 x i8> %b to <2 x i32>
				%shuffle = shufflevector <2 x i32> %0, <2 x i32> undef, <2 x i32> zeroinitializer
				%1 = bitcast <2 x i32> %shuffle to <8 x i8>
				%vusdot1.i = tail call <2 x i32> @llvm.arm.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> %1) #3
				ret <2 x i32> %vusdot1.i
				}

				define <2 x i32> @sudot_lane.v2i32.v8i8(<2 x i32> %r, <8 x i8> %a, <8 x i8> %b) {
				entry:
				; CHECK-LABEL: sudot_lane.v2i32.v8i8
				; CHECK: vsudot.u8 d0, d1, d2[0]
				%0 = bitcast <8 x i8> %b to <2 x i32>
				%shuffle = shufflevector <2 x i32> %0, <2 x i32> undef, <2 x i32> zeroinitializer
				%1 = bitcast <2 x i32> %shuffle to <8 x i8>
				%vusdot1.i = tail call <2 x i32> @llvm.arm.neon.usdot.v2i32.v8i8(<2 x i32> %r, <8 x i8> %1, <8 x i8> %a) #3
				ret <2 x i32> %vusdot1.i
				}

				define <4 x i32> @usdotq_lane.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <8 x i8> %b) {
				entry:
				; CHECK-LABEL: usdotq_lane.v4i32.v16i8
				; CHECK: vusdot.s8 q0, q1, d4[0]
				%0 = bitcast <8 x i8> %b to <2 x i32>
				%shuffle = shufflevector <2 x i32> %0, <2 x i32> undef, <4 x i32> zeroinitializer
				%1 = bitcast <4 x i32> %shuffle to <16 x i8>
				%vusdot1.i = tail call <4 x i32> @llvm.arm.neon.usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <16 x i8> %1) #3
				ret <4 x i32> %vusdot1.i
				}

				define <4 x i32> @sudotq_lane.v4i32.v16i8(<4 x i32> %r, <16 x i8> %a, <8 x i8> %b) {
				entry:
				; CHECK-LABEL: sudotq_lane.v4i32.v16i8
				; CHECK: vsudot.u8 q0, q1, d4[0]
				%0 = bitcast <8 x i8> %b to <2 x i32>
				%shuffle = shufflevector <2 x i32> %0, <2 x i32> undef, <4 x i32> zeroinitializer
				%1 = bitcast <4 x i32> %shuffle to <16 x i8>
				%vusdot1.i = tail call <4 x i32> @llvm.arm.neon.usdot.v4i32.v16i8(<4 x i32> %r, <16 x i8> %1, <16 x i8> %a) #3
				ret <4 x i32> %vusdot1.i
				}

				declare <4 x i32> @llvm.arm.neon.smmla.v4i32.v16i8(<4 x i32>, <16 x i8>, <16 x i8>) #2
				declare <4 x i32> @llvm.arm.neon.ummla.v4i32.v16i8(<4 x i32>, <16 x i8>, <16 x i8>) #2
				declare <4 x i32> @llvm.arm.neon.usmmla.v4i32.v16i8(<4 x i32>, <16 x i8>, <16 x i8>) #2
				declare <2 x i32> @llvm.arm.neon.usdot.v2i32.v8i8(<2 x i32>, <8 x i8>, <8 x i8>) #2
				declare <4 x i32> @llvm.arm.neon.usdot.v4i32.v16i8(<4 x i32>, <16 x i8>, <16 x i8>) #2

llvm/test/MC/AArch64/SVE/matrix-multiply-fp-diagnostics.s

This file was added.

				// RUN: not llvm-mc -triple=aarch64 -show-encoding -mattr=+sve,+f32mm,+f64mm 2>&1 < %s \| FileCheck %s

				// --------------------------------------------------------------------------//
				// FMMLA (SVE)

				// Invalid element size

				fmmla z0.h, z1.h, z2.h
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid element width

				// Mis-matched element size

				fmmla z0.d, z1.s, z2.s
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid element width
				fmmla z0.s, z1.d, z2.s
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid element width
				fmmla z0.s, z1.s, z2.d
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid element width


				// --------------------------------------------------------------------------//
				// LD1RO (SVE, scalar plus immediate)

				// Immediate too high (>224)
				ld1rob { z0.b }, p1/z, [x2, #256]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: index must be a multiple of 32 in range [-256, 224].
				ld1roh { z0.h }, p1/z, [x2, #256]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: index must be a multiple of 32 in range [-256, 224].
				ld1row { z0.s }, p1/z, [x2, #256]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: index must be a multiple of 32 in range [-256, 224].
				ld1rod { z0.d }, p1/z, [x2, #256]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: index must be a multiple of 32 in range [-256, 224].

				// Immediate too low (<-256)
				ld1rob { z0.b }, p1/z, [x2, #-288]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: index must be a multiple of 32 in range [-256, 224].
				ld1roh { z0.h }, p1/z, [x2, #-288]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: index must be a multiple of 32 in range [-256, 224].
				ld1row { z0.s }, p1/z, [x2, #-288]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: index must be a multiple of 32 in range [-256, 224].
				ld1rod { z0.d }, p1/z, [x2, #-288]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: index must be a multiple of 32 in range [-256, 224].

				// Immediate not a multiple of 32
				ld1rob { z0.b }, p1/z, [x2, #16]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: index must be a multiple of 32 in range [-256, 224].
				ld1roh { z0.h }, p1/z, [x2, #16]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: index must be a multiple of 32 in range [-256, 224].
				ld1row { z0.s }, p1/z, [x2, #16]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: index must be a multiple of 32 in range [-256, 224].
				ld1rod { z0.d }, p1/z, [x2, #16]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: index must be a multiple of 32 in range [-256, 224].

				// Prediate register too high
				ld1rob { z0.b }, p8/z, [x2]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid restricted predicate register, expected p0..p7 (without element suffix)
				ld1roh { z0.h }, p8/z, [x2]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid restricted predicate register, expected p0..p7 (without element suffix)
				ld1row { z0.s }, p8/z, [x2]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid restricted predicate register, expected p0..p7 (without element suffix)
				ld1rod { z0.d }, p8/z, [x2]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid restricted predicate register, expected p0..p7 (without element suffix)


				// --------------------------------------------------------------------------//
				// LD1RO (SVE, scalar plus scalar)

				// Shift ammount not matched to data width
				ld1rob { z0.b }, p1/z, [x2, x3, lsl #1]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: register must be x0..x30 without shift
				ld1roh { z0.h }, p1/z, [x2, x3, lsl #0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: register must be x0..x30 with required shift 'lsl #1'
				ld1row { z0.s }, p1/z, [x2, x3, lsl #3]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: register must be x0..x30 with required shift 'lsl #2'
				ld1rod { z0.d }, p1/z, [x2, x3, lsl #2]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: register must be x0..x30 with required shift 'lsl #3'

				// Prediate register too high
				ld1rob { z0.b }, p8/z, [x2, x3, lsl #0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid restricted predicate register, expected p0..p7 (without element suffix)
				ld1roh { z0.h }, p8/z, [x2, x3, lsl #1]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid restricted predicate register, expected p0..p7 (without element suffix)
				ld1row { z0.s }, p8/z, [x2, x3, lsl #2]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid restricted predicate register, expected p0..p7 (without element suffix)
				ld1rod { z0.d }, p8/z, [x2, x3, lsl #3]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid restricted predicate register, expected p0..p7 (without element suffix)

llvm/test/MC/AArch64/SVE/matrix-multiply-fp32.s

This file was added.

				// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+sve,+f32mm < %s \
				// RUN: \| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
				// RUN: not llvm-mc -triple=aarch64 -show-encoding -mattr=+sve,+i8mm,+f64mm < %s 2>&1 \
				// RUN: \| FileCheck %s --check-prefix=CHECK-ERROR
				// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+sve,+f32mm < %s \
				// RUN: \| llvm-objdump -d --mattr=+sve,+f32mm - \| FileCheck %s --check-prefix=CHECK-INST
				// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+sve,+f32mm < %s \
				// RUN: \| llvm-objdump -d - \| FileCheck %s --check-prefix=CHECK-UNKNOWN

				// --------------------------------------------------------------------------//
				// FMMLA (SVE)

				fmmla z0.s, z1.s, z2.s
				// CHECK-INST: fmmla z0.s, z1.s, z2.s
				// CHECK-ENCODING: [0x20,0xe4,0xa2,0x64]
				// CHECK-ERROR: instruction requires: f32mm
				// CHECK-UNKNOWN: 20 e4 a2 64 <unknown>

llvm/test/MC/AArch64/SVE/matrix-multiply-fp64.s

This file was added.

				// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+sve,+f64mm < %s \
				// RUN: \| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
				// RUN: not llvm-mc -triple=aarch64 -show-encoding -mattr=+sve,+i8mm,+f32mm < %s 2>&1 \
				// RUN: \| FileCheck %s --check-prefix=CHECK-ERROR
				// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+sve,+f64mm < %s \
				// RUN: \| llvm-objdump -d --mattr=+sve,+f64mm - \| FileCheck %s --check-prefix=CHECK-INST
				// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+sve,+f64mm < %s \
				// RUN: \| llvm-objdump -d - \| FileCheck %s --check-prefix=CHECK-UNKNOWN

				// --------------------------------------------------------------------------//
				// FMMLA (SVE)

				fmmla z0.d, z1.d, z2.d
				// CHECK-INST: fmmla z0.d, z1.d, z2.d
				// CHECK-ENCODING: [0x20,0xe4,0xe2,0x64]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 20 e4 e2 64 <unknown>

				// --------------------------------------------------------------------------//
				// LD1RO (SVE, scalar plus immediate)

				// With maximum immediate (224)

				ld1rob { z0.b }, p1/z, [x2, #224]
				// CHECK-INST: ld1rob { z0.b }, p1/z, [x2, #224]
				// CHECK-ENCODING: [0x40,0x24,0x27,0xa4]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 24 27 a4 <unknown>

				ld1roh { z0.h }, p1/z, [x2, #224]
				// CHECK-INST: ld1roh { z0.h }, p1/z, [x2, #224]
				// CHECK-ENCODING: [0x40,0x24,0xa7,0xa4]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 24 a7 a4 <unknown>

				ld1row { z0.s }, p1/z, [x2, #224]
				// CHECK-INST: ld1row { z0.s }, p1/z, [x2, #224]
				// CHECK-ENCODING: [0x40,0x24,0x27,0xa5]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 24 27 a5 <unknown>

				ld1rod { z0.d }, p1/z, [x2, #224]
				// CHECK-INST: ld1rod { z0.d }, p1/z, [x2, #224]
				// CHECK-ENCODING: [0x40,0x24,0xa7,0xa5]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 24 a7 a5 <unknown>

				// With minimum immediate (-256)

				ld1rob { z0.b }, p1/z, [x2, #-256]
				// CHECK-INST: ld1rob { z0.b }, p1/z, [x2, #-256]
				// CHECK-ENCODING: [0x40,0x24,0x28,0xa4]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 24 28 a4 <unknown>

				ld1roh { z0.h }, p1/z, [x2, #-256]
				// CHECK-INST: ld1roh { z0.h }, p1/z, [x2, #-256]
				// CHECK-ENCODING: [0x40,0x24,0xa8,0xa4]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 24 a8 a4 <unknown>

				ld1row { z0.s }, p1/z, [x2, #-256]
				// CHECK-INST: ld1row { z0.s }, p1/z, [x2, #-256]
				// CHECK-ENCODING: [0x40,0x24,0x28,0xa5]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 24 28 a5 <unknown>

				ld1rod { z0.d }, p1/z, [x2, #-256]
				// CHECK-INST: ld1rod { z0.d }, p1/z, [x2, #-256]
				// CHECK-ENCODING: [0x40,0x24,0xa8,0xa5]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 24 a8 a5 <unknown>

				// Aliases with a plain (non-list) first operand, and omitted offset.

				ld1rob z0.b, p1/z, [x2]
				// CHECK-INST: ld1rob { z0.b }, p1/z, [x2]
				// CHECK-ENCODING: [0x40,0x24,0x20,0xa4]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 24 20 a4 <unknown>

				ld1roh z0.h, p1/z, [x2]
				// CHECK-INST: ld1roh { z0.h }, p1/z, [x2]
				// CHECK-ENCODING: [0x40,0x24,0xa0,0xa4]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 24 a0 a4 <unknown>

				ld1row z0.s, p1/z, [x2]
				// CHECK-INST: ld1row { z0.s }, p1/z, [x2]
				// CHECK-ENCODING: [0x40,0x24,0x20,0xa5]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 24 20 a5 <unknown>

				ld1rod z0.d, p1/z, [x2]
				// CHECK-INST: ld1rod { z0.d }, p1/z, [x2]
				// CHECK-ENCODING: [0x40,0x24,0xa0,0xa5]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 24 a0 a5 <unknown>


				// --------------------------------------------------------------------------//
				// LD1RO (SVE, scalar plus scalar)

				ld1rob { z0.b }, p1/z, [x2, x3, lsl #0]
				// CHECK-INST: ld1rob { z0.b }, p1/z, [x2, x3]
				// CHECK-ENCODING: [0x40,0x04,0x23,0xa4]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 04 23 a4 <unknown>

				ld1roh { z0.h }, p1/z, [x2, x3, lsl #1]
				// CHECK-INST: ld1roh { z0.h }, p1/z, [x2, x3, lsl #1]
				// CHECK-ENCODING: [0x40,0x04,0xa3,0xa4]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 04 a3 a4 <unknown>

				ld1row { z0.s }, p1/z, [x2, x3, lsl #2]
				// CHECK-INST: ld1row { z0.s }, p1/z, [x2, x3, lsl #2]
				// CHECK-ENCODING: [0x40,0x04,0x23,0xa5]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 04 23 a5 <unknown>

				ld1rod { z0.d }, p1/z, [x2, x3, lsl #3]
				// CHECK-INST: ld1rod { z0.d }, p1/z, [x2, x3, lsl #3]
				// CHECK-ENCODING: [0x40,0x04,0xa3,0xa5]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 04 a3 a5 <unknown>

				// Aliases with a plain (non-list) first operand, and omitted shift for the
				// byte variant.

				ld1rob z0.b, p1/z, [x2, x3]
				// CHECK-INST: ld1rob { z0.b }, p1/z, [x2, x3]
				// CHECK-ENCODING: [0x40,0x04,0x23,0xa4]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 04 23 a4 <unknown>

				ld1roh z0.h, p1/z, [x2, x3, lsl #1]
				// CHECK-INST: ld1roh { z0.h }, p1/z, [x2, x3, lsl #1]
				// CHECK-ENCODING: [0x40,0x04,0xa3,0xa4]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 04 a3 a4 <unknown>

				ld1row z0.s, p1/z, [x2, x3, lsl #2]
				// CHECK-INST: ld1row { z0.s }, p1/z, [x2, x3, lsl #2]
				// CHECK-ENCODING: [0x40,0x04,0x23,0xa5]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 04 23 a5 <unknown>

				ld1rod z0.d, p1/z, [x2, x3, lsl #3]
				// CHECK-INST: ld1rod { z0.d }, p1/z, [x2, x3, lsl #3]
				// CHECK-ENCODING: [0x40,0x04,0xa3,0xa5]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 40 04 a3 a5 <unknown>


				// --------------------------------------------------------------------------//
				// ZIP1, ZIP2 (SVE, 128-bit element)

				zip1 z0.q, z1.q, z2.q
				// CHECK-INST: zip1 z0.q, z1.q, z2.q
				// CHECK-ENCODING: [0x20,0x00,0xa2,0x05]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 20 00 a2 05 <unknown>

				zip2 z0.q, z1.q, z2.q
				// CHECK-INST: zip2 z0.q, z1.q, z2.q
				// CHECK-ENCODING: [0x20,0x04,0xa2,0x05]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 20 04 a2 05 <unknown>


				// --------------------------------------------------------------------------//
				// TRN1, TRN2 (SVE, 128-bit element)

				trn1 z0.q, z1.q, z2.q
				// CHECK-INST: trn1 z0.q, z1.q, z2.q
				// CHECK-ENCODING: [0x20,0x18,0xa2,0x05]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 20 18 a2 05 <unknown>

				trn2 z0.q, z1.q, z2.q
				// CHECK-INST: trn2 z0.q, z1.q, z2.q
				// CHECK-ENCODING: [0x20,0x1c,0xa2,0x05]
				// CHECK-ERROR: instruction requires: f64mm
				// CHECK-UNKNOWN: 20 1c a2 05 <unknown>

llvm/test/MC/AArch64/SVE/matrix-multiply-int8-diagnostics.s

This file was added.

				// RUN: not llvm-mc -triple=aarch64 -show-encoding -mattr=+sve,+i8mm 2>&1 < %s \| FileCheck %s

				// --------------------------------------------------------------------------//
				// SMMLA, UMMLA, USMMLA (SVE)

				// Invalid element size

				ummla z0.h, z1.b, z2.b
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid element width
				ummla z0.s, z1.h, z2.b
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid element width
				ummla z0.s, z1.b, z2.d
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid element width

				// Negative tests for instructions that are incompatible with movprfx

				movprfx z0.d, p0/z, z7.d
				ummla z0.s, z1.b, z2.b
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: instruction is unpredictable when following a predicated movprfx, suggest using unpredicated movprfx
				movprfx z0.d, p0/z, z7.d
				smmla z0.s, z1.b, z2.b
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: instruction is unpredictable when following a predicated movprfx, suggest using unpredicated movprfx
				movprfx z0.d, p0/z, z7.d
				usmmla z0.s, z1.b, z2.b
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: instruction is unpredictable when following a predicated movprfx, suggest using unpredicated movprfx


				// --------------------------------------------------------------------------//
				// USDOT (SVE, vectors)

				// Invalid element size

				usdot z0.d, z1.b, z2.b
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid element width
				usdot z0.s, z1.s, z2.b
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid element width
				usdot z0.s, z1.b, z2.h
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: Invalid restricted vector register, expected z0.b..z7.b

				// Negative tests for instructions that are incompatible with movprfx

				movprfx z0.d, p0/z, z7.d
				usdot z0.s, z1.b, z2.b
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: instruction is unpredictable when following a predicated movprfx, suggest using unpredicated movprfx


				// --------------------------------------------------------------------------//
				// USDOT, SUDOT (SVE, indexed)

				// Invalid element size

				usdot z0.h, z1.b, z2.b[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid element width
				sudot z0.s, z1.h, z2.b[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid element width
				usdot z0.s, z1.b, z2.s[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: Invalid restricted vector register, expected z0.b..z7.b

				// Invalid restricted register for indexed vector.
				usdot z0.s, z1.b, z9.b[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction
				sudot z0.s, z1.b, z9.b[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: Invalid restricted vector register, expected z0.b..z7.b

				// Invalid element index
				usdot z0.s, z1.b, z2.b[4]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: vector lane must be an integer in range [0, 3].
				sudot z0.s, z1.b, z2.b[4]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: vector lane must be an integer in range [0, 3].

				// Negative tests for instructions that are incompatible with movprfx

				movprfx z0.d, p0/z, z7.d
				usdot z0.s, z1.b, z2.b[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: instruction is unpredictable when following a predicated movprfx, suggest using unpredicated movprfx
				movprfx z0.d, p0/z, z7.d
				sudot z0.s, z1.b, z2.b[3]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: instruction is unpredictable when following a predicated movprfx, suggest using unpredicated movprfx

llvm/test/MC/AArch64/SVE/matrix-multiply-int8.s

This file was added.

				// RUN: llvm-mc -triple=aarch64 -show-encoding -mattr=+sve,+i8mm < %s \
				// RUN: \| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
				// RUN: not llvm-mc -triple=aarch64 -show-encoding -mattr=+sve < %s 2>&1 \
				// RUN: \| FileCheck %s --check-prefix=CHECK-ERROR
				// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+sve,+i8mm < %s \
				// RUN: \| llvm-objdump -d --mattr=+sve,+i8mm - \| FileCheck %s --check-prefix=CHECK-INST
				// RUN: llvm-mc -triple=aarch64 -filetype=obj -mattr=+sve,+i8mm < %s \
				// RUN: \| llvm-objdump -d - \| FileCheck %s --check-prefix=CHECK-UNKNOWN


				// --------------------------------------------------------------------------//
				// SMMLA, UMMLA, USMMLA (SVE)

				ummla z0.s, z1.b, z2.b
				// CHECK-INST: ummla z0.s, z1.b, z2.b
				// CHECK-ENCODING: [0x20,0x98,0xc2,0x45]
				// CHECK-ERROR: instruction requires: i8mm
				// CHECK-UNKNOWN: 20 98 c2 45 <unknown>

				smmla z0.s, z1.b, z2.b
				// CHECK-INST: smmla z0.s, z1.b, z2.b
				// CHECK-ENCODING: [0x20,0x98,0x02,0x45]
				// CHECK-ERROR: instruction requires: i8mm
				// CHECK-UNKNOWN: 20 98 02 45 <unknown>

				usmmla z0.s, z1.b, z2.b
				// CHECK-INST: usmmla z0.s, z1.b, z2.b
				// CHECK-ENCODING: [0x20,0x98,0x82,0x45]
				// CHECK-ERROR: instruction requires: i8mm
				// CHECK-UNKNOWN: 20 98 82 45 <unknown>


				// Test compatibility with MOVPRFX instruction.

				movprfx z0, z7
				// CHECK-INST: movprfx z0, z7
				// CHECK-ENCODING: [0xe0,0xbc,0x20,0x04]
				// CHECK-UNKNOWN: e0 bc 20 04 <unknown>

				ummla z0.s, z1.b, z2.b
				// CHECK-INST: ummla z0.s, z1.b, z2.b
				// CHECK-ENCODING: [0x20,0x98,0xc2,0x45]
				// CHECK-ERROR: instruction requires: i8mm
				// CHECK-UNKNOWN: 20 98 c2 45 <unknown>

				movprfx z0, z7
				// CHECK-INST: movprfx z0, z7
				// CHECK-ENCODING: [0xe0,0xbc,0x20,0x04]
				// CHECK-UNKNOWN: e0 bc 20 04 <unknown>

				smmla z0.s, z1.b, z2.b
				// CHECK-INST: smmla z0.s, z1.b, z2.b
				// CHECK-ENCODING: [0x20,0x98,0x02,0x45]
				// CHECK-ERROR: instruction requires: i8mm
				// CHECK-UNKNOWN: 20 98 02 45 <unknown>

				movprfx z0, z7
				// CHECK-INST: movprfx z0, z7
				// CHECK-ENCODING: [0xe0,0xbc,0x20,0x04]
				// CHECK-UNKNOWN: e0 bc 20 04 <unknown>

				usmmla z0.s, z1.b, z2.b
				// CHECK-INST: usmmla z0.s, z1.b, z2.b
				// CHECK-ENCODING: [0x20,0x98,0x82,0x45]
				// CHECK-ERROR: instruction requires: i8mm
				// CHECK-UNKNOWN: 20 98 82 45 <unknown>


				// --------------------------------------------------------------------------//
				// USDOT (SVE, vectors)

				usdot z0.s, z1.b, z2.b
				// CHECK-INST: usdot z0.s, z1.b, z2.b
				// CHECK-ENCODING: [0x20,0x78,0x82,0x44]
				// CHECK-ERROR: instruction requires: i8mm
				// CHECK-UNKNOWN: 20 78 82 44 <unknown>

				// Test compatibility with MOVPRFX instruction.

				movprfx z0, z7
				// CHECK-INST: movprfx z0, z7
				// CHECK-ENCODING: [0xe0,0xbc,0x20,0x04]
				// CHECK-UNKNOWN: e0 bc 20 04 <unknown>

				usdot z0.s, z1.b, z2.b
				// CHECK-INST: usdot z0.s, z1.b, z2.b
				// CHECK-ENCODING: [0x20,0x78,0x82,0x44]
				// CHECK-ERROR: instruction requires: i8mm
				// CHECK-UNKNOWN: 20 78 82 44 <unknown>


				// --------------------------------------------------------------------------//
				// USDOT, SUDOT (SVE, indexed)

				usdot z0.s, z1.b, z2.b[0]
				// CHECK-INST: usdot z0.s, z1.b, z2.b[0]
				// CHECK-ENCODING: [0x20,0x18,0xa2,0x44]
				// CHECK-ERROR: instruction requires: i8mm
				// CHECK-UNKNOWN: 20 18 a2 44 <unknown>

				sudot z0.s, z1.b, z2.b[3]
				// CHECK-INST: sudot z0.s, z1.b, z2.b[3]
				// CHECK-ENCODING: [0x20,0x1c,0xba,0x44]
				// CHECK-ERROR: instruction requires: i8mm
				// CHECK-UNKNOWN: 20 1c ba 44 <unknown>

				// Test compatibility with MOVPRFX instruction.

				movprfx z0, z7
				// CHECK-INST: movprfx z0, z7
				// CHECK-ENCODING: [0xe0,0xbc,0x20,0x04]
				// CHECK-UNKNOWN: e0 bc 20 04 <unknown>

				usdot z0.s, z1.b, z2.b[0]
				// CHECK-INST: usdot z0.s, z1.b, z2.b[0]
				// CHECK-ENCODING: [0x20,0x18,0xa2,0x44]
				// CHECK-ERROR: instruction requires: i8mm
				// CHECK-UNKNOWN: 20 18 a2 44 <unknown>

llvm/test/MC/AArch64/armv8.6a-simd-matmul-error.s

This file was added.

				// RUN: not llvm-mc -triple aarch64 -show-encoding -mattr=+i8mm < %s 2>&1 \| FileCheck %s

				// No interesting edge cases for [US]MMLA, except for the fact that the data
				// types are fixed (no 64-bit version), and USMMLA exists, but SUMMLA does not.
				smmla v1.2s, v16.8b, v31.8b
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction
				summla v1.4s, v16.16b, v31.16b
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: unrecognized instruction mnemonic, did you mean: smmla, ummla, usmmla?

				// USDOT (vector) has two valid data type combinations, others are rejected.
				usdot v3.4s, v15.8b, v30.8b
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction
				usdot v3.2s, v15.16b, v30.16b
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction

				// For USDOT and SUDOT (indexed), the index is in range [0,3] (regardless of data types)
				usdot v31.2s, v1.8b, v2.4b[4]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: vector lane must be an integer in range [0, 3].
				usdot v31.4s, v1.16b, v2.4b[4]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: vector lane must be an integer in range [0, 3].
				sudot v31.2s, v1.8b, v2.4b[4]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: vector lane must be an integer in range [0, 3].
				sudot v31.4s, v1.16b, v2.4b[4]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: vector lane must be an integer in range [0, 3].

				// The arrangement specifiers of the first two operands muct match.
				usdot v31.4s, v1.8b, v2.4b[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction
				usdot v31.2s, v1.16b, v2.4b[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction
				sudot v31.4s, v1.8b, v2.4b[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction
				sudot v31.2s, v1.16b, v2.4b[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction

llvm/test/MC/AArch64/armv8.6a-simd-matmul.s

This file was added.

				// RUN: llvm-mc -triple aarch64 -show-encoding -mattr=+i8mm < %s \| FileCheck %s
				// RUN: llvm-mc -triple aarch64 -show-encoding -mattr=+v8.6a < %s \| FileCheck %s
				// RUN: not llvm-mc -triple aarch64 -show-encoding -mattr=+v8.6a-i8mm < %s 2>&1 \| FileCheck %s --check-prefix=NOMATMUL

				smmla v1.4s, v16.16b, v31.16b
				ummla v1.4s, v16.16b, v31.16b
				usmmla v1.4s, v16.16b, v31.16b
				// CHECK: smmla v1.4s, v16.16b, v31.16b // encoding: [0x01,0xa6,0x9f,0x4e]
				// CHECK: ummla v1.4s, v16.16b, v31.16b // encoding: [0x01,0xa6,0x9f,0x6e]
				// CHECK: usmmla v1.4s, v16.16b, v31.16b // encoding: [0x01,0xae,0x9f,0x4e]
				// NOMATMUL: instruction requires: i8mm
				// NOMATMUL-NEXT: smmla v1.4s, v16.16b, v31.16b
				// NOMATMUL: instruction requires: i8mm
				// NOMATMUL-NEXT: ummla v1.4s, v16.16b, v31.16b
				// NOMATMUL: instruction requires: i8mm
				// NOMATMUL-NEXT: usmmla v1.4s, v16.16b, v31.16b

				usdot v3.2s, v15.8b, v30.8b
				usdot v3.4s, v15.16b, v30.16b
				// CHECK: usdot v3.2s, v15.8b, v30.8b // encoding: [0xe3,0x9d,0x9e,0x0e]
				// CHECK: usdot v3.4s, v15.16b, v30.16b // encoding: [0xe3,0x9d,0x9e,0x4e]
				// NOMATMUL: instruction requires: i8mm
				// NOMATMUL-NEXT: usdot v3.2s, v15.8b, v30.8b
				// NOMATMUL: instruction requires: i8mm
				// NOMATMUL-NEXT: usdot v3.4s, v15.16b, v30.16b

				usdot v31.2s, v1.8b, v2.4b[3]
				usdot v31.4s, v1.16b, v2.4b[3]
				// CHECK: usdot v31.2s, v1.8b, v2.4b[3] // encoding: [0x3f,0xf8,0xa2,0x0f]
				// CHECK: usdot v31.4s, v1.16b, v2.4b[3] // encoding: [0x3f,0xf8,0xa2,0x4f]
				// NOMATMUL: instruction requires: i8mm
				// NOMATMUL-NEXT: usdot v31.2s, v1.8b, v2.4b[3]
				// NOMATMUL: instruction requires: i8mm
				// NOMATMUL-NEXT: usdot v31.4s, v1.16b, v2.4b[3]

				sudot v31.2s, v1.8b, v2.4b[3]
				sudot v31.4s, v1.16b, v2.4b[3]
				// CHECK: sudot v31.2s, v1.8b, v2.4b[3] // encoding: [0x3f,0xf8,0x22,0x0f]
				// CHECK: sudot v31.4s, v1.16b, v2.4b[3] // encoding: [0x3f,0xf8,0x22,0x4f]
				// NOMATMUL: instruction requires: i8mm
				// NOMATMUL-NEXT: sudot v31.2s, v1.8b, v2.4b[3]
				// NOMATMUL: instruction requires: i8mm
				// NOMATMUL-NEXT: sudot v31.4s, v1.16b, v2.4b[3]

llvm/test/MC/ARM/armv8.6a-matmul-error.s

This file was added.

				// RUN: not llvm-mc -triple armv8a -show-encoding -mattr=+i8mm < %s 2>&1 \| FileCheck %s
				// RUN: not llvm-mc -triple thumbv8a -show-encoding -mattr=+i8mm < %s 2>&1 \| FileCheck %s


				// VSMMLA, VUMMLA, VUSMMLA

				// Data type specifier must match instruction

				vsmmla.u8 q0, q1, q2
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction
				// CHECK-NEXT: vsmmla.u8 q0, q1, q2
				// CHECK-NEXT: {{^ \^}}

				vummla.s8 q0, q1, q2
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction
				// CHECK-NEXT: vummla.s8 q0, q1, q2
				// CHECK-NEXT: {{^ \^}}

				vusmmla.u8 q0, q1, q2
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction
				// CHECK-NEXT: vusmmla.u8 q0, q1, q2
				// CHECK-NEXT: {{^ \^}}


				// Incorrect register type

				vsmmla.s8 d0, q1, q2
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: operand must be a register in range [q0, q15]
				// CHECK-NEXT: vsmmla.s8 d0, q1, q2
				// CHECK-NEXT: {{^ \^}}

				vummla.u8 q0, d1, q2
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: operand must be a register in range [q0, q15]
				// CHECK-NEXT: vummla.u8 q0, d1, q2
				// CHECK-NEXT: {{^ \^}}

				vusmmla.s8 q0, q1, d2
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: operand must be a register in range [q0, q15]
				// CHECK-NEXT: vusmmla.s8 q0, q1, d2
				// CHECK-NEXT: {{^ \^}}


				// VUSDOT (vector)

				// Data type specifier must match instruction

				vusdot.u8 q0, q1, q2
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction
				// CHECK-NEXT: vusdot.u8 q0, q1, q2
				// CHECK-NEXT: {{^ \^}}

				// Mis-matched register types

				vusdot.s8 q0, d1, d2
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: operand must be a register in range [d0, d31]
				vusdot.s8 d0, q1, d2
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: operand must be a register in range [d0, d31]
				vusdot.s8 d0, d1, q2
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: operand must be a register in range [d0, d31]


				// VUSDOT, VSUDOT (by scalar)

				// Data type specifier must match instruction

				vusdot.u8 d0, d1, d2[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction
				// CHECK-NEXT: vusdot.u8 d0, d1, d2[0]
				// CHECK-NEXT: {{^ \^}}

				vsudot.s8 d0, d1, d2[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction
				// CHECK-NEXT: vsudot.s8 d0, d1, d2[0]
				// CHECK-NEXT: {{^ \^}}

				// Incorrect register types

				vusdot.s8 q0, d1, d2[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid instruction, any one of the following would fix this:
				// CHECK-NEXT: vusdot.s8 q0, d1, d2[0]
				// CHECK: [[@LINE-3]]:{{[0-9]+}}: note: operand must be a register in range [d0, d31]
				// CHECK-NEXT: vusdot.s8 q0, d1, d2[0]
				// CHECK-NEXT: {{^ \^}}
				// CHECK: [[@LINE-6]]:{{[0-9]+}}: note: operand must be a register in range [q0, q15]
				// CHECK-NEXT: vusdot.s8 q0, d1, d2[0]
				// CHECK-NEXT: {{^ \^}}

				vusdot.s8 d0, q1, d2[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid instruction, any one of the following would fix this:
				// CHECK-NEXT: vusdot.s8 d0, q1, d2[0]
				// CHECK: [[@LINE-3]]:{{[0-9]+}}: note: operand must be a register in range [d0, d31]
				// CHECK-NEXT: vusdot.s8 d0, q1, d2[0]
				// CHECK-NEXT: {{^ \^}}
				// CHECK: [[@LINE-6]]:{{[0-9]+}}: note: operand must be a register in range [q0, q15]
				// CHECK-NEXT: vusdot.s8 d0, q1, d2[0]
				// CHECK-NEXT: {{^ \^}}

				vusdot.s8 q0, q1, q2[0]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid instruction, any one of the following would fix this:
				// CHECK-NEXT: vusdot.s8 q0, q1, q2[0]
				// CHECK: [[@LINE-3]]:{{[0-9]+}}: note: operand must be a register in range [d0, d15]
				// CHECK-NEXT: vusdot.s8 q0, q1, q2[0]
				// CHECK-NEXT: {{^ \^}}
				// CHECK: [[@LINE-6]]:{{[0-9]+}}: note: too many operands for instruction
				// CHECK-NEXT: vusdot.s8 q0, q1, q2[0]
				// CHECK-NEXT: {{^ \^}}

				// Out of range lane index

				vusdot.s8 d0, d1, d2[2]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction
				vsudot.u8 q0, q1, d2[2]
				// CHECK: [[@LINE-1]]:{{[0-9]+}}: error: invalid operand for instruction

llvm/test/MC/ARM/armv8.6a-matmul.s

This file was added.

				// RUN: llvm-mc -triple armv8a -show-encoding -mattr=+i8mm < %s \| FileCheck %s --check-prefix=ARM
				// RUN: llvm-mc -triple thumbv8a -show-encoding -mattr=+i8mm < %s \| FileCheck %s --check-prefix=THUMB
				// RUN: not llvm-mc -triple armv8a -show-encoding -mattr=v8.5a < %s 2>&1 \| FileCheck %s --check-prefix=NOMATMUL
				// RUN: not llvm-mc -triple thumbv8a -show-encoding -mattr=v8.5a < %s 2>&1 \| FileCheck %s --check-prefix=NOMATMUL

				vsmmla.s8 q0, q1, q2
				// ARM: vsmmla.s8 q0, q1, q2 @ encoding: [0x44,0x0c,0x22,0xfc]
				// THUMB: vsmmla.s8 q0, q1, q2 @ encoding: [0x22,0xfc,0x44,0x0c]
				// NOMATMUL: [[@LINE-3]]:{{[0-9]+}}: error: instruction requires: 8-bit integer matrix multiply

				vummla.u8 q0, q1, q2
				// ARM: vummla.u8 q0, q1, q2 @ encoding: [0x54,0x0c,0x22,0xfc]
				// THUMB: vummla.u8 q0, q1, q2 @ encoding: [0x22,0xfc,0x54,0x0c]
				// NOMATMUL: [[@LINE-3]]:{{[0-9]+}}: error: instruction requires: 8-bit integer matrix multiply

				vusmmla.s8 q0, q1, q2
				// ARM: vusmmla.s8 q0, q1, q2 @ encoding: [0x44,0x0c,0xa2,0xfc]
				// THUMB: vusmmla.s8 q0, q1, q2 @ encoding: [0xa2,0xfc,0x44,0x0c]
				// NOMATMUL: [[@LINE-3]]:{{[0-9]+}}: error: instruction requires: 8-bit integer matrix multiply

				vusdot.s8 d0, d1, d2
				// ARM: vusdot.s8 d0, d1, d2 @ encoding: [0x02,0x0d,0xa1,0xfc]
				// THUMB: vusdot.s8 d0, d1, d2 @ encoding: [0xa1,0xfc,0x02,0x0d]
				// NOMATMUL: [[@LINE-3]]:{{[0-9]+}}: error: instruction requires: 8-bit integer matrix multiply

				vusdot.s8 q0, q1, q2
				// ARM: vusdot.s8 q0, q1, q2 @ encoding: [0x44,0x0d,0xa2,0xfc]
				// THUMB: vusdot.s8 q0, q1, q2 @ encoding: [0xa2,0xfc,0x44,0x0d]
				// NOMATMUL: [[@LINE-3]]:{{[0-9]+}}: error: instruction requires: 8-bit integer matrix multiply

				vusdot.s8 d0, d1, d2[0]
				// ARM: vusdot.s8 d0, d1, d2[0] @ encoding: [0x02,0x0d,0x81,0xfe]
				// THUMB: vusdot.s8 d0, d1, d2[0] @ encoding: [0x81,0xfe,0x02,0x0d]
				// NOMATMUL: [[@LINE-3]]:{{[0-9]+}}: error: instruction requires: 8-bit integer matrix multiply

				vsudot.u8 d0, d1, d2[1]
				// ARM: vsudot.u8 d0, d1, d2[1] @ encoding: [0x32,0x0d,0x81,0xfe]
				// THUMB: vsudot.u8 d0, d1, d2[1] @ encoding: [0x81,0xfe,0x32,0x0d]
				// NOMATMUL: [[@LINE-3]]:{{[0-9]+}}: error: instruction requires: 8-bit integer matrix multiply

				vusdot.s8 q0, q1, d2[0]
				// ARM: vusdot.s8 q0, q1, d2[0] @ encoding: [0x42,0x0d,0x82,0xfe]
				// THUMB: vusdot.s8 q0, q1, d2[0] @ encoding: [0x82,0xfe,0x42,0x0d]
				// NOMATMUL: [[@LINE-3]]:{{[0-9]+}}: error: instruction requires: 8-bit integer matrix multiply

				vsudot.u8 q0, q1, d2[1]
				// ARM: vsudot.u8 q0, q1, d2[1] @ encoding: [0x72,0x0d,0x82,0xfe]
				// THUMB: vsudot.u8 q0, q1, d2[1] @ encoding: [0x82,0xfe,0x72,0x0d]
				// NOMATMUL: [[@LINE-3]]:{{[0-9]+}}: error: instruction requires: 8-bit integer matrix multiply

llvm/test/MC/Disassembler/AArch64/armv8.6a-simd-matmul.txt

This file was added.

				# RUN: llvm-mc -triple=aarch64 -mattr=+i8mm -disassemble < %s \| FileCheck %s
				# RUN: llvm-mc -triple=aarch64 -mattr=+v8.6a -disassemble < %s \| FileCheck %s
				# RUN: not llvm-mc -triple=aarch64 -mattr=+v8.5a -disassemble < %s 2>&1 \| FileCheck %s --check-prefix=NOI8MM

				[0x01,0xa6,0x9f,0x4e]
				[0x01,0xa6,0x9f,0x6e]
				[0x01,0xae,0x9f,0x4e]
				# CHECK: smmla v1.4s, v16.16b, v31.16b
				# CHECK: ummla v1.4s, v16.16b, v31.16b
				# CHECK: usmmla v1.4s, v16.16b, v31.16b
				# NOI8MM: [[@LINE-6]]:{{[0-9]+}}: warning: invalid instruction encoding
				# NOI8MM: [[@LINE-6]]:{{[0-9]+}}: warning: invalid instruction encoding
				# NOI8MM: [[@LINE-6]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0xe3,0x9d,0x9e,0x0e]
				[0xe3,0x9d,0x9e,0x4e]
				# CHECK: usdot v3.2s, v15.8b, v30.8b
				# CHECK: usdot v3.4s, v15.16b, v30.16b
				# NOI8MM: [[@LINE-4]]:{{[0-9]+}}: warning: invalid instruction encoding
				# NOI8MM: [[@LINE-4]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x3f,0xf8,0xa2,0x0f]
				[0x3f,0xf8,0xa2,0x4f]
				# CHECK: usdot v31.2s, v1.8b, v2.4b[3]
				# CHECK: usdot v31.4s, v1.16b, v2.4b[3]
				# NOI8MM: [[@LINE-4]]:{{[0-9]+}}: warning: invalid instruction encoding
				# NOI8MM: [[@LINE-4]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x3f,0xf8,0x22,0x0f]
				[0x3f,0xf8,0x22,0x4f]
				# CHECK: sudot v31.2s, v1.8b, v2.4b[3]
				# CHECK: sudot v31.4s, v1.16b, v2.4b[3]
				# NOI8MM: [[@LINE-4]]:{{[0-9]+}}: warning: invalid instruction encoding
				# NOI8MM: [[@LINE-4]]:{{[0-9]+}}: warning: invalid instruction encoding

llvm/test/MC/Disassembler/ARM/armv8.6a-matmul-arm.txt

This file was added.

				# RUN: llvm-mc -triple=armv8 -mattr=+i8mm -disassemble < %s \| FileCheck %s
				# RUN: not llvm-mc -triple=armv8 -mattr=+v8.4a -disassemble < %s 2>&1 \| FileCheck %s --check-prefix=NOMATMUL

				[0x44,0x0c,0x22,0xfc]
				# CHECK: vsmmla.s8 q0, q1, q2
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x54,0x0c,0x22,0xfc]
				# CHECK: vummla.u8 q0, q1, q2
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x44,0x0c,0xa2,0xfc]
				# CHECK: vusmmla.s8 q0, q1, q2
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x02,0x0d,0xa1,0xfc]
				# CHECK: vusdot.s8 d0, d1, d2
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x44,0x0d,0xa2,0xfc]
				# CHECK: vusdot.s8 q0, q1, q2
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x02,0x0d,0x81,0xfe]
				# CHECK: vusdot.s8 d0, d1, d2[0]
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x32,0x0d,0x81,0xfe]
				# CHECK: vsudot.u8 d0, d1, d2[1]
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x42,0x0d,0x82,0xfe]
				# CHECK: vusdot.s8 q0, q1, d2[0]
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x72,0x0d,0x82,0xfe]
				# CHECK: vsudot.u8 q0, q1, d2[1]
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

llvm/test/MC/Disassembler/ARM/armv8.6a-matmul-thumb.txt

This file was added.

				# RUN: llvm-mc -triple=thumbv8a -mattr=+i8mm -disassemble < %s \| FileCheck %s
				# RUN: not llvm-mc -triple=thumbv8a -mattr=+v8.4a -disassemble < %s 2>&1 \| FileCheck %s --check-prefix=NOMATMUL

				[0x22,0xfc,0x44,0x0c]
				# CHECK: vsmmla.s8 q0, q1, q2
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x22,0xfc,0x54,0x0c]
				# CHECK: vummla.u8 q0, q1, q2
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0xa2,0xfc,0x44,0x0c]
				# CHECK: vusmmla.s8 q0, q1, q2
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0xa1,0xfc,0x02,0x0d]
				# CHECK: vusdot.s8 d0, d1, d2
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0xa2,0xfc,0x44,0x0d]
				# CHECK: vusdot.s8 q0, q1, q2
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x81,0xfe,0x02,0x0d]
				# CHECK: vusdot.s8 d0, d1, d2[0]
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x81,0xfe,0x32,0x0d]
				# CHECK: vsudot.u8 d0, d1, d2[1]
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x82,0xfe,0x42,0x0d]
				# CHECK: vusdot.s8 q0, q1, d2[0]
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

				[0x82,0xfe,0x72,0x0d]
				# CHECK: vsudot.u8 q0, q1, d2[1]
				# NOMATMUL: :[[@LINE-2]]:{{[0-9]+}}: warning: invalid instruction encoding

llvm/unittests/Support/TargetParserTest.cpp

Show First 20 Lines • Show All 630 Lines • ▼ Show 20 Lines	const char *ArchExt[][4] = {{"crc", "nocrc", "+crc", "-crc"},
{"ras", "noras", "+ras", "-ras"},		{"ras", "noras", "+ras", "-ras"},
{"dotprod", "nodotprod", "+dotprod", "-dotprod"},		{"dotprod", "nodotprod", "+dotprod", "-dotprod"},
{"os", "noos", nullptr, nullptr},		{"os", "noos", nullptr, nullptr},
{"iwmmxt", "noiwmmxt", nullptr, nullptr},		{"iwmmxt", "noiwmmxt", nullptr, nullptr},
{"iwmmxt2", "noiwmmxt2", nullptr, nullptr},		{"iwmmxt2", "noiwmmxt2", nullptr, nullptr},
{"maverick", "maverick", nullptr, nullptr},		{"maverick", "maverick", nullptr, nullptr},
{"xscale", "noxscale", nullptr, nullptr},		{"xscale", "noxscale", nullptr, nullptr},
{"sb", "nosb", "+sb", "-sb"},		{"sb", "nosb", "+sb", "-sb"},
		{"i8mm", "noi8mm", "+i8mm", "-i8mm"},
{"mve", "nomve", "+mve", "-mve"},		{"mve", "nomve", "+mve", "-mve"},
{"mve.fp", "nomve.fp", "+mve.fp", "-mve.fp"}};		{"mve.fp", "nomve.fp", "+mve.fp", "-mve.fp"}};

for (unsigned i = 0; i < array_lengthof(ArchExt); i++) {		for (unsigned i = 0; i < array_lengthof(ArchExt); i++) {
EXPECT_EQ(StringRef(ArchExt[i][2]), ARM::getArchExtFeature(ArchExt[i][0]));		EXPECT_EQ(StringRef(ArchExt[i][2]), ARM::getArchExtFeature(ArchExt[i][0]));
EXPECT_EQ(StringRef(ArchExt[i][3]), ARM::getArchExtFeature(ArchExt[i][1]));		EXPECT_EQ(StringRef(ArchExt[i][3]), ARM::getArchExtFeature(ArchExt[i][1]));
}		}
}		}
▲ Show 20 Lines • Show All 578 Lines • ▼ Show 20 Lines	const char *ArchExt[][4] = {{"crc", "nocrc", "+crc", "-crc"},
"+sve2-bitperm", "-sve2-bitperm"},		"+sve2-bitperm", "-sve2-bitperm"},
{"dotprod", "nodotprod", "+dotprod", "-dotprod"},		{"dotprod", "nodotprod", "+dotprod", "-dotprod"},
{"rcpc", "norcpc", "+rcpc", "-rcpc" },		{"rcpc", "norcpc", "+rcpc", "-rcpc" },
{"rng", "norng", "+rand", "-rand"},		{"rng", "norng", "+rand", "-rand"},
{"memtag", "nomemtag", "+mte", "-mte"},		{"memtag", "nomemtag", "+mte", "-mte"},
{"tme", "notme", "+tme", "-tme"},		{"tme", "notme", "+tme", "-tme"},
{"ssbs", "nossbs", "+ssbs", "-ssbs"},		{"ssbs", "nossbs", "+ssbs", "-ssbs"},
{"sb", "nosb", "+sb", "-sb"},		{"sb", "nosb", "+sb", "-sb"},
{"predres", "nopredres", "+predres", "-predres"}		{"predres", "nopredres", "+predres", "-predres"},
		{"i8mm", "noi8mm", "+i8mm", "-i8mm"},
		{"f32mm", "nof32mm", "+f32mm", "-f32mm"},
		{"f64mm", "nof64mm", "+f64mm", "-f64mm"},
};		};

for (unsigned i = 0; i < array_lengthof(ArchExt); i++) {		for (unsigned i = 0; i < array_lengthof(ArchExt); i++) {
EXPECT_EQ(StringRef(ArchExt[i][2]),		EXPECT_EQ(StringRef(ArchExt[i][2]),
AArch64::getArchExtFeature(ArchExt[i][0]));		AArch64::getArchExtFeature(ArchExt[i][0]));
EXPECT_EQ(StringRef(ArchExt[i][3]),		EXPECT_EQ(StringRef(ArchExt[i][3]),
AArch64::getArchExtFeature(ArchExt[i][1]));		AArch64::getArchExtFeature(ArchExt[i][1]));
}		}
}		}
}		}

This is an archive of the discontinued LLVM Phabricator instance.

[PATCH] [ARM]: Armv8.6-a Matrix Mul Asm and Intrinsics SupportAbandonedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 256032

clang/include/clang/Basic/arm_neon.td

clang/lib/Basic/Targets/AArch64.h

clang/lib/Basic/Targets/AArch64.cpp

clang/lib/Basic/Targets/ARM.h

clang/lib/Basic/Targets/ARM.cpp

clang/lib/CodeGen/CGBuiltin.cpp

clang/lib/Driver/ToolChains/Arch/AArch64.cpp

clang/test/CodeGen/aarch64-matmul.cpp

clang/test/CodeGen/aarch64-v8.6a-neon-intrinsics.c

clang/test/CodeGen/arm-v8.6a-neon-intrinsics.c

clang/test/Driver/aarch64-cpus.c

clang/test/Driver/arm-matrix-multiply.c

llvm/include/llvm/IR/IntrinsicsAArch64.td

llvm/include/llvm/IR/IntrinsicsARM.td

llvm/include/llvm/Support/AArch64TargetParser.h

llvm/include/llvm/Support/AArch64TargetParser.def

llvm/include/llvm/Support/ARMTargetParser.h

llvm/include/llvm/Support/ARMTargetParser.def

llvm/lib/Target/AArch64/AArch64.td

llvm/lib/Target/AArch64/AArch64InstrFormats.td

llvm/lib/Target/AArch64/AArch64InstrInfo.td

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

llvm/lib/Target/AArch64/AArch64Subtarget.h

llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp

llvm/lib/Target/AArch64/SVEInstrFormats.td

llvm/lib/Target/ARM/ARM.td

llvm/lib/Target/ARM/ARMInstrNEON.td

llvm/lib/Target/ARM/ARMPredicates.td

llvm/lib/Target/ARM/ARMSubtarget.h

llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp

llvm/test/CodeGen/AArch64/aarch64-matmul.ll

llvm/test/CodeGen/ARM/arm-matmul.ll

llvm/test/MC/AArch64/SVE/matrix-multiply-fp-diagnostics.s

llvm/test/MC/AArch64/SVE/matrix-multiply-fp32.s

llvm/test/MC/AArch64/SVE/matrix-multiply-fp64.s

llvm/test/MC/AArch64/SVE/matrix-multiply-int8-diagnostics.s

llvm/test/MC/AArch64/SVE/matrix-multiply-int8.s

llvm/test/MC/AArch64/armv8.6a-simd-matmul-error.s

llvm/test/MC/AArch64/armv8.6a-simd-matmul.s

llvm/test/MC/ARM/armv8.6a-matmul-error.s

llvm/test/MC/ARM/armv8.6a-matmul.s

llvm/test/MC/Disassembler/AArch64/armv8.6a-simd-matmul.txt

llvm/test/MC/Disassembler/ARM/armv8.6a-matmul-arm.txt

llvm/test/MC/Disassembler/ARM/armv8.6a-matmul-thumb.txt

llvm/unittests/Support/TargetParserTest.cpp

[PATCH] [ARM]: Armv8.6-a Matrix Mul Asm and Intrinsics Support
AbandonedPublic