This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Implement FP16FML intrinsics
ClosedPublic

Authored by bryanpkc on Oct 23 2018, 9:41 PM.


Details

Reviewers

SjoerdMeijer
bogden
efriedma
t.p.northover

Commits

rG223307b3dc0c: [AArch64] Implement FP16FML intrinsics
rC345344: [AArch64] Implement FP16FML intrinsics
rL345344: [AArch64] Implement FP16FML intrinsics

Summary

Generate the FP16FML intrinsics into arm_neon.h (AArch64 only for now).
Add two new type modifiers to NeonEmitter to handle the new prototypes.
Define __ARM_FEATURE_FP16FML when +fp16fml is enabled and guard the
intrinsics with the macro in arm_neon.h.
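
For readers unfamiliar with the instructions, here is a minimal portable sketch of what the non-indexed intrinsics compute and of how calling code might test the new macro. It is a scalar model under stated assumptions, not the code this patch generates: `fmlal_model` and `kHasFP16FML` are hypothetical names, and `float` stands in for the half-precision inputs.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// True only when the compiler defines the macro introduced by this patch.
#if defined(__ARM_FEATURE_FP16FML) && defined(__aarch64__)
constexpr bool kHasFP16FML = true;
#else
constexpr bool kHasFP16FML = false;
#endif

// Scalar model of the 64-bit forms: a 2 x float accumulator and 4 x half inputs.
// Assumption: float is used in place of half precision for portability.
// `high` selects the upper half of the inputs (FMLAL2/FMLSL2);
// `subtract` models FMLSL, which negates one multiplicand.
inline std::array<float, 2> fmlal_model(std::array<float, 2> acc,
                                        const std::array<float, 4> &b,
                                        const std::array<float, 4> &c,
                                        bool high, bool subtract) {
  const std::size_t base = high ? 2 : 0;
  assert(base + acc.size() <= b.size() && "selected half must fit the input");
  for (std::size_t i = 0; i < acc.size(); ++i) {
    const float lhs = subtract ? -b[base + i] : b[base + i];
    acc[i] = std::fma(lhs, c[base + i], acc[i]); // fused: one rounding step
  }
  return acc;
}
```

Real code would call the intrinsics from arm_neon.h inside a block guarded by `__ARM_FEATURE_FP16FML`; the model above is only meant to make the low/high and add/subtract distinction concrete.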

Based on a patch by Gao Yiling.

Diff Detail

Event Timeline

bryanpkc created this revision. Oct 23 2018, 9:41 PM

Herald added subscribers: cfe-commits, kristof.beyls, javed.absar. Oct 23 2018, 9:41 PM

I think this is reasonable.

This revision is now accepted and ready to land. Oct 24 2018, 10:52 AM

In D53633#1274621, @t.p.northover wrote:

I think this is reasonable.

Thanks Tim. Could you also review D53632, which is the LLVM part of this implementation?

Closed by commit rL345344: [AArch64] Implement FP16FML intrinsics (authored by bryanpkc). Oct 25 2018, 4:50 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Oct 25 2018, 4:50 PM

ab added a subscriber: ab. Feb 14 2019, 4:30 PM

ab added inline comments.

cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c
12 ↗	(On Diff #171230)	Hey folks, I'm curious: where does the "_u32" suffix come from? Should it be _f16? Also, are there any new ACLE/intrinsic list documents? As far as I can tell there hasn't been any release since IHI0073B/IHI0053D.

Herald added a project: Restricted Project. Feb 14 2019, 4:30 PM

Herald added a subscriber: jdoerfert.

SjoerdMeijer added inline comments. Feb 15 2019, 2:32 AM

cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c
12 ↗	(On Diff #171230)

> Also, are there any new ACLE/intrinsic list documents? As far as I can tell there hasn't been any release since IHI0073B/IHI0053D.

I've checked, and an updated ACLE that includes these FP16FML intrinsics is coming soon.

> where does the "_u32" suffix come from? Should it be _f16?

Good question. It could probably be _f32 or _f16, but _u32 doesn't seem to make much sense. Looks like the spec says _u32, and that's also what GCC has implemented. I think we want to update the spec and fix the name before the updated spec is available. Will chase this, and let you know once I know more.

SjoerdMeijer added inline comments. Feb 15 2019, 6:52 AM

cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c
12 ↗	(On Diff #171230)	An update on this: we should change this to _f32 (because the first suffixes were referring to the output type). The ACLE will be updated accordingly, and also GCC will change its current implementation (from _u32 to _f32). Many thanks for raising this issue. Is there a volunteer to prepare a patch? Or do you have one already? :-) I could look at it, but that will be towards the end of next week.

ab added inline comments. Feb 15 2019, 2:09 PM

cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c
12 ↗	(On Diff #171230)

> I've checked, and an updated ACLE that includes these FP16FML intrinsics is coming soon.

Great, thanks!

> An update on this: we should change this to _f32 (because the first suffixes were referring to the output type).

Hmm, I was thinking _f16 based on the vmlal intrinsics: they seem to be named after the multiplication type rather than that of the accumulator/output. Either way seems fine to me though, I'll defer to you folks.

> The ACLE will be updated accordingly, and also GCC will change its current implementation (from _u32 to _f32). Many thanks for raising this issue. Is there a volunteer to prepare a patch? Or do you have one already? :-) I could look at it, but that will be towards the end of next week.

Sure: D58306 (with _f16 though, let me know what you think of vmlal). Thanks for checking!

FYI: a new ACLE version has been published, please find it here: https://developer.arm.com/architectures/system-architectures/software-standards/acle

The "Neon Intrinsics" section contains these new intrinsics.

Revision Contents

Path                                          Size
include/clang/Basic/arm_neon.td               27 lines
include/clang/Basic/arm_neon_incl.td          7 lines
lib/Basic/Targets/AArch64.h                   1 line
lib/Basic/Targets/AArch64.cpp                 6 lines
lib/CodeGen/CGBuiltin.cpp                     36 lines
test/CodeGen/aarch64-neon-fp16fml.c           196 lines
test/Preprocessor/aarch64-target-features.c   30 lines
utils/TableGen/NeonEmitter.cpp                37 lines

Diff 170811

include/clang/Basic/arm_neon.td

[...]
 def OP_DOT_LN
     : Op<(call "vdot", $p0, $p1,
           (bitcast $p1, (splat(bitcast "uint32x2_t", $p2), $p3)))>;
 def OP_DOT_LNQ
     : Op<(call "vdot", $p0, $p1,
           (bitcast $p1, (splat(bitcast "uint32x4_t", $p2), $p3)))>;

+def OP_FMLAL_LN : Op<(call "vfmlal_low", $p0, $p1,
+                     (dup_typed $p1, (call "vget_lane", $p2, $p3)))>;
+def OP_FMLSL_LN : Op<(call "vfmlsl_low", $p0, $p1,
+                     (dup_typed $p1, (call "vget_lane", $p2, $p3)))>;
+def OP_FMLAL_LN_Hi : Op<(call "vfmlal_high", $p0, $p1,
+                        (dup_typed $p1, (call "vget_lane", $p2, $p3)))>;
+def OP_FMLSL_LN_Hi : Op<(call "vfmlsl_high", $p0, $p1,
+                        (dup_typed $p1, (call "vget_lane", $p2, $p3)))>;

 //===----------------------------------------------------------------------===//
 // Instructions
 //===----------------------------------------------------------------------===//

 ////////////////////////////////////////////////////////////////////////////////
 // E.3.1 Addition
 def VADD : IOpInst<"vadd", "ddd",
                    "csilfUcUsUiUlQcQsQiQlQfQUcQUsQUiQUl", OP_ADD>;
[...]
 let ArchGuard = "defined(__ARM_FEATURE_DOTPROD)" in {
   def DOT : SInst<"vdot", "dd88", "iQiUiQUi">;
   def DOT_LANE : SOpInst<"vdot_lane", "dd87i", "iUiQiQUi", OP_DOT_LN>;
 }
 let ArchGuard = "defined(__ARM_FEATURE_DOTPROD) && defined(__aarch64__)" in {
   // Variants indexing into a 128-bit vector are A64 only.
   def UDOT_LANEQ : SOpInst<"vdot_laneq", "dd89i", "iUiQiQUi", OP_DOT_LNQ>;
 }

+// v8.2-A FP16 fused multiply-add long instructions.
+let ArchGuard = "defined(__ARM_FEATURE_FP16FML) && defined(__aarch64__)" in {
+  def VFMLAL_LOW : SInst<"vfmlal_low", "ffHH", "UiQUi">;
+  def VFMLSL_LOW : SInst<"vfmlsl_low", "ffHH", "UiQUi">;
+  def VFMLAL_HIGH : SInst<"vfmlal_high", "ffHH", "UiQUi">;
+  def VFMLSL_HIGH : SInst<"vfmlsl_high", "ffHH", "UiQUi">;
+
+  def VFMLAL_LANE_LOW : SOpInst<"vfmlal_lane_low", "ffH0i", "UiQUi", OP_FMLAL_LN>;
+  def VFMLSL_LANE_LOW : SOpInst<"vfmlsl_lane_low", "ffH0i", "UiQUi", OP_FMLSL_LN>;
+  def VFMLAL_LANE_HIGH : SOpInst<"vfmlal_lane_high", "ffH0i", "UiQUi", OP_FMLAL_LN_Hi>;
+  def VFMLSL_LANE_HIGH : SOpInst<"vfmlsl_lane_high", "ffH0i", "UiQUi", OP_FMLSL_LN_Hi>;
+
+  def VFMLAL_LANEQ_LOW : SOpInst<"vfmlal_laneq_low", "ffH1i", "UiQUi", OP_FMLAL_LN>;
+  def VFMLSL_LANEQ_LOW : SOpInst<"vfmlsl_laneq_low", "ffH1i", "UiQUi", OP_FMLSL_LN>;
+  def VFMLAL_LANEQ_HIGH : SOpInst<"vfmlal_laneq_high", "ffH1i", "UiQUi", OP_FMLAL_LN_Hi>;
+  def VFMLSL_LANEQ_HIGH : SOpInst<"vfmlsl_laneq_high", "ffH1i", "UiQUi", OP_FMLSL_LN_Hi>;
+}
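
The `OP_FMLAL_LN` family expands a lane-indexed call into a lane read, a broadcast, and a call to the vector form. A rough model of the broadcast step, assuming `std::array` as a stand-in for the NEON vector types (`dup_typed_model` and `broadcast_lane_model` are hypothetical names, not part of NeonEmitter):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Model of `dup_typed`: build a vector with the same shape as `like`
// in which every lane holds `scalar`. Only the type of `like` is used.
template <typename T, std::size_t N>
std::array<T, N> dup_typed_model(const std::array<T, N> & /*like*/, T scalar) {
  std::array<T, N> out{};
  out.fill(scalar);
  return out;
}

// Model of `(dup_typed $p1, (call "vget_lane", $p2, $p3))`:
// read lane `lane` of `c` and broadcast it into a vector shaped like `b`.
// `c` may have a different lane count from `b`, as in the lane/laneq forms.
template <typename T, std::size_t N, std::size_t M>
std::array<T, N> broadcast_lane_model(const std::array<T, N> &b,
                                      const std::array<T, M> &c,
                                      std::size_t lane) {
  assert(lane < M && "lane index out of range");
  return dup_typed_model(b, c[lane]);
}
```

The broadcast result would then be passed as the third operand of the corresponding `vfmlal_low`/`vfmlal_high` style call.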

include/clang/Basic/arm_neon_incl.td

[...]
 // The VAL argument is saved to a temporary so it can be used
 // as an l-value.
 def bitcast;
 // dup - Take a scalar argument and create a vector by duplicating it into
 // all lanes. The type of the vector is the base type of the intrinsic.
 // example: (dup $p1) -> "(uint32x2_t) {__p1, __p1}" (assuming the base type
 // is uint32x2_t).
 def dup;
+// dup_typed - Take a vector and a scalar argument, and create a new vector of
+// the same type by duplicating the scalar value into all lanes.
+// example: (dup_typed $p1, $p2) -> "(float16x4_t) {__p2, __p2, __p2, __p2}"
+// (assuming __p1 is float16x4_t, and __p2 is a compatible scalar).
+def dup_typed;
 // splat - Take a vector and a lane index, and return a vector of the same type
 // containing repeated instances of the source vector at the lane index.
 // example: (splat $p0, $p1) ->
 // "__builtin_shufflevector(__p0, __p0, __p1, __p1, __p1, __p1)"
 // (assuming __p0 has four elements).
 def splat;
 // save_temp - Create a temporary (local) variable. The variable takes a name
 // based on the zero'th parameter and can be referenced using
[...]
 //
 // v: void
 // t: best-fit integer (int/poly args)
 // x: signed integer (int/float args)
 // u: unsigned integer (int/float args)
 // f: float (int args)
 // F: double (int args)
 // H: half (int args)
+// 0: half (int args), ignore 'Q' size modifier.
+// 1: half (int args), force 'Q' size modifier.
 // d: default
 // g: default, ignore 'Q' size modifier.
 // j: default, force 'Q' size modifier.
 // w: double width elements, same num elts
 // n: double width elements, half num elts
 // h: half width elements, double num elts
 // q: half width elements, quad num elts
 // e: half width elements, double num elts, unsigned
[...]
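
To make the two new modifiers concrete, the sketch below maps a prototype character and the base type's 'Q' flag to the half-precision vector type that results. It is an illustration of the comments above, not NeonEmitter's implementation; `half_vector_type` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Returns the half-precision vector type selected by a prototype modifier.
// 'H' follows the base type's Q flag, '0' ignores it (always 64-bit),
// and '1' forces it (always 128-bit).
inline std::string half_vector_type(char modifier, bool base_is_q) {
  assert((modifier == 'H' || modifier == '0' || modifier == '1') &&
         "sketch only covers the half-precision modifiers");
  bool q = base_is_q;
  if (modifier == '0')
    q = false; // ignore 'Q' size modifier
  else if (modifier == '1')
    q = true;  // force 'Q' size modifier
  return q ? "float16x8_t" : "float16x4_t";
}
```

This matches the tests in this revision: the `_lane` forms always take a `float16x4_t` index vector and the `_laneq` forms always take a `float16x8_t`, regardless of whether the intrinsic itself is the `q` variant.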

lib/Basic/Targets/AArch64.h

[...] class LLVM_LIBRARY_VISIBILITY AArch64TargetInfo : public TargetInfo {
   enum FPUModeEnum { FPUMode, NeonMode = (1 << 0), SveMode = (1 << 1) };

   unsigned FPU;
   unsigned CRC;
   unsigned Crypto;
   unsigned Unaligned;
   unsigned HasFullFP16;
   unsigned HasDotProd;
+  unsigned HasFP16FML;
   llvm::AArch64::ArchKind ArchKind;

   static const Builtin::Info BuiltinInfo[];

   std::string ABI;

 public:
   AArch64TargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts);
[...]

lib/Basic/Targets/AArch64.cpp

[...] void AArch64TargetInfo::getTargetDefines(const LangOptions &Opts,
   if ((FPU & NeonMode) && HasFullFP16)
     Builder.defineMacro("__ARM_FEATURE_FP16_VECTOR_ARITHMETIC", "1");
   if (HasFullFP16)
     Builder.defineMacro("__ARM_FEATURE_FP16_SCALAR_ARITHMETIC", "1");

   if (HasDotProd)
     Builder.defineMacro("__ARM_FEATURE_DOTPROD", "1");

+  if ((FPU & NeonMode) && HasFP16FML)
+    Builder.defineMacro("__ARM_FEATURE_FP16FML", "1");

   switch (ArchKind) {
   default:
     break;
   case llvm::AArch64::ArchKind::ARMV8_1A:
     getTargetDefinesARMV81A(Opts, Builder);
     break;
   case llvm::AArch64::ArchKind::ARMV8_2A:
     getTargetDefinesARMV82A(Opts, Builder);
[...]
 bool AArch64TargetInfo::handleTargetFeatures(std::vector<std::string> &Features,
                                              DiagnosticsEngine &Diags) {
   FPU = FPUMode;
   CRC = 0;
   Crypto = 0;
   Unaligned = 1;
   HasFullFP16 = 0;
   HasDotProd = 0;
+  HasFP16FML = 0;
   ArchKind = llvm::AArch64::ArchKind::ARMV8A;

   for (const auto &Feature : Features) {
     if (Feature == "+neon")
       FPU |= NeonMode;
     if (Feature == "+sve")
       FPU |= SveMode;
     if (Feature == "+crc")
       CRC = 1;
     if (Feature == "+crypto")
       Crypto = 1;
     if (Feature == "+strict-align")
       Unaligned = 0;
     if (Feature == "+v8.1a")
       ArchKind = llvm::AArch64::ArchKind::ARMV8_1A;
     if (Feature == "+v8.2a")
       ArchKind = llvm::AArch64::ArchKind::ARMV8_2A;
     if (Feature == "+fullfp16")
       HasFullFP16 = 1;
     if (Feature == "+dotprod")
       HasDotProd = 1;
+    if (Feature == "+fp16fml")
+      HasFP16FML = 1;
   }

   setDataLayout();

   return true;
 }

 TargetInfo::CallingConvCheckResult
[...]
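
The interaction between the two hunks above is that `__ARM_FEATURE_FP16FML` is defined only when both `+neon` and `+fp16fml` are present. A self-contained sketch of that condition, assuming a simplified `FeatureFlags` struct in place of the real `AArch64TargetInfo` members (`parseFeatures` and `definesFP16FMLMacro` are hypothetical names):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the two AArch64TargetInfo fields that matter here.
struct FeatureFlags {
  bool Neon = false;
  bool FP16FML = false;
};

// Mirrors the string comparisons for "+neon" and "+fp16fml" in
// handleTargetFeatures; all other features are ignored by this sketch.
inline FeatureFlags parseFeatures(const std::vector<std::string> &Features) {
  FeatureFlags Flags;
  for (const std::string &Feature : Features) {
    assert(!Feature.empty() && "sketch assumes non-empty feature strings");
    if (Feature == "+neon")
      Flags.Neon = true;
    if (Feature == "+fp16fml")
      Flags.FP16FML = true;
  }
  return Flags;
}

// Mirrors the `(FPU & NeonMode) && HasFP16FML` test in getTargetDefines.
inline bool definesFP16FMLMacro(const std::vector<std::string> &Features) {
  const FeatureFlags Flags = parseFeatures(Features);
  return Flags.Neon && Flags.FP16FML;
}
```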

lib/CodeGen/CGBuiltin.cpp

[...] static const NeonIntrinsicInfo AArch64SIMDIntrinsicMap[] = {
   NEONMAP1(vcvtq_n_u64_v, aarch64_neon_vcvtfp2fxu, 0),
   NEONMAP1(vcvtx_f32_v, aarch64_neon_fcvtxn, AddRetType | Add1ArgType),
   NEONMAP2(vdot_v, aarch64_neon_udot, aarch64_neon_sdot, 0),
   NEONMAP2(vdotq_v, aarch64_neon_udot, aarch64_neon_sdot, 0),
   NEONMAP0(vext_v),
   NEONMAP0(vextq_v),
   NEONMAP0(vfma_v),
   NEONMAP0(vfmaq_v),
+  NEONMAP1(vfmlal_high_v, aarch64_neon_fmlal2, 0),
+  NEONMAP1(vfmlal_low_v, aarch64_neon_fmlal, 0),
+  NEONMAP1(vfmlalq_high_v, aarch64_neon_fmlal2, 0),
+  NEONMAP1(vfmlalq_low_v, aarch64_neon_fmlal, 0),
+  NEONMAP1(vfmlsl_high_v, aarch64_neon_fmlsl2, 0),
+  NEONMAP1(vfmlsl_low_v, aarch64_neon_fmlsl, 0),
+  NEONMAP1(vfmlslq_high_v, aarch64_neon_fmlsl2, 0),
+  NEONMAP1(vfmlslq_low_v, aarch64_neon_fmlsl, 0),
   NEONMAP2(vhadd_v, aarch64_neon_uhadd, aarch64_neon_shadd, Add1ArgType | UnsignedAlts),
   NEONMAP2(vhaddq_v, aarch64_neon_uhadd, aarch64_neon_shadd, Add1ArgType | UnsignedAlts),
   NEONMAP2(vhsub_v, aarch64_neon_uhsub, aarch64_neon_shsub, Add1ArgType | UnsignedAlts),
   NEONMAP2(vhsubq_v, aarch64_neon_uhsub, aarch64_neon_shsub, Add1ArgType | UnsignedAlts),
   NEONMAP1(vld1_x2_v, aarch64_neon_ld1x2, 0),
   NEONMAP1(vld1_x3_v, aarch64_neon_ld1x3, 0),
   NEONMAP1(vld1_x4_v, aarch64_neon_ld1x4, 0),
   NEONMAP1(vld1q_x2_v, aarch64_neon_ld1x2, 0),
[...] Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
   case NEON::BI__builtin_neon_vdot_v:
   case NEON::BI__builtin_neon_vdotq_v: {
     llvm::Type *InputTy =
         llvm::VectorType::get(Int8Ty, Ty->getPrimitiveSizeInBits() / 8);
     llvm::Type *Tys[2] = { Ty, InputTy };
     Int = Usgn ? LLVMIntrinsic : AltLLVMIntrinsic;
     return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vdot");
   }
+  case NEON::BI__builtin_neon_vfmlal_low_v:
+  case NEON::BI__builtin_neon_vfmlalq_low_v: {
+    llvm::Type *InputTy =
+        llvm::VectorType::get(HalfTy, Ty->getPrimitiveSizeInBits() / 16);
+    llvm::Type *Tys[2] = { Ty, InputTy };
+    return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vfmlal_low");
+  }
+  case NEON::BI__builtin_neon_vfmlsl_low_v:
+  case NEON::BI__builtin_neon_vfmlslq_low_v: {
+    llvm::Type *InputTy =
+        llvm::VectorType::get(HalfTy, Ty->getPrimitiveSizeInBits() / 16);
+    llvm::Type *Tys[2] = { Ty, InputTy };
+    return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vfmlsl_low");
+  }
+  case NEON::BI__builtin_neon_vfmlal_high_v:
+  case NEON::BI__builtin_neon_vfmlalq_high_v: {
+    llvm::Type *InputTy =
+        llvm::VectorType::get(HalfTy, Ty->getPrimitiveSizeInBits() / 16);
+    llvm::Type *Tys[2] = { Ty, InputTy };
+    return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vfmlal_high");
+  }
+  case NEON::BI__builtin_neon_vfmlsl_high_v:
+  case NEON::BI__builtin_neon_vfmlslq_high_v: {
+    llvm::Type *InputTy =
+        llvm::VectorType::get(HalfTy, Ty->getPrimitiveSizeInBits() / 16);
+    llvm::Type *Tys[2] = { Ty, InputTy };
+    return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vfmlsl_high");
+  }
   }

   assert(Int && "Expected valid intrinsic number");

   // Determine the type(s) of this overloaded AArch64 intrinsic.
   Function *F = LookupNeonLLVMIntrinsic(Int, Modifier, Ty, E);

   Value *Result = EmitNeonCall(F, Ops, NameHint);
[...]
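
Each new case derives the half-precision input type from the accumulator type by dividing its bit width by 16, and the two types then select the overloaded intrinsic. The arithmetic can be sketched as follows; `halfLanes` and `overloadedName` are hypothetical helpers, and the name format is taken from the CHECK lines in the test file of this revision rather than from LLVM's mangling code:

```cpp
#include <cassert>
#include <string>

// Number of half-precision lanes in an input vector that is as wide as the
// accumulator: Ty->getPrimitiveSizeInBits() / 16 in the patch.
inline unsigned halfLanes(unsigned AccBits) { return AccBits / 16; }

// Builds the intrinsic name seen in the tests, e.g.
// llvm.aarch64.neon.fmlal.v2f32.v4f16 for a 64-bit accumulator.
inline std::string overloadedName(const std::string &Base, unsigned AccBits) {
  assert((AccBits == 64 || AccBits == 128) && "NEON vectors are 64 or 128 bits");
  const unsigned FloatLanes = AccBits / 32;
  return "llvm.aarch64.neon." + Base + ".v" + std::to_string(FloatLanes) +
         "f32.v" + std::to_string(halfLanes(AccBits)) + "f16";
}
```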

test/CodeGen/aarch64-neon-fp16fml.c

This file was added.

				// RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +v8.2a -target-feature +neon -target-feature +fp16fml \
				// RUN: -fallow-half-arguments-and-returns -disable-O0-optnone -emit-llvm -o - %s | opt -S -instcombine | FileCheck %s

				// REQUIRES: aarch64-registered-target

				// Test AArch64 Armv8.2-A FP16 Fused Multiply-Add Long intrinsics

				#include <arm_neon.h>

				// Vector form

				float32x2_t test_vfmlal_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlal_low_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlal.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlal_low_u32(a, b, c);
				}

				float32x2_t test_vfmlsl_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlsl_low_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlsl.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlsl_low_u32(a, b, c);
				}

				float32x2_t test_vfmlal_high_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlal_high_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlal2.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlal_high_u32(a, b, c);
				}

				float32x2_t test_vfmlsl_high_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlsl_high_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlsl2.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlsl_high_u32(a, b, c);
				}

				float32x4_t test_vfmlalq_low_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlalq_low_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlal.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlalq_low_u32(a, b, c);
				}

				float32x4_t test_vfmlslq_low_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlslq_low_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlsl.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlslq_low_u32(a, b, c);
				}

				float32x4_t test_vfmlalq_high_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlalq_high_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlal2.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlalq_high_u32(a, b, c);
				}

				float32x4_t test_vfmlslq_high_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlslq_high_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlsl2.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlslq_high_u32(a, b, c);
				}

				// Indexed form

				float32x2_t test_vfmlal_lane_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlal_lane_low_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> zeroinitializer
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlal.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlal_lane_low_u32(a, b, c, 0);
				}

				float32x2_t test_vfmlal_lane_high_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlal_lane_high_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlal2.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlal_lane_high_u32(a, b, c, 1);
				}

				float32x4_t test_vfmlalq_lane_low_u32(float32x4_t a, float16x8_t b, float16x4_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlalq_lane_low_u32(<4 x float> %a, <8 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlal.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlalq_lane_low_u32(a, b, c, 2);
				}

				float32x4_t test_vfmlalq_lane_high_u32(float32x4_t a, float16x8_t b, float16x4_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlalq_lane_high_u32(<4 x float> %a, <8 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlal2.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlalq_lane_high_u32(a, b, c, 3);
				}

				float32x2_t test_vfmlal_laneq_low_u32(float32x2_t a, float16x4_t b, float16x8_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlal_laneq_low_u32(<2 x float> %a, <4 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> <i32 4, i32 4, i32 4, i32 4>
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlal.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlal_laneq_low_u32(a, b, c, 4);
				}

				float32x2_t test_vfmlal_laneq_high_u32(float32x2_t a, float16x4_t b, float16x8_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlal_laneq_high_u32(<2 x float> %a, <4 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> <i32 5, i32 5, i32 5, i32 5>
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlal2.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlal_laneq_high_u32(a, b, c, 5);
				}

				float32x4_t test_vfmlalq_laneq_low_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlalq_laneq_low_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> <i32 6, i32 6, i32 6, i32 6, i32 6, i32 6, i32 6, i32 6>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlal.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlalq_laneq_low_u32(a, b, c, 6);
				}

				float32x4_t test_vfmlalq_laneq_high_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlalq_laneq_high_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlal2.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlalq_laneq_high_u32(a, b, c, 7);
				}

				float32x2_t test_vfmlsl_lane_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlsl_lane_low_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> zeroinitializer
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlsl.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlsl_lane_low_u32(a, b, c, 0);
				}

				float32x2_t test_vfmlsl_lane_high_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlsl_lane_high_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlsl2.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlsl_lane_high_u32(a, b, c, 1);
				}

				float32x4_t test_vfmlslq_lane_low_u32(float32x4_t a, float16x8_t b, float16x4_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlslq_lane_low_u32(<4 x float> %a, <8 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlsl.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlslq_lane_low_u32(a, b, c, 2);
				}

				float32x4_t test_vfmlslq_lane_high_u32(float32x4_t a, float16x8_t b, float16x4_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlslq_lane_high_u32(<4 x float> %a, <8 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlsl2.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlslq_lane_high_u32(a, b, c, 3);
				}

				float32x2_t test_vfmlsl_laneq_low_u32(float32x2_t a, float16x4_t b, float16x8_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlsl_laneq_low_u32(<2 x float> %a, <4 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> <i32 4, i32 4, i32 4, i32 4>
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlsl.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlsl_laneq_low_u32(a, b, c, 4);
				}

				float32x2_t test_vfmlsl_laneq_high_u32(float32x2_t a, float16x4_t b, float16x8_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlsl_laneq_high_u32(<2 x float> %a, <4 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> <i32 5, i32 5, i32 5, i32 5>
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlsl2.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlsl_laneq_high_u32(a, b, c, 5);
				}

				float32x4_t test_vfmlslq_laneq_low_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlslq_laneq_low_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> <i32 6, i32 6, i32 6, i32 6, i32 6, i32 6, i32 6, i32 6>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlsl.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlslq_laneq_low_u32(a, b, c, 6);
				}

				float32x4_t test_vfmlslq_laneq_high_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlslq_laneq_high_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlsl2.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlslq_laneq_high_u32(a, b, c, 7);
				}
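
The CHECK lines above show each lane variant lowering to a shufflevector that broadcasts lane N of c, followed by a call to llvm.aarch64.neon.fmlsl (low) or llvm.aarch64.neon.fmlsl2 (high). A rough scalar reference model of that pattern might look like the sketch below. It is not the Clang or LLVM implementation. It assumes that fmlsl consumes the low half of the half-precision inputs and fmlsl2 the high half, it uses float as a stand-in for half, and the names broadcastLane and fmlslModel are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: mirrors the shufflevector in the CHECK lines by
// copying lane `Lane` of `c` into every element of an N-element vector.
template <std::size_t N, std::size_t M>
std::array<float, N> broadcastLane(const std::array<float, M> &c,
                                   std::size_t Lane) {
  assert(Lane < M && "lane index out of range");
  std::array<float, N> Out{};
  Out.fill(c[Lane]);
  return Out;
}

// Hypothetical model of fmlsl (High == false) and fmlsl2 (High == true).
// Assumption: result[i] = a[i] - b[i + Off] * c[i + Off], where Off selects
// the low or high half of the 2*N-element inputs. float stands in for half,
// so this only illustrates the data flow, not the exact rounding behaviour.
template <std::size_t N>
std::array<float, N> fmlslModel(const std::array<float, N> &a,
                                const std::array<float, 2 * N> &b,
                                const std::array<float, 2 * N> &c,
                                bool High) {
  const std::size_t Off = High ? N : 0;
  std::array<float, N> R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = a[I] - b[I + Off] * c[I + Off];
  return R;
}
```

Under these assumptions, test_vfmlsl_laneq_high_u32 would correspond to fmlslModel<2>(a, b, broadcastLane<4>(c, 5), true).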

test/Preprocessor/aarch64-target-features.c

	[...]
	// RUN: %clang -target aarch64 -mtune=cyclone -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MTUNE-CYCLONE %s			// RUN: %clang -target aarch64 -mtune=cyclone -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MTUNE-CYCLONE %s

	// RUN: %clang -target aarch64-none-linux-gnu -march=armv8-a+sve -x c -E -dM %s -o - | FileCheck --check-prefix=CHECK-SVE %s			// RUN: %clang -target aarch64-none-linux-gnu -march=armv8-a+sve -x c -E -dM %s -o - | FileCheck --check-prefix=CHECK-SVE %s
	// CHECK-SVE: __ARM_FEATURE_SVE 1			// CHECK-SVE: __ARM_FEATURE_SVE 1

	// RUN: %clang -target aarch64-none-linux-gnu -march=armv8.2a+dotprod -x c -E -dM %s -o - | FileCheck --check-prefix=CHECK-DOTPROD %s			// RUN: %clang -target aarch64-none-linux-gnu -march=armv8.2a+dotprod -x c -E -dM %s -o - | FileCheck --check-prefix=CHECK-DOTPROD %s
	// CHECK-DOTPROD: __ARM_FEATURE_DOTPROD 1			// CHECK-DOTPROD: __ARM_FEATURE_DOTPROD 1

	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16fml+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// On ARMv8.2-A and above, +fp16fml implies +fp16.
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// On ARMv8.4-A and above, +fp16 implies +fp16fml.
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+fp16+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16fml+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-FML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+fp16+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16fml+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-FML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16fml+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-FML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-FML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
				// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-FML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
				// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-FML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
				// CHECK-FULLFP16-FML: #define __ARM_FEATURE_FP16FML 1
				// CHECK-FULLFP16-NOFML-NOT: #define __ARM_FEATURE_FP16FML 1
	// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1			// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
	// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1			// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
	// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FP 0xE			// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FP 0xE
	// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FP16_FORMAT_IEEE 1			// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FP16_FORMAT_IEEE 1

	// +fp16fml+nosimd doesn't make sense as the fp16fml instructions all require SIMD.			// +fp16fml+nosimd doesn't make sense as the fp16fml instructions all require SIMD.
	// However, as +fp16fml implies +fp16 there is a set of defines that we would expect.			// However, as +fp16fml implies +fp16 there is a set of defines that we would expect.
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16fml+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16fml+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16fml+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16fml+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s
				// CHECK-FULLFP16-SCALAR-NOT: #define __ARM_FEATURE_FP16FML 1
	// CHECK-FULLFP16-SCALAR: #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1			// CHECK-FULLFP16-SCALAR: #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
	// CHECK-FULLFP16-SCALAR-NOT: #define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1			// CHECK-FULLFP16-SCALAR-NOT: #define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
	// CHECK-FULLFP16-SCALAR: #define __ARM_FP 0xE			// CHECK-FULLFP16-SCALAR: #define __ARM_FP 0xE
	// CHECK-FULLFP16-SCALAR: #define __ARM_FP16_FORMAT_IEEE 1			// CHECK-FULLFP16-SCALAR: #define __ARM_FP16_FORMAT_IEEE 1

	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+fp16fml+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+fp16fml+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16fml+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16fml+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
				// CHECK-FULLFP16-NOFML-VECTOR-SCALAR-NOT: #define __ARM_FEATURE_FP16FML 1
	// CHECK-FULLFP16-NOFML-VECTOR-SCALAR-NOT: #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1			// CHECK-FULLFP16-NOFML-VECTOR-SCALAR-NOT: #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
	// CHECK-FULLFP16-NOFML-VECTOR-SCALAR-NOT: #define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1			// CHECK-FULLFP16-NOFML-VECTOR-SCALAR-NOT: #define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
	// CHECK-FULLFP16-NOFML-VECTOR-SCALAR: #define __ARM_FP 0xE			// CHECK-FULLFP16-NOFML-VECTOR-SCALAR: #define __ARM_FP 0xE
	// CHECK-FULLFP16-NOFML-VECTOR-SCALAR: #define __ARM_FP16_FORMAT_IEEE 1			// CHECK-FULLFP16-NOFML-VECTOR-SCALAR: #define __ARM_FP16_FORMAT_IEEE 1

	// ================== Check whether -mtune accepts mixed-case features.			// ================== Check whether -mtune accepts mixed-case features.
	// RUN: %clang -target aarch64 -mtune=CYCLONE -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MTUNE-CYCLONE %s			// RUN: %clang -target aarch64 -mtune=CYCLONE -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MTUNE-CYCLONE %s
	// CHECK-MTUNE-CYCLONE: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+neon" "-target-feature" "+zcm" "-target-feature" "+zcz"			// CHECK-MTUNE-CYCLONE: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-feature" "+neon" "-target-feature" "+zcm" "-target-feature" "+zcz"

	// RUN: %clang -target aarch64 -mcpu=cyclone -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MCPU-CYCLONE %s			// RUN: %clang -target aarch64 -mcpu=cyclone -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MCPU-CYCLONE %s
	// RUN: %clang -target aarch64 -mcpu=cortex-a35 -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MCPU-A35 %s			// RUN: %clang -target aarch64 -mcpu=cortex-a35 -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MCPU-A35 %s
	// RUN: %clang -target aarch64 -mcpu=cortex-a53 -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MCPU-A53 %s			// RUN: %clang -target aarch64 -mcpu=cortex-a53 -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MCPU-A53 %s
	[...]
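
The fp16/fp16fml RUN lines in this file encode a small set of implication rules: +fp16fml turns on fp16, +fp16 turns on fp16fml from ARMv8.4-A onward, later modifiers override earlier ones, and +nosimd suppresses the vector and FP16FML macros. The sketch below is a hypothetical model of those rules, inferred only from the expected CHECK results above; it is not how Clang implements feature handling, and the names Fp16Macros and modelFp16Macros are invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type: which feature macros the model predicts.
struct Fp16Macros {
  bool ScalarArith = false; // __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
  bool VectorArith = false; // __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
  bool Fp16Fml = false;     // __ARM_FEATURE_FP16FML
};

// Hypothetical model, not Clang's code: apply -march modifiers left to
// right. ArchMinor is the N in armv8.N-a (0 for plain armv8-a).
Fp16Macros modelFp16Macros(unsigned ArchMinor,
                           const std::vector<std::string> &Mods) {
  bool Fp16 = false, Fml = false, Simd = true;
  for (const std::string &M : Mods) {
    if (M == "fp16") {
      Fp16 = true;
      if (ArchMinor >= 4) // On ARMv8.4-A and above, +fp16 implies +fp16fml.
        Fml = true;
    } else if (M == "nofp16") {
      Fp16 = Fml = false; // Inferred from the +fp16fml+nofp16 RUN lines.
    } else if (M == "fp16fml") {
      Fp16 = Fml = true; // +fp16fml implies +fp16.
    } else if (M == "nofp16fml") {
      Fml = false;
    } else if (M == "nosimd") {
      Simd = false;
    }
  }
  Fp16Macros R;
  R.ScalarArith = Fp16;
  R.VectorArith = Fp16 && Simd;
  R.Fp16Fml = Fml && Simd; // The fp16fml instructions all require SIMD.
  return R;
}
```

For example, modelFp16Macros(4, {"nofp16fml", "fp16"}) predicts __ARM_FEATURE_FP16FML, while the same modifiers with ArchMinor 2 do not, matching the CHECK-FULLFP16-FML and CHECK-FULLFP16-NOFML prefixes on the corresponding RUN lines.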

utils/TableGen/NeonEmitter.cpp

[...]	private:
public:		public:
DagEmitter(Intrinsic &Intr, StringRef CallPrefix) :		DagEmitter(Intrinsic &Intr, StringRef CallPrefix) :
Intr(Intr), CallPrefix(CallPrefix) {		Intr(Intr), CallPrefix(CallPrefix) {
}		}
std::pair<Type, std::string> emitDagArg(Init *Arg, std::string ArgName);		std::pair<Type, std::string> emitDagArg(Init *Arg, std::string ArgName);
std::pair<Type, std::string> emitDagSaveTemp(DagInit *DI);		std::pair<Type, std::string> emitDagSaveTemp(DagInit *DI);
std::pair<Type, std::string> emitDagSplat(DagInit *DI);		std::pair<Type, std::string> emitDagSplat(DagInit *DI);
std::pair<Type, std::string> emitDagDup(DagInit *DI);		std::pair<Type, std::string> emitDagDup(DagInit *DI);
		std::pair<Type, std::string> emitDagDupTyped(DagInit *DI);
std::pair<Type, std::string> emitDagShuffle(DagInit *DI);		std::pair<Type, std::string> emitDagShuffle(DagInit *DI);
std::pair<Type, std::string> emitDagCast(DagInit *DI, bool IsBitCast);		std::pair<Type, std::string> emitDagCast(DagInit *DI, bool IsBitCast);
std::pair<Type, std::string> emitDagCall(DagInit *DI);		std::pair<Type, std::string> emitDagCall(DagInit *DI);
std::pair<Type, std::string> emitDagNameReplace(DagInit *DI);		std::pair<Type, std::string> emitDagNameReplace(DagInit *DI);
std::pair<Type, std::string> emitDagLiteral(DagInit *DI);		std::pair<Type, std::string> emitDagLiteral(DagInit *DI);
std::pair<Type, std::string> emitDagOp(DagInit *DI);		std::pair<Type, std::string> emitDagOp(DagInit *DI);
std::pair<Type, std::string> emitDag(DagInit *DI);		std::pair<Type, std::string> emitDag(DagInit *DI);
};		};
[...]	void Type::applyModifier(char Mod) {
case 'F':		case 'F':
Float = true;		Float = true;
ElementBitwidth = 64;		ElementBitwidth = 64;
break;		break;
case 'H':		case 'H':
Float = true;		Float = true;
ElementBitwidth = 16;		ElementBitwidth = 16;
break;		break;
		case '0':
		Float = true;
		if (AppliedQuad)
		Bitwidth /= 2;
		ElementBitwidth = 16;
		break;
		case '1':
		Float = true;
		if (!AppliedQuad)
		Bitwidth *= 2;
		ElementBitwidth = 16;
		break;
case 'g':		case 'g':
if (AppliedQuad)		if (AppliedQuad)
Bitwidth /= 2;		Bitwidth /= 2;
break;		break;
case 'j':		case 'j':
if (!AppliedQuad)		if (!AppliedQuad)
Bitwidth *= 2;		Bitwidth *= 2;
break;		break;
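
To make the effect of the two new modifiers concrete, here is a minimal sketch of how '0' and '1' change a type's shape compared with 'H'. TypeModel and applyModifierModel are hypothetical stand-ins for NeonEmitter's Type and Type::applyModifier, and AppliedQuad is passed in as a plain parameter as a simplification; only the fields touched by the cases above are modelled.

```cpp
#include <cassert>

// Hypothetical, stripped-down stand-in for NeonEmitter's Type.
struct TypeModel {
  unsigned Bitwidth;        // total vector width in bits
  unsigned ElementBitwidth; // width of one element in bits
  bool Float;
  unsigned numElements() const { return Bitwidth / ElementBitwidth; }
};

// Mirrors the 'H', '0' and '1' cases shown in the hunk above.
TypeModel applyModifierModel(TypeModel T, char Mod, bool AppliedQuad) {
  switch (Mod) {
  case 'H': // half-precision elements, vector width unchanged
    T.Float = true;
    T.ElementBitwidth = 16;
    break;
  case '0': // half-precision elements, always a 64-bit vector
    T.Float = true;
    if (AppliedQuad)
      T.Bitwidth /= 2;
    T.ElementBitwidth = 16;
    break;
  case '1': // half-precision elements, always a 128-bit vector
    T.Float = true;
    if (!AppliedQuad)
      T.Bitwidth *= 2;
    T.ElementBitwidth = 16;
    break;
  default:
    break;
  }
  return T;
}
```

Starting from a 128-bit float32 base, '0' yields four half elements, and starting from a 64-bit float32 base, '1' yields eight. That would be consistent with the float16x4_t lane argument of the vfmlslq_lane_* tests and the float16x8_t lane argument of the vfmlsl_laneq_* tests, although the prototype strings themselves are not shown in this excerpt.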
[...]	std::pair<Type, std::string> Intrinsic::DagEmitter::emitDag(DagInit *DI) {
std::string Op = DefI->getAsString();		std::string Op = DefI->getAsString();

if (Op == "cast" || Op == "bitcast")		if (Op == "cast" || Op == "bitcast")
return emitDagCast(DI, Op == "bitcast");		return emitDagCast(DI, Op == "bitcast");
if (Op == "shuffle")		if (Op == "shuffle")
return emitDagShuffle(DI);		return emitDagShuffle(DI);
if (Op == "dup")		if (Op == "dup")
return emitDagDup(DI);		return emitDagDup(DI);
		if (Op == "dup_typed")
		return emitDagDupTyped(DI);
if (Op == "splat")		if (Op == "splat")
return emitDagSplat(DI);		return emitDagSplat(DI);
if (Op == "save_temp")		if (Op == "save_temp")
return emitDagSaveTemp(DI);		return emitDagSaveTemp(DI);
if (Op == "op")		if (Op == "op")
return emitDagOp(DI);		return emitDagOp(DI);
if (Op == "call")		if (Op == "call")
return emitDagCall(DI);		return emitDagCall(DI);
[...]	if (I != 0)
S += ", ";		S += ", ";
S += A.second;		S += A.second;
}		}
S += "}";		S += "}";

return std::make_pair(T, S);		return std::make_pair(T, S);
}		}

		std::pair<Type, std::string> Intrinsic::DagEmitter::emitDagDupTyped(DagInit *DI) {
		assert_with_loc(DI->getNumArgs() == 2, "dup_typed() expects two arguments");
		std::pair<Type, std::string> A = emitDagArg(DI->getArg(0),
		DI->getArgNameStr(0));
		std::pair<Type, std::string> B = emitDagArg(DI->getArg(1),
		DI->getArgNameStr(1));
		assert_with_loc(B.first.isScalar(),
		"dup_typed() requires a scalar as the second argument");

		Type T = A.first;
		assert_with_loc(T.isVector(), "dup_typed() used but target type is scalar!");
		std::string S = "(" + T.str() + ") {";
		for (unsigned I = 0; I < T.getNumElements(); ++I) {
		if (I != 0)
		S += ", ";
		S += B.second;
		}
		S += "}";

		return std::make_pair(T, S);
		}
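
As a rough illustration of what emitDagDupTyped produces, the string assembly can be lifted into a stand-alone function. In this sketch buildDupTyped is a hypothetical name, TypeName plays the role of T.str(), NumElements of T.getNumElements(), and Scalar of B.second; the example type name is an assumption about what T.str() would return.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-alone version of the string assembly in
// emitDagDupTyped: a compound literal of the target vector type whose
// elements are all the same scalar expression.
std::string buildDupTyped(const std::string &TypeName, unsigned NumElements,
                          const std::string &Scalar) {
  std::string S = "(" + TypeName + ") {";
  for (unsigned I = 0; I < NumElements; ++I) {
    if (I != 0)
      S += ", ";
    S += Scalar;
  }
  S += "}";
  return S;
}
```

With these inputs, buildDupTyped("float16x4_t", 4, "x") returns "(float16x4_t) {x, x, x, x}". The vector type comes from the first dup_typed() argument and the repeated value from the second, as in the function above.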

std::pair<Type, std::string> Intrinsic::DagEmitter::emitDagSplat(DagInit *DI) {		std::pair<Type, std::string> Intrinsic::DagEmitter::emitDagSplat(DagInit *DI) {
assert_with_loc(DI->getNumArgs() == 2, "splat() expects two arguments");		assert_with_loc(DI->getNumArgs() == 2, "splat() expects two arguments");
std::pair<Type, std::string> A = emitDagArg(DI->getArg(0),		std::pair<Type, std::string> A = emitDagArg(DI->getArg(0),
DI->getArgNameStr(0));		DI->getArgNameStr(0));
std::pair<Type, std::string> B = emitDagArg(DI->getArg(1),		std::pair<Type, std::string> B = emitDagArg(DI->getArg(1),
DI->getArgNameStr(1));		DI->getArgNameStr(1));

assert_with_loc(B.first.isScalar(),		assert_with_loc(B.first.isScalar(),
[...]