This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Implement FP16FML intrinsics
Closed, Public

Authored by bryanpkc on Oct 23 2018, 9:41 PM.


Details

Reviewers

SjoerdMeijer
bogden
efriedma
t.p.northover

Commits

rG223307b3dc0c: [AArch64] Implement FP16FML intrinsics
rC345344: [AArch64] Implement FP16FML intrinsics
rL345344: [AArch64] Implement FP16FML intrinsics

Summary

Generate the FP16FML intrinsics into arm_neon.h (AArch64 only for now).
Add two new type modifiers to NeonEmitter to handle the new prototypes.
Define __ARM_FEATURE_FP16FML when +fp16fml is enabled and guard the
intrinsics with the macro in arm_neon.h.

Based on a patch by Gao Yiling.
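
For readers unfamiliar with FP16FML, the following is a minimal scalar sketch in C of what the 64-bit "low" and "high" multiply-add-long forms compute. It assumes the usual FMLAL/FMLAL2 behaviour, where the low form consumes the lower half of the half-precision lanes and the high form the upper half. The helper names (fmlal_model, fmlal_low_model, fmlal_high_model) are hypothetical and are not part of arm_neon.h; plain float stands in for the half-precision inputs, and the fused rounding of the real instruction is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model, not the arm_neon.h implementation.
   acc has 2 float lanes; b and c have 4 lanes standing in for half values.
   'first' selects which pair of input lanes feeds the accumulator. */
static void fmlal_model(float acc[2], const float b[4], const float c[4],
                        size_t first) {
    for (size_t i = 0; i < 2; ++i)
        acc[i] += b[first + i] * c[first + i];
}

/* "low" form: input lanes 0 and 1. */
static void fmlal_low_model(float acc[2], const float b[4], const float c[4]) {
    fmlal_model(acc, b, c, 0);
}

/* "high" form: input lanes 2 and 3. */
static void fmlal_high_model(float acc[2], const float b[4], const float c[4]) {
    fmlal_model(acc, b, c, 2);
}
```

The subtracting forms (vfmlsl_*) would follow the same lane selection with the product subtracted from the accumulator instead of added.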

Diff Detail

Repository: rL LLVM

Event Timeline

bryanpkc created this revision. Oct 23 2018, 9:41 PM

Herald added subscribers: cfe-commits, kristof.beyls, javed.absar. Oct 23 2018, 9:41 PM

I think this is reasonable.

This revision is now accepted and ready to land. Oct 24 2018, 10:52 AM

In D53633#1274621, @t.p.northover wrote:

I think this is reasonable.

Thanks Tim. Could you also review D53632, which is the LLVM part of this implementation?

Closed by commit rL345344: [AArch64] Implement FP16FML intrinsics (authored by bryanpkc). Oct 25 2018, 4:50 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Oct 25 2018, 4:50 PM

ab added a subscriber: ab. Feb 14 2019, 4:30 PM

ab added inline comments.

cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c
12	Hey folks, I'm curious: where does the "_u32" suffix come from? Should it be _f16? Also, are there any new ACLE/intrinsic list documents? As far as I can tell there hasn't been any release since IHI0073B/IHI0053D.

Herald added a project: Restricted Project. Feb 14 2019, 4:30 PM

Herald added a subscriber: jdoerfert.

SjoerdMeijer added inline comments. Feb 15 2019, 2:32 AM

cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c
12	> Also, are there any new ACLE/intrinsic list documents? As far as I can tell there hasn't been any release since IHI0073B/IHI0053D.

I've checked, and an updated ACLE that includes these FP16FML intrinsics is coming soon.

> where does the "_u32" suffix come from? Should it be _f16?

Good question. It could probably be _f32 or _f16, but _u32 doesn't seem to make much sense. Looks like the spec says _u32, and that's also what GCC has implemented. I think we want to update the spec and fix the name before the updated spec is available. Will chase this, and let you know once I know more.

SjoerdMeijer added inline comments. Feb 15 2019, 6:52 AM

cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c
12	An update on this: we should change this to _f32 (because the first suffixes were referring to the output type). The ACLE will be updated accordingly, and also GCC will change its current implementation (from _u32 to _f32). Many thanks for raising this issue. Is there a volunteer to prepare a patch? Or do you have one already? :-) I could look at it, but that will be towards the end of next week.

ab added inline comments. Feb 15 2019, 2:09 PM

cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c
12	> I've checked, and an updated ACLE that includes these FP16FML intrinsics is coming soon.

Great, thanks!

> An update on this: we should change this to _f32 (because the first suffixes were referring to the output type).

Hmm, I was thinking _f16 based on the vmlal intrinsics: they seem to be named after the multiplication type rather than that of the accumulator/output. Either way seems fine to me though, I'll defer to you folks.

> The ACLE will be updated accordingly, and also GCC will change its current implementation (from _u32 to _f32). Many thanks for raising this issue. Is there a volunteer to prepare a patch? Or do you have one already? :-) I could look at it, but that will be towards the end of next week.

Sure: D58306 (with _f16 though, let me know what you think of vmlal)

Thanks for checking!

FYI: a new ACLE version has been published. You can find it here: https://developer.arm.com/architectures/system-architectures/software-standards/acle

The "Neon Intrinsics" section contains these new intrinsics.

Revision Contents

Path (Size)

cfe/trunk/include/clang/Basic/arm_neon.td (27 lines)
cfe/trunk/include/clang/Basic/arm_neon_incl.td (7 lines)
cfe/trunk/lib/Basic/Targets/AArch64.h (1 line)
cfe/trunk/lib/Basic/Targets/AArch64.cpp (6 lines)
cfe/trunk/lib/CodeGen/CGBuiltin.cpp (36 lines)
cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c (196 lines)
cfe/trunk/test/Preprocessor/aarch64-target-features.c (30 lines)
cfe/trunk/utils/TableGen/NeonEmitter.cpp (37 lines)

Diff 171230

cfe/trunk/include/clang/Basic/arm_neon.td


	def OP_DOT_LN			def OP_DOT_LN
	: Op<(call "vdot", $p0, $p1,			: Op<(call "vdot", $p0, $p1,
	(bitcast $p1, (splat(bitcast "uint32x2_t", $p2), $p3)))>;			(bitcast $p1, (splat(bitcast "uint32x2_t", $p2), $p3)))>;
	def OP_DOT_LNQ			def OP_DOT_LNQ
	: Op<(call "vdot", $p0, $p1,			: Op<(call "vdot", $p0, $p1,
	(bitcast $p1, (splat(bitcast "uint32x4_t", $p2), $p3)))>;			(bitcast $p1, (splat(bitcast "uint32x4_t", $p2), $p3)))>;

				def OP_FMLAL_LN : Op<(call "vfmlal_low", $p0, $p1,
				(dup_typed $p1, (call "vget_lane", $p2, $p3)))>;
				def OP_FMLSL_LN : Op<(call "vfmlsl_low", $p0, $p1,
				(dup_typed $p1, (call "vget_lane", $p2, $p3)))>;
				def OP_FMLAL_LN_Hi : Op<(call "vfmlal_high", $p0, $p1,
				(dup_typed $p1, (call "vget_lane", $p2, $p3)))>;
				def OP_FMLSL_LN_Hi : Op<(call "vfmlsl_high", $p0, $p1,
				(dup_typed $p1, (call "vget_lane", $p2, $p3)))>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Instructions			// Instructions
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	////////////////////////////////////////////////////////////////////////////////			////////////////////////////////////////////////////////////////////////////////
	// E.3.1 Addition			// E.3.1 Addition
	def VADD : IOpInst<"vadd", "ddd",			def VADD : IOpInst<"vadd", "ddd",
	"csilfUcUsUiUlQcQsQiQlQfQUcQUsQUiQUl", OP_ADD>;			"csilfUcUsUiUlQcQsQiQlQfQUcQUsQUiQUl", OP_ADD>;
	▲ Show 20 Lines • Show All 1,418 Lines • ▼ Show 20 Lines
	let ArchGuard = "defined(__ARM_FEATURE_DOTPROD)" in {			let ArchGuard = "defined(__ARM_FEATURE_DOTPROD)" in {
	def DOT : SInst<"vdot", "dd88", "iQiUiQUi">;			def DOT : SInst<"vdot", "dd88", "iQiUiQUi">;
	def DOT_LANE : SOpInst<"vdot_lane", "dd87i", "iUiQiQUi", OP_DOT_LN>;			def DOT_LANE : SOpInst<"vdot_lane", "dd87i", "iUiQiQUi", OP_DOT_LN>;
	}			}
	let ArchGuard = "defined(__ARM_FEATURE_DOTPROD) && defined(__aarch64__)" in {			let ArchGuard = "defined(__ARM_FEATURE_DOTPROD) && defined(__aarch64__)" in {
	// Variants indexing into a 128-bit vector are A64 only.			// Variants indexing into a 128-bit vector are A64 only.
	def UDOT_LANEQ : SOpInst<"vdot_laneq", "dd89i", "iUiQiQUi", OP_DOT_LNQ>;			def UDOT_LANEQ : SOpInst<"vdot_laneq", "dd89i", "iUiQiQUi", OP_DOT_LNQ>;
	}			}

				// v8.2-A FP16 fused multiply-add long instructions.
				let ArchGuard = "defined(__ARM_FEATURE_FP16FML) && defined(__aarch64__)" in {
				def VFMLAL_LOW : SInst<"vfmlal_low", "ffHH", "UiQUi">;
				def VFMLSL_LOW : SInst<"vfmlsl_low", "ffHH", "UiQUi">;
				def VFMLAL_HIGH : SInst<"vfmlal_high", "ffHH", "UiQUi">;
				def VFMLSL_HIGH : SInst<"vfmlsl_high", "ffHH", "UiQUi">;

				def VFMLAL_LANE_LOW : SOpInst<"vfmlal_lane_low", "ffH0i", "UiQUi", OP_FMLAL_LN>;
				def VFMLSL_LANE_LOW : SOpInst<"vfmlsl_lane_low", "ffH0i", "UiQUi", OP_FMLSL_LN>;
				def VFMLAL_LANE_HIGH : SOpInst<"vfmlal_lane_high", "ffH0i", "UiQUi", OP_FMLAL_LN_Hi>;
				def VFMLSL_LANE_HIGH : SOpInst<"vfmlsl_lane_high", "ffH0i", "UiQUi", OP_FMLSL_LN_Hi>;

				def VFMLAL_LANEQ_LOW : SOpInst<"vfmlal_laneq_low", "ffH1i", "UiQUi", OP_FMLAL_LN>;
				def VFMLSL_LANEQ_LOW : SOpInst<"vfmlsl_laneq_low", "ffH1i", "UiQUi", OP_FMLSL_LN>;
				def VFMLAL_LANEQ_HIGH : SOpInst<"vfmlal_laneq_high", "ffH1i", "UiQUi", OP_FMLAL_LN_Hi>;
				def VFMLSL_LANEQ_HIGH : SOpInst<"vfmlsl_laneq_high", "ffH1i", "UiQUi", OP_FMLSL_LN_Hi>;
				}
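
The lane variants above are defined through OP_FMLAL_LN and friends: fetch one lane of the last vector argument (vget_lane), duplicate it into a vector shaped like the second argument (dup_typed), then call the plain form. A rough scalar sketch of that composition in C follows. The names low_form and lane_low_form are hypothetical, float stands in for half precision, and only the 64-bit accumulate "low" case is modelled.

```c
#include <assert.h>
#include <stddef.h>

enum { ACC_LANES = 2, IN_LANES = 4 };

/* Hypothetical model of the plain "low" form: uses input lanes 0 and 1. */
static void low_form(float acc[ACC_LANES], const float b[IN_LANES],
                     const float c[IN_LANES]) {
    for (size_t i = 0; i < ACC_LANES; ++i)
        acc[i] += b[i] * c[i];
}

/* Lane form: read one lane of c (the vget_lane step), broadcast it into a
   vector with as many lanes as b (the dup_typed step), then reuse low_form.
   c may be a 4-lane (lane) or 8-lane (laneq) source; only c[lane] is read. */
static void lane_low_form(float acc[ACC_LANES], const float b[IN_LANES],
                          const float *c, size_t lane) {
    float dup[IN_LANES];
    for (size_t i = 0; i < IN_LANES; ++i)
        dup[i] = c[lane];
    low_form(acc, b, dup);
}
```

Because the broadcast vector takes its shape from the second argument rather than from the lane source, the same operation definition can serve both the _lane and _laneq variants.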

cfe/trunk/include/clang/Basic/arm_neon_incl.td

	// The VAL argument is saved to a temporary so it can be used			// The VAL argument is saved to a temporary so it can be used
	// as an l-value.			// as an l-value.
	def bitcast;			def bitcast;
	// dup - Take a scalar argument and create a vector by duplicating it into			// dup - Take a scalar argument and create a vector by duplicating it into
	// all lanes. The type of the vector is the base type of the intrinsic.			// all lanes. The type of the vector is the base type of the intrinsic.
	// example: (dup $p1) -> "(uint32x2_t) {__p1, __p1}" (assuming the base type			// example: (dup $p1) -> "(uint32x2_t) {__p1, __p1}" (assuming the base type
	// is uint32x2_t).			// is uint32x2_t).
	def dup;			def dup;
				// dup_typed - Take a vector and a scalar argument, and create a new vector of
				// the same type by duplicating the scalar value into all lanes.
				// example: (dup_typed $p1, $p2) -> "(float16x4_t) {__p2, __p2, __p2, __p2}"
				// (assuming __p1 is float16x4_t, and __p2 is a compatible scalar).
				def dup_typed;
	// splat - Take a vector and a lane index, and return a vector of the same type			// splat - Take a vector and a lane index, and return a vector of the same type
	// containing repeated instances of the source vector at the lane index.			// containing repeated instances of the source vector at the lane index.
	// example: (splat $p0, $p1) ->			// example: (splat $p0, $p1) ->
	// "__builtin_shufflevector(__p0, __p0, __p1, __p1, __p1, __p1)"			// "__builtin_shufflevector(__p0, __p0, __p1, __p1, __p1, __p1)"
	// (assuming __p0 has four elements).			// (assuming __p0 has four elements).
	def splat;			def splat;
	// save_temp - Create a temporary (local) variable. The variable takes a name			// save_temp - Create a temporary (local) variable. The variable takes a name
	// based on the zero'th parameter and can be referenced using			// based on the zero'th parameter and can be referenced using
	▲ Show 20 Lines • Show All 117 Lines • ▼ Show 20 Lines
	//			//
	// v: void			// v: void
	// t: best-fit integer (int/poly args)			// t: best-fit integer (int/poly args)
	// x: signed integer (int/float args)			// x: signed integer (int/float args)
	// u: unsigned integer (int/float args)			// u: unsigned integer (int/float args)
	// f: float (int args)			// f: float (int args)
	// F: double (int args)			// F: double (int args)
	// H: half (int args)			// H: half (int args)
				// 0: half (int args), ignore 'Q' size modifier.
				// 1: half (int args), force 'Q' size modifier.
	// d: default			// d: default
	// g: default, ignore 'Q' size modifier.			// g: default, ignore 'Q' size modifier.
	// j: default, force 'Q' size modifier.			// j: default, force 'Q' size modifier.
	// w: double width elements, same num elts			// w: double width elements, same num elts
	// n: double width elements, half num elts			// n: double width elements, half num elts
	// h: half width elements, double num elts			// h: half width elements, double num elts
	// q: half width elements, quad num elts			// q: half width elements, quad num elts
	// e: half width elements, double num elts, unsigned			// e: half width elements, double num elts, unsigned
	▲ Show 20 Lines • Show All 77 Lines • Show Last 20 Lines
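
To illustrate the new dup_typed operation described in this file, here is a small C sketch of its effect: the first operand only supplies the vector type, and the scalar is copied into every lane. The struct vec4 and the function dup_typed_model are hypothetical stand-ins (a four-lane float struct replaces float16x4_t so the sketch compiles anywhere); the real expansion is emitted by NeonEmitter as a vector literal.

```c
#include <assert.h>

/* Hypothetical four-lane vector; stands in for float16x4_t. */
typedef struct {
    float lane[4];
} vec4;

/* Model of (dup_typed $p1, $p2): 'like' contributes only its type,
   and 's' is duplicated into all lanes of the result. */
static vec4 dup_typed_model(vec4 like, float s) {
    (void)like; /* the value of the first operand is never read */
    return (vec4){{s, s, s, s}};
}
```

The existing dup operation always produces the base type of the intrinsic, which is presumably why a separate operation was needed here: the FP16FML intrinsics have a float32 base type but need a float16 vector for the broadcast operand.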

cfe/trunk/lib/Basic/Targets/AArch64.h

Show All 28 Lines	class LLVM_LIBRARY_VISIBILITY AArch64TargetInfo : public TargetInfo {
enum FPUModeEnum { FPUMode, NeonMode = (1 << 0), SveMode = (1 << 1) };		enum FPUModeEnum { FPUMode, NeonMode = (1 << 0), SveMode = (1 << 1) };

unsigned FPU;		unsigned FPU;
unsigned CRC;		unsigned CRC;
unsigned Crypto;		unsigned Crypto;
unsigned Unaligned;		unsigned Unaligned;
unsigned HasFullFP16;		unsigned HasFullFP16;
unsigned HasDotProd;		unsigned HasDotProd;
		unsigned HasFP16FML;
llvm::AArch64::ArchKind ArchKind;		llvm::AArch64::ArchKind ArchKind;

static const Builtin::Info BuiltinInfo[];		static const Builtin::Info BuiltinInfo[];

std::string ABI;		std::string ABI;

public:		public:
AArch64TargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts);		AArch64TargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts);
▲ Show 20 Lines • Show All 132 Lines • Show Last 20 Lines

cfe/trunk/lib/Basic/Targets/AArch64.cpp

Show First 20 Lines • Show All 188 Lines • ▼ Show 20 Lines	void AArch64TargetInfo::getTargetDefines(const LangOptions &Opts,
if ((FPU & NeonMode) && HasFullFP16)		if ((FPU & NeonMode) && HasFullFP16)
Builder.defineMacro("__ARM_FEATURE_FP16_VECTOR_ARITHMETIC", "1");		Builder.defineMacro("__ARM_FEATURE_FP16_VECTOR_ARITHMETIC", "1");
if (HasFullFP16)		if (HasFullFP16)
Builder.defineMacro("__ARM_FEATURE_FP16_SCALAR_ARITHMETIC", "1");		Builder.defineMacro("__ARM_FEATURE_FP16_SCALAR_ARITHMETIC", "1");

if (HasDotProd)		if (HasDotProd)
Builder.defineMacro("__ARM_FEATURE_DOTPROD", "1");		Builder.defineMacro("__ARM_FEATURE_DOTPROD", "1");

		if ((FPU & NeonMode) && HasFP16FML)
		Builder.defineMacro("__ARM_FEATURE_FP16FML", "1");

switch (ArchKind) {		switch (ArchKind) {
default:		default:
break;		break;
case llvm::AArch64::ArchKind::ARMV8_1A:		case llvm::AArch64::ArchKind::ARMV8_1A:
getTargetDefinesARMV81A(Opts, Builder);		getTargetDefinesARMV81A(Opts, Builder);
break;		break;
case llvm::AArch64::ArchKind::ARMV8_2A:		case llvm::AArch64::ArchKind::ARMV8_2A:
getTargetDefinesARMV82A(Opts, Builder);		getTargetDefinesARMV82A(Opts, Builder);
Show All 21 Lines
bool AArch64TargetInfo::handleTargetFeatures(std::vector<std::string> &Features,		bool AArch64TargetInfo::handleTargetFeatures(std::vector<std::string> &Features,
DiagnosticsEngine &Diags) {		DiagnosticsEngine &Diags) {
FPU = FPUMode;		FPU = FPUMode;
CRC = 0;		CRC = 0;
Crypto = 0;		Crypto = 0;
Unaligned = 1;		Unaligned = 1;
HasFullFP16 = 0;		HasFullFP16 = 0;
HasDotProd = 0;		HasDotProd = 0;
		HasFP16FML = 0;
ArchKind = llvm::AArch64::ArchKind::ARMV8A;		ArchKind = llvm::AArch64::ArchKind::ARMV8A;

for (const auto &Feature : Features) {		for (const auto &Feature : Features) {
if (Feature == "+neon")		if (Feature == "+neon")
FPU |= NeonMode;		FPU |= NeonMode;
if (Feature == "+sve")		if (Feature == "+sve")
FPU |= SveMode;		FPU |= SveMode;
if (Feature == "+crc")		if (Feature == "+crc")
CRC = 1;		CRC = 1;
if (Feature == "+crypto")		if (Feature == "+crypto")
Crypto = 1;		Crypto = 1;
if (Feature == "+strict-align")		if (Feature == "+strict-align")
Unaligned = 0;		Unaligned = 0;
if (Feature == "+v8.1a")		if (Feature == "+v8.1a")
ArchKind = llvm::AArch64::ArchKind::ARMV8_1A;		ArchKind = llvm::AArch64::ArchKind::ARMV8_1A;
if (Feature == "+v8.2a")		if (Feature == "+v8.2a")
ArchKind = llvm::AArch64::ArchKind::ARMV8_2A;		ArchKind = llvm::AArch64::ArchKind::ARMV8_2A;
if (Feature == "+fullfp16")		if (Feature == "+fullfp16")
HasFullFP16 = 1;		HasFullFP16 = 1;
if (Feature == "+dotprod")		if (Feature == "+dotprod")
HasDotProd = 1;		HasDotProd = 1;
		if (Feature == "+fp16fml")
		HasFP16FML = 1;
}		}

setDataLayout();		setDataLayout();

return true;		return true;
}		}

TargetInfo::CallingConvCheckResult		TargetInfo::CallingConvCheckResult
▲ Show 20 Lines • Show All 329 Lines • Show Last 20 Lines
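
The macro logic in this file can be summarised with a short standalone C sketch: +fp16fml sets a flag while the feature list is scanned, and __ARM_FEATURE_FP16FML is defined only when NEON is enabled as well. The function defines_fp16fml and the enum names below are hypothetical; this is a simplified model of the two hunks above, not the Clang code itself.

```c
#include <assert.h>
#include <string.h>

enum { FPU_MODE = 0, NEON_MODE = 1 << 0, SVE_MODE = 1 << 1 };

/* Returns 1 when this model would define __ARM_FEATURE_FP16FML,
   mirroring the condition (FPU & NeonMode) && HasFP16FML. */
static int defines_fp16fml(const char *const *features, int count) {
    unsigned fpu = FPU_MODE;
    int has_fp16fml = 0;
    for (int i = 0; i < count; ++i) {
        if (strcmp(features[i], "+neon") == 0)
            fpu |= NEON_MODE;
        if (strcmp(features[i], "+sve") == 0)
            fpu |= SVE_MODE;
        if (strcmp(features[i], "+fp16fml") == 0)
            has_fp16fml = 1;
    }
    return (fpu & NEON_MODE) && has_fp16fml;
}
```

Under this model, passing +fp16fml without +neon leaves the macro undefined, so the guarded intrinsics in arm_neon.h would not be declared.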

cfe/trunk/lib/CodeGen/CGBuiltin.cpp


Show First 20 Lines • Show All 4,362 Lines • ▼ Show 20 Lines	static const NeonIntrinsicInfo AArch64SIMDIntrinsicMap[] = {
NEONMAP1(vcvtq_n_u64_v, aarch64_neon_vcvtfp2fxu, 0),		NEONMAP1(vcvtq_n_u64_v, aarch64_neon_vcvtfp2fxu, 0),
NEONMAP1(vcvtx_f32_v, aarch64_neon_fcvtxn, AddRetType | Add1ArgType),		NEONMAP1(vcvtx_f32_v, aarch64_neon_fcvtxn, AddRetType | Add1ArgType),
NEONMAP2(vdot_v, aarch64_neon_udot, aarch64_neon_sdot, 0),		NEONMAP2(vdot_v, aarch64_neon_udot, aarch64_neon_sdot, 0),
NEONMAP2(vdotq_v, aarch64_neon_udot, aarch64_neon_sdot, 0),		NEONMAP2(vdotq_v, aarch64_neon_udot, aarch64_neon_sdot, 0),
NEONMAP0(vext_v),		NEONMAP0(vext_v),
NEONMAP0(vextq_v),		NEONMAP0(vextq_v),
NEONMAP0(vfma_v),		NEONMAP0(vfma_v),
NEONMAP0(vfmaq_v),		NEONMAP0(vfmaq_v),
		NEONMAP1(vfmlal_high_v, aarch64_neon_fmlal2, 0),
		NEONMAP1(vfmlal_low_v, aarch64_neon_fmlal, 0),
		NEONMAP1(vfmlalq_high_v, aarch64_neon_fmlal2, 0),
		NEONMAP1(vfmlalq_low_v, aarch64_neon_fmlal, 0),
		NEONMAP1(vfmlsl_high_v, aarch64_neon_fmlsl2, 0),
		NEONMAP1(vfmlsl_low_v, aarch64_neon_fmlsl, 0),
		NEONMAP1(vfmlslq_high_v, aarch64_neon_fmlsl2, 0),
		NEONMAP1(vfmlslq_low_v, aarch64_neon_fmlsl, 0),
NEONMAP2(vhadd_v, aarch64_neon_uhadd, aarch64_neon_shadd, Add1ArgType | UnsignedAlts),		NEONMAP2(vhadd_v, aarch64_neon_uhadd, aarch64_neon_shadd, Add1ArgType | UnsignedAlts),
NEONMAP2(vhaddq_v, aarch64_neon_uhadd, aarch64_neon_shadd, Add1ArgType | UnsignedAlts),		NEONMAP2(vhaddq_v, aarch64_neon_uhadd, aarch64_neon_shadd, Add1ArgType | UnsignedAlts),
NEONMAP2(vhsub_v, aarch64_neon_uhsub, aarch64_neon_shsub, Add1ArgType | UnsignedAlts),		NEONMAP2(vhsub_v, aarch64_neon_uhsub, aarch64_neon_shsub, Add1ArgType | UnsignedAlts),
NEONMAP2(vhsubq_v, aarch64_neon_uhsub, aarch64_neon_shsub, Add1ArgType | UnsignedAlts),		NEONMAP2(vhsubq_v, aarch64_neon_uhsub, aarch64_neon_shsub, Add1ArgType | UnsignedAlts),
NEONMAP1(vld1_x2_v, aarch64_neon_ld1x2, 0),		NEONMAP1(vld1_x2_v, aarch64_neon_ld1x2, 0),
NEONMAP1(vld1_x3_v, aarch64_neon_ld1x3, 0),		NEONMAP1(vld1_x3_v, aarch64_neon_ld1x3, 0),
NEONMAP1(vld1_x4_v, aarch64_neon_ld1x4, 0),		NEONMAP1(vld1_x4_v, aarch64_neon_ld1x4, 0),
NEONMAP1(vld1q_x2_v, aarch64_neon_ld1x2, 0),		NEONMAP1(vld1q_x2_v, aarch64_neon_ld1x2, 0),
▲ Show 20 Lines • Show All 957 Lines • ▼ Show 20 Lines	Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
case NEON::BI__builtin_neon_vdot_v:		case NEON::BI__builtin_neon_vdot_v:
case NEON::BI__builtin_neon_vdotq_v: {		case NEON::BI__builtin_neon_vdotq_v: {
llvm::Type *InputTy =		llvm::Type *InputTy =
llvm::VectorType::get(Int8Ty, Ty->getPrimitiveSizeInBits() / 8);		llvm::VectorType::get(Int8Ty, Ty->getPrimitiveSizeInBits() / 8);
llvm::Type *Tys[2] = { Ty, InputTy };		llvm::Type *Tys[2] = { Ty, InputTy };
Int = Usgn ? LLVMIntrinsic : AltLLVMIntrinsic;		Int = Usgn ? LLVMIntrinsic : AltLLVMIntrinsic;
return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vdot");		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vdot");
}		}
		case NEON::BI__builtin_neon_vfmlal_low_v:
		case NEON::BI__builtin_neon_vfmlalq_low_v: {
		llvm::Type *InputTy =
		llvm::VectorType::get(HalfTy, Ty->getPrimitiveSizeInBits() / 16);
		llvm::Type *Tys[2] = { Ty, InputTy };
		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vfmlal_low");
		}
		case NEON::BI__builtin_neon_vfmlsl_low_v:
		case NEON::BI__builtin_neon_vfmlslq_low_v: {
		llvm::Type *InputTy =
		llvm::VectorType::get(HalfTy, Ty->getPrimitiveSizeInBits() / 16);
		llvm::Type *Tys[2] = { Ty, InputTy };
		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vfmlsl_low");
		}
		case NEON::BI__builtin_neon_vfmlal_high_v:
		case NEON::BI__builtin_neon_vfmlalq_high_v: {
		llvm::Type *InputTy =
		llvm::VectorType::get(HalfTy, Ty->getPrimitiveSizeInBits() / 16);
		llvm::Type *Tys[2] = { Ty, InputTy };
		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vfmlal_high");
		}
		case NEON::BI__builtin_neon_vfmlsl_high_v:
		case NEON::BI__builtin_neon_vfmlslq_high_v: {
		llvm::Type *InputTy =
		llvm::VectorType::get(HalfTy, Ty->getPrimitiveSizeInBits() / 16);
		llvm::Type *Tys[2] = { Ty, InputTy };
		return EmitNeonCall(CGM.getIntrinsic(Int, Tys), Ops, "vfmlsl_high");
		}
}		}

assert(Int && "Expected valid intrinsic number");		assert(Int && "Expected valid intrinsic number");

// Determine the type(s) of this overloaded AArch64 intrinsic.		// Determine the type(s) of this overloaded AArch64 intrinsic.
Function *F = LookupNeonLLVMIntrinsic(Int, Modifier, Ty, E);		Function *F = LookupNeonLLVMIntrinsic(Int, Modifier, Ty, E);

Value *Result = EmitNeonCall(F, Ops, NameHint);		Value *Result = EmitNeonCall(F, Ops, NameHint);
▲ Show 20 Lines • Show All 7,561 Lines • Show Last 20 Lines
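
The new cases in EmitCommonNeonBuiltinExpr derive the half-precision input type from the result type: the input vector has the same total width, so its lane count is the result width in bits divided by 16. The C sketch below reproduces that arithmetic and the overloaded intrinsic names that the tests in this revision check for. The function mangled_name is hypothetical and the name format is inferred from the CHECK lines in aarch64-neon-fp16fml.c, not taken from LLVM's mangling code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the overloaded intrinsic name for a result vector of
   'result_bits' bits (64 or 128). The result has result_bits / 32 float
   lanes; the input has result_bits / 16 half lanes, as in the patch. */
static int mangled_name(char *out, size_t cap, const char *op,
                        unsigned result_bits) {
    unsigned result_lanes = result_bits / 32;
    unsigned input_lanes = result_bits / 16;
    return snprintf(out, cap, "llvm.aarch64.neon.%s.v%uf32.v%uf16", op,
                    result_lanes, input_lanes);
}
```

For a 64-bit result this gives two float lanes and four half lanes, and for a 128-bit result four and eight, which matches the vector shapes in the test expectations.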

cfe/trunk/test/CodeGen/aarch64-neon-fp16fml.c

				// RUN: %clang_cc1 -triple arm64-none-linux-gnu -target-feature +v8.2a -target-feature +neon -target-feature +fp16fml \
				// RUN: -fallow-half-arguments-and-returns -disable-O0-optnone -emit-llvm -o - %s | opt -S -instcombine | FileCheck %s

				// REQUIRES: aarch64-registered-target

				// Test AArch64 Armv8.2-A FP16 Fused Multiply-Add Long intrinsics

				#include <arm_neon.h>

				// Vector form

				float32x2_t test_vfmlal_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlal_low_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlal.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlal_low_u32(a, b, c);
				}

				float32x2_t test_vfmlsl_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlsl_low_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlsl.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlsl_low_u32(a, b, c);
				}

				float32x2_t test_vfmlal_high_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlal_high_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlal2.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlal_high_u32(a, b, c);
				}

				float32x2_t test_vfmlsl_high_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlsl_high_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlsl2.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlsl_high_u32(a, b, c);
				}

				float32x4_t test_vfmlalq_low_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlalq_low_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlal.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlalq_low_u32(a, b, c);
				}

				float32x4_t test_vfmlslq_low_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlslq_low_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlsl.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlslq_low_u32(a, b, c);
				}

				float32x4_t test_vfmlalq_high_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlalq_high_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlal2.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlalq_high_u32(a, b, c);
				}

				float32x4_t test_vfmlslq_high_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlslq_high_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlsl2.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlslq_high_u32(a, b, c);
				}

				// Indexed form

				float32x2_t test_vfmlal_lane_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlal_lane_low_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> zeroinitializer
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlal.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlal_lane_low_u32(a, b, c, 0);
				}

				float32x2_t test_vfmlal_lane_high_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlal_lane_high_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlal2.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlal_lane_high_u32(a, b, c, 1);
				}

				float32x4_t test_vfmlalq_lane_low_u32(float32x4_t a, float16x8_t b, float16x4_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlalq_lane_low_u32(<4 x float> %a, <8 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlal.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlalq_lane_low_u32(a, b, c, 2);
				}

				float32x4_t test_vfmlalq_lane_high_u32(float32x4_t a, float16x8_t b, float16x4_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlalq_lane_high_u32(<4 x float> %a, <8 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlal2.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlalq_lane_high_u32(a, b, c, 3);
				}

				float32x2_t test_vfmlal_laneq_low_u32(float32x2_t a, float16x4_t b, float16x8_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlal_laneq_low_u32(<2 x float> %a, <4 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> <i32 4, i32 4, i32 4, i32 4>
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlal.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlal_laneq_low_u32(a, b, c, 4);
				}

				float32x2_t test_vfmlal_laneq_high_u32(float32x2_t a, float16x4_t b, float16x8_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlal_laneq_high_u32(<2 x float> %a, <4 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> <i32 5, i32 5, i32 5, i32 5>
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlal2.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlal_laneq_high_u32(a, b, c, 5);
				}

				float32x4_t test_vfmlalq_laneq_low_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlalq_laneq_low_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> <i32 6, i32 6, i32 6, i32 6, i32 6, i32 6, i32 6, i32 6>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlal.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlalq_laneq_low_u32(a, b, c, 6);
				}

				float32x4_t test_vfmlalq_laneq_high_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlalq_laneq_high_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlal2.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlalq_laneq_high_u32(a, b, c, 7);
				}

				float32x2_t test_vfmlsl_lane_low_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlsl_lane_low_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> zeroinitializer
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlsl.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlsl_lane_low_u32(a, b, c, 0);
				}

				float32x2_t test_vfmlsl_lane_high_u32(float32x2_t a, float16x4_t b, float16x4_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlsl_lane_high_u32(<2 x float> %a, <4 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlsl2.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlsl_lane_high_u32(a, b, c, 1);
				}

				float32x4_t test_vfmlslq_lane_low_u32(float32x4_t a, float16x8_t b, float16x4_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlslq_lane_low_u32(<4 x float> %a, <8 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlsl.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlslq_lane_low_u32(a, b, c, 2);
				}

				float32x4_t test_vfmlslq_lane_high_u32(float32x4_t a, float16x8_t b, float16x4_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlslq_lane_high_u32(<4 x float> %a, <8 x half> %b, <4 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlsl2.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlslq_lane_high_u32(a, b, c, 3);
				}

				float32x2_t test_vfmlsl_laneq_low_u32(float32x2_t a, float16x4_t b, float16x8_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlsl_laneq_low_u32(<2 x float> %a, <4 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> <i32 4, i32 4, i32 4, i32 4>
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlsl.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlsl_laneq_low_u32(a, b, c, 4);
				}

				float32x2_t test_vfmlsl_laneq_high_u32(float32x2_t a, float16x4_t b, float16x8_t c) {
				// CHECK-LABEL: define <2 x float> @test_vfmlsl_laneq_high_u32(<2 x float> %a, <4 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> <i32 5, i32 5, i32 5, i32 5>
				// CHECK: [[RESULT:%.*]] = call <2 x float> @llvm.aarch64.neon.fmlsl2.v2f32.v4f16(<2 x float> %a, <4 x half> %b, <4 x half> [[SHUFFLE]])
				// CHECK: ret <2 x float> [[RESULT]]
				return vfmlsl_laneq_high_u32(a, b, c, 5);
				}

				float32x4_t test_vfmlslq_laneq_low_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlslq_laneq_low_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> <i32 6, i32 6, i32 6, i32 6, i32 6, i32 6, i32 6, i32 6>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlsl.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlslq_laneq_low_u32(a, b, c, 6);
				}

				float32x4_t test_vfmlslq_laneq_high_u32(float32x4_t a, float16x8_t b, float16x8_t c) {
				// CHECK-LABEL: define <4 x float> @test_vfmlslq_laneq_high_u32(<4 x float> %a, <8 x half> %b, <8 x half> %c)
				// CHECK: [[SHUFFLE:%.*]] = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
				// CHECK: [[RESULT:%.*]] = call <4 x float> @llvm.aarch64.neon.fmlsl2.v4f32.v8f16(<4 x float> %a, <8 x half> %b, <8 x half> [[SHUFFLE]])
				// CHECK: ret <4 x float> [[RESULT]]
				return vfmlslq_laneq_high_u32(a, b, c, 7);
				}
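
For orientation, the shuffle-then-call pattern checked above can be read as a scalar model. What follows is a minimal sketch under stated assumptions: `float` stands in for the half-precision inputs, and `fmlxlLaneModel` with its parameters is a hypothetical name for illustration only, not part of arm_neon.h or LLVM.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical scalar model of the by-lane FMLAL/FMLSL tests above.
// N is the number of float result lanes (2 or 4); b holds 2*N inputs.
// 'high' selects the upper half of b (the fmlal2/fmlsl2 forms);
// 'subtract' models the fmlsl forms. The lane of c is broadcast first,
// mirroring the shufflevector splat in the CHECK lines.
template <std::size_t N, std::size_t M>
std::array<float, N> fmlxlLaneModel(std::array<float, N> a,
                                    const std::array<float, 2 * N> &b,
                                    const std::array<float, M> &c,
                                    std::size_t lane, bool high,
                                    bool subtract) {
  const float splat = c[lane];             // same value for every lane
  const std::size_t base = high ? N : 0;   // low or high half of b
  for (std::size_t i = 0; i < N; ++i) {
    float x = b[base + i];
    if (subtract)
      x = -x;                              // fmlsl: a - b * c
    a[i] = std::fma(x, splat, a[i]);       // fused multiply-add
  }
  return a;
}
```

Note that the lane index always selects from the whole of `c`, regardless of the low/high choice, which is why the tests can use lane 7 with the `_laneq_high` form.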

cfe/trunk/test/Preprocessor/aarch64-target-features.c

	// RUN: %clang -target aarch64 -mtune=cyclone -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MTUNE-CYCLONE %s			// RUN: %clang -target aarch64 -mtune=cyclone -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MTUNE-CYCLONE %s

	// RUN: %clang -target aarch64-none-linux-gnu -march=armv8-a+sve -x c -E -dM %s -o - | FileCheck --check-prefix=CHECK-SVE %s			// RUN: %clang -target aarch64-none-linux-gnu -march=armv8-a+sve -x c -E -dM %s -o - | FileCheck --check-prefix=CHECK-SVE %s
	// CHECK-SVE: __ARM_FEATURE_SVE 1			// CHECK-SVE: __ARM_FEATURE_SVE 1

	// RUN: %clang -target aarch64-none-linux-gnu -march=armv8.2a+dotprod -x c -E -dM %s -o - | FileCheck --check-prefix=CHECK-DOTPROD %s			// RUN: %clang -target aarch64-none-linux-gnu -march=armv8.2a+dotprod -x c -E -dM %s -o - | FileCheck --check-prefix=CHECK-DOTPROD %s
	// CHECK-DOTPROD: __ARM_FEATURE_DOTPROD 1			// CHECK-DOTPROD: __ARM_FEATURE_DOTPROD 1

	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16fml+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// On ARMv8.2-A and above, +fp16fml implies +fp16.
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// On ARMv8.4-A and above, +fp16 implies +fp16fml.
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+fp16+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16fml+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-FML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+fp16+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16fml+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-FML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16fml+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-FML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-FML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
				// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-FML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
				// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-FML --check-prefix=CHECK-FULLFP16-VECTOR-SCALAR %s
				// CHECK-FULLFP16-FML: #define __ARM_FEATURE_FP16FML 1
				// CHECK-FULLFP16-NOFML-NOT: #define __ARM_FEATURE_FP16FML 1
	// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1			// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
	// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1			// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
	// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FP 0xE			// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FP 0xE
	// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FP16_FORMAT_IEEE 1			// CHECK-FULLFP16-VECTOR-SCALAR: #define __ARM_FP16_FORMAT_IEEE 1

	// +fp16fml+nosimd doesn't make sense as the fp16fml instructions all require SIMD.			// +fp16fml+nosimd doesn't make sense as the fp16fml instructions all require SIMD.
	// However, as +fp16fml implies +fp16 there is a set of defines that we would expect.			// However, as +fp16fml implies +fp16 there is a set of defines that we would expect.
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16fml+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16fml+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8-a+fp16+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16fml+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16fml+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16+nosimd -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-SCALAR %s
				// CHECK-FULLFP16-SCALAR-NOT: #define __ARM_FEATURE_FP16FML 1
	// CHECK-FULLFP16-SCALAR: #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1			// CHECK-FULLFP16-SCALAR: #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
	// CHECK-FULLFP16-SCALAR-NOT: #define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1			// CHECK-FULLFP16-SCALAR-NOT: #define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
	// CHECK-FULLFP16-SCALAR: #define __ARM_FP 0xE			// CHECK-FULLFP16-SCALAR: #define __ARM_FP 0xE
	// CHECK-FULLFP16-SCALAR: #define __ARM_FP16_FORMAT_IEEE 1			// CHECK-FULLFP16-SCALAR: #define __ARM_FP16_FORMAT_IEEE 1

	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+fp16fml+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.2-a+fp16fml+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+nofp16fml -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
	// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16fml+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s			// RUN: %clang -target aarch64-none-linux-gnueabi -march=armv8.4-a+fp16fml+nofp16 -x c -E -dM %s -o - | FileCheck -match-full-lines --check-prefix=CHECK-FULLFP16-NOFML-VECTOR-SCALAR %s
				// CHECK-FULLFP16-NOFML-VECTOR-SCALAR-NOT: #define __ARM_FEATURE_FP16FML 1
	// CHECK-FULLFP16-NOFML-VECTOR-SCALAR-NOT: #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1			// CHECK-FULLFP16-NOFML-VECTOR-SCALAR-NOT: #define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC 1
	// CHECK-FULLFP16-NOFML-VECTOR-SCALAR-NOT: #define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1			// CHECK-FULLFP16-NOFML-VECTOR-SCALAR-NOT: #define __ARM_FEATURE_FP16_VECTOR_ARITHMETIC 1
	// CHECK-FULLFP16-NOFML-VECTOR-SCALAR: #define __ARM_FP 0xE			// CHECK-FULLFP16-NOFML-VECTOR-SCALAR: #define __ARM_FP 0xE
	// CHECK-FULLFP16-NOFML-VECTOR-SCALAR: #define __ARM_FP16_FORMAT_IEEE 1			// CHECK-FULLFP16-NOFML-VECTOR-SCALAR: #define __ARM_FP16_FORMAT_IEEE 1
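
The RUN lines above encode an order-sensitive set of expectations for how `+fp16`, `+fp16fml` and their `no` forms interact. As a reading aid, here is a minimal sketch of those expectations only. It is not Clang's implementation, and `Fp16State`, `applyModifiers` and `definesFp16FmlMacro` are hypothetical names. It assumes modifiers are applied left to right and that the FP16FML macro also requires SIMD, as the `+nosimd` checks suggest.

```cpp
#include <sstream>
#include <string>

// Hypothetical model of the outcomes the RUN/CHECK lines expect.
struct Fp16State {
  bool fp16 = false;    // full FP16 arithmetic requested
  bool fp16fml = false; // FP16FML extension requested
  bool simd = true;     // Advanced SIMD available
};

// archMinor: 0 for armv8-a, 2 for armv8.2-a, 4 for armv8.4-a.
// mods: a string such as "+nofp16+fp16fml", applied left to right.
inline Fp16State applyModifiers(int archMinor, const std::string &mods) {
  Fp16State s;
  std::istringstream in(mods);
  std::string m;
  while (std::getline(in, m, '+')) {
    if (m == "fp16") {
      s.fp16 = true;
      if (archMinor >= 4)
        s.fp16fml = true; // on ARMv8.4-A and above, +fp16 implies +fp16fml
    } else if (m == "nofp16") {
      s.fp16 = false;
      s.fp16fml = false;  // fp16fml cannot survive without fp16
    } else if (m == "fp16fml") {
      s.fp16fml = true;
      s.fp16 = true;      // +fp16fml implies +fp16
    } else if (m == "nofp16fml") {
      s.fp16fml = false;
    } else if (m == "nosimd") {
      s.simd = false;
    }
    // Empty tokens (before the first '+') and unknown names are ignored.
  }
  return s;
}

// Whether __ARM_FEATURE_FP16FML would be expected under this model.
inline bool definesFp16FmlMacro(const Fp16State &s) {
  return s.fp16fml && s.simd;
}
```

Under this model, `applyModifiers(2, "+nofp16fml+fp16")` leaves FP16FML off while `applyModifiers(4, "+nofp16fml+fp16")` turns it back on, matching the CHECK-FULLFP16-NOFML and CHECK-FULLFP16-FML prefixes on the corresponding RUN lines.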

	// ================== Check whether -mtune accepts mixed-case features.			// ================== Check whether -mtune accepts mixed-case features.
	// RUN: %clang -target aarch64 -mtune=CYCLONE -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MTUNE-CYCLONE %s			// RUN: %clang -target aarch64 -mtune=CYCLONE -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MTUNE-CYCLONE %s
	// CHECK-MTUNE-CYCLONE: "-cc1"{{.}} "-triple" "aarch64{{.}}" "-target-feature" "+neon" "-target-feature" "+zcm" "-target-feature" "+zcz"			// CHECK-MTUNE-CYCLONE: "-cc1"{{.}} "-triple" "aarch64{{.}}" "-target-feature" "+neon" "-target-feature" "+zcm" "-target-feature" "+zcz"

	// RUN: %clang -target aarch64 -mcpu=cyclone -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MCPU-CYCLONE %s			// RUN: %clang -target aarch64 -mcpu=cyclone -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MCPU-CYCLONE %s
	// RUN: %clang -target aarch64 -mcpu=cortex-a35 -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MCPU-A35 %s			// RUN: %clang -target aarch64 -mcpu=cortex-a35 -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MCPU-A35 %s
	// RUN: %clang -target aarch64 -mcpu=cortex-a53 -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MCPU-A53 %s			// RUN: %clang -target aarch64 -mcpu=cortex-a53 -### -c %s 2>&1 | FileCheck -check-prefix=CHECK-MCPU-A53 %s

cfe/trunk/utils/TableGen/NeonEmitter.cpp

		private:
public:		public:
DagEmitter(Intrinsic &Intr, StringRef CallPrefix) :		DagEmitter(Intrinsic &Intr, StringRef CallPrefix) :
Intr(Intr), CallPrefix(CallPrefix) {		Intr(Intr), CallPrefix(CallPrefix) {
}		}
std::pair<Type, std::string> emitDagArg(Init *Arg, std::string ArgName);		std::pair<Type, std::string> emitDagArg(Init *Arg, std::string ArgName);
std::pair<Type, std::string> emitDagSaveTemp(DagInit *DI);		std::pair<Type, std::string> emitDagSaveTemp(DagInit *DI);
std::pair<Type, std::string> emitDagSplat(DagInit *DI);		std::pair<Type, std::string> emitDagSplat(DagInit *DI);
std::pair<Type, std::string> emitDagDup(DagInit *DI);		std::pair<Type, std::string> emitDagDup(DagInit *DI);
		std::pair<Type, std::string> emitDagDupTyped(DagInit *DI);
std::pair<Type, std::string> emitDagShuffle(DagInit *DI);		std::pair<Type, std::string> emitDagShuffle(DagInit *DI);
std::pair<Type, std::string> emitDagCast(DagInit *DI, bool IsBitCast);		std::pair<Type, std::string> emitDagCast(DagInit *DI, bool IsBitCast);
std::pair<Type, std::string> emitDagCall(DagInit *DI);		std::pair<Type, std::string> emitDagCall(DagInit *DI);
std::pair<Type, std::string> emitDagNameReplace(DagInit *DI);		std::pair<Type, std::string> emitDagNameReplace(DagInit *DI);
std::pair<Type, std::string> emitDagLiteral(DagInit *DI);		std::pair<Type, std::string> emitDagLiteral(DagInit *DI);
std::pair<Type, std::string> emitDagOp(DagInit *DI);		std::pair<Type, std::string> emitDagOp(DagInit *DI);
std::pair<Type, std::string> emitDag(DagInit *DI);		std::pair<Type, std::string> emitDag(DagInit *DI);
};		};
		void Type::applyModifier(char Mod) {
case 'F':		case 'F':
Float = true;		Float = true;
ElementBitwidth = 64;		ElementBitwidth = 64;
break;		break;
case 'H':		case 'H':
Float = true;		Float = true;
ElementBitwidth = 16;		ElementBitwidth = 16;
break;		break;
		case '0':
		Float = true;
		if (AppliedQuad)
		Bitwidth /= 2;
		ElementBitwidth = 16;
		break;
		case '1':
		Float = true;
		if (!AppliedQuad)
		Bitwidth *= 2;
		ElementBitwidth = 16;
		break;
case 'g':		case 'g':
if (AppliedQuad)		if (AppliedQuad)
Bitwidth /= 2;		Bitwidth /= 2;
break;		break;
case 'j':		case 'j':
if (!AppliedQuad)		if (!AppliedQuad)
Bitwidth *= 2;		Bitwidth *= 2;
break;		break;
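
The two new modifiers, `'0'` and `'1'`, can be read as "half-float vector of fixed width": `'0'` always yields a 64-bit vector and `'1'` always a 128-bit one, whichever width the intrinsic's base type has. A minimal standalone sketch of that reading follows. It assumes the base type is 64 bits wide when non-quad and 128 bits when quad; `VecType` and `applyHalfVectorModifier` are hypothetical stand-ins, not the NeonEmitter `Type` class.

```cpp
// Hypothetical stand-in for the fields Type::applyModifier touches.
struct VecType {
  bool isFloat;
  unsigned elementBits;
  unsigned totalBits;
  unsigned numElements() const { return totalBits / elementBits; }
};

// Mirrors the '0' and '1' cases of the hunk above on the stand-in type.
inline VecType applyHalfVectorModifier(VecType t, bool appliedQuad, char mod) {
  switch (mod) {
  case '0': // half-float elements, narrowed to 64 bits if the base was quad
    t.isFloat = true;
    if (appliedQuad)
      t.totalBits /= 2;
    t.elementBits = 16;
    break;
  case '1': // half-float elements, widened to 128 bits if the base was not quad
    t.isFloat = true;
    if (!appliedQuad)
      t.totalBits *= 2;
    t.elementBits = 16;
    break;
  default:  // other modifiers are outside this sketch
    break;
  }
  return t;
}
```

With a `float32x4_t`-like base (`{true, 32, 128}`, quad), `'0'` gives four 16-bit elements, which fits the `float16x4_t c` argument of the `_lane` tests; `'1'` on a `float32x2_t`-like base gives eight, which fits the `float16x8_t c` argument of the `_laneq` tests.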
		std::pair<Type, std::string> Intrinsic::DagEmitter::emitDag(DagInit *DI) {
std::string Op = DefI->getAsString();		std::string Op = DefI->getAsString();

if (Op == "cast" || Op == "bitcast")		if (Op == "cast" || Op == "bitcast")
return emitDagCast(DI, Op == "bitcast");		return emitDagCast(DI, Op == "bitcast");
if (Op == "shuffle")		if (Op == "shuffle")
return emitDagShuffle(DI);		return emitDagShuffle(DI);
if (Op == "dup")		if (Op == "dup")
return emitDagDup(DI);		return emitDagDup(DI);
		if (Op == "dup_typed")
		return emitDagDupTyped(DI);
if (Op == "splat")		if (Op == "splat")
return emitDagSplat(DI);		return emitDagSplat(DI);
if (Op == "save_temp")		if (Op == "save_temp")
return emitDagSaveTemp(DI);		return emitDagSaveTemp(DI);
if (Op == "op")		if (Op == "op")
return emitDagOp(DI);		return emitDagOp(DI);
if (Op == "call")		if (Op == "call")
return emitDagCall(DI);		return emitDagCall(DI);
		if (I != 0)
S += ", ";		S += ", ";
S += A.second;		S += A.second;
}		}
S += "}";		S += "}";

return std::make_pair(T, S);		return std::make_pair(T, S);
}		}

		std::pair<Type, std::string> Intrinsic::DagEmitter::emitDagDupTyped(DagInit *DI) {
		assert_with_loc(DI->getNumArgs() == 2, "dup_typed() expects two arguments");
		std::pair<Type, std::string> A = emitDagArg(DI->getArg(0),
		DI->getArgNameStr(0));
		std::pair<Type, std::string> B = emitDagArg(DI->getArg(1),
		DI->getArgNameStr(1));
		assert_with_loc(B.first.isScalar(),
		"dup_typed() requires a scalar as the second argument");

		Type T = A.first;
		assert_with_loc(T.isVector(), "dup_typed() used but target type is scalar!");
		std::string S = "(" + T.str() + ") {";
		for (unsigned I = 0; I < T.getNumElements(); ++I) {
		if (I != 0)
		S += ", ";
		S += B.second;
		}
		S += "}";

		return std::make_pair(T, S);
		}
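
To make the output of `emitDagDupTyped` concrete, the string it assembles can be reproduced outside TableGen. The helper below is a minimal sketch: `dupTypedInitializer` and its parameters are hypothetical, and it takes a plain type name and element count in place of the emitter's `Type` object.

```cpp
#include <string>

// Hypothetical helper that builds the same shape of initializer string as
// emitDagDupTyped: a cast to the vector type, then the scalar expression
// repeated once per element.
inline std::string dupTypedInitializer(const std::string &typeName,
                                       unsigned numElements,
                                       const std::string &scalarExpr) {
  std::string s = "(" + typeName + ") {";
  for (unsigned i = 0; i < numElements; ++i) {
    if (i != 0)
      s += ", ";
    s += scalarExpr;
  }
  s += "}";
  return s;
}
```

For example, `dupTypedInitializer("float16x4_t", 4, "__s")` yields `(float16x4_t) {__s, __s, __s, __s}`. In the patch itself, unlike this sketch, the vector type comes from the first DAG argument and the repeated value from the second.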

std::pair<Type, std::string> Intrinsic::DagEmitter::emitDagSplat(DagInit *DI) {		std::pair<Type, std::string> Intrinsic::DagEmitter::emitDagSplat(DagInit *DI) {
assert_with_loc(DI->getNumArgs() == 2, "splat() expects two arguments");		assert_with_loc(DI->getNumArgs() == 2, "splat() expects two arguments");
std::pair<Type, std::string> A = emitDagArg(DI->getArg(0),		std::pair<Type, std::string> A = emitDagArg(DI->getArg(0),
DI->getArgNameStr(0));		DI->getArgNameStr(0));
std::pair<Type, std::string> B = emitDagArg(DI->getArg(1),		std::pair<Type, std::string> B = emitDagArg(DI->getArg(1),
DI->getArgNameStr(1));		DI->getArgNameStr(1));

assert_with_loc(B.first.isScalar(),		assert_with_loc(B.first.isScalar(),