This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors
Closed · Public

Authored by dmgreen on Jan 3 2020, 4:05 AM.

Details

Reviewers

SjoerdMeijer
samparker
ostannard
efriedma

Commits

rGfb8c9a339a9d: [ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors

Summary

This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing the target-independent code to handle more folds in more situations (for example, if the fast-math flags are present but the global AllowFPOpFusion option isn't). It also splits HasSlowFPVFMx out of HasSlowFPVMLx, so that VFMA and VMLA can be controlled separately if needed.
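
To make the fold condition described above concrete, here is a minimal stand-alone C++ sketch of how target-independent code might gate an fmul + fadd fusion. It is a simplified model, not LLVM's DAGCombiner code: `FPOpFusionMode`, `FusionContext` and `mayFuseMulAdd` are hypothetical names introduced for illustration, and the per-operation fast-math flag is reduced to a single bool.

```cpp
// Hypothetical, simplified stand-ins; these are not the real LLVM types.
enum class FPOpFusionMode { Fast, Standard, Strict };

struct FusionContext {
  FPOpFusionMode GlobalFusion; // models the global AllowFPOpFusion option
  bool NodeAllowsContract;     // models a per-operation fast-math flag
  bool TargetSaysFMAIsFaster;  // models isFMAFasterThanFMulAndFAdd's answer
};

// An fmul + fadd pair may be fused when fusion is permitted either globally
// or by the flags on the operation, and the target reports FMA as profitable.
bool mayFuseMulAdd(const FusionContext &C) {
  bool Permitted =
      C.GlobalFusion == FPOpFusionMode::Fast || C.NodeAllowsContract;
  return Permitted && C.TargetSaysFMAIsFaster;
}
```

With the global option left at `Standard` and the operation flagged as contractable, `mayFuseMulAdd` returns true only once the target hook also answers true, which is the scalar case this patch is meant to enable.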

Diff Detail

Event Timeline

dmgreen created this revision. Jan 3 2020, 4:05 AM

Herald added a project: Restricted Project. Jan 3 2020, 4:05 AM

Herald added subscribers: hiraditya, kristof.beyls, dschuff.

samparker added inline comments. Jan 3 2020, 6:11 AM

llvm/lib/Target/ARM/ARMISelLowering.cpp
15024	Is there a way this logic can sit in Subtarget, to avoid it being a tablegen predicate as well as code here? I'm hopeless with our FP architectures... does FullFP16 imply VFP4?

dmgreen marked an inline comment as done. Jan 3 2020, 8:31 AM

dmgreen added inline comments.

llvm/lib/Target/ARM/ARMISelLowering.cpp
15024	Yeah, that sounds good. I'll try and move it around. FullFP16 implies fp-armv8 I'm pretty sure. So at least VFP4.

Move more of the logic into Subtarget.
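
One way the logic discussed in the inline comments could sit in Subtarget is sketched below. This is a model under stated assumptions rather than the committed code: `ToySubtarget`, `ToyVT`, `canUseFMA`, `canUseFMA16`, `canUseFMA64` and `isFMAFaster` are hypothetical names, and the f16 helper assumes, as suggested in the reply above, that FullFP16 implies at least VFP4.

```cpp
// Hypothetical, simplified model of the subtarget. All names here are
// illustrative and are not taken from the patch.
struct ToySubtarget {
  bool HasMVEFloatOps = false;
  bool HasFullFP16 = false;
  bool HasVFP4Base = false;
  bool HasFP64 = false;
  bool IsTargetDarwin = false;
  bool SlowFPVFMx = false;

  // Base condition shared by every scalar type: VFP4 is available, the
  // target is not Darwin, and fused multiply-add is not marked as slow.
  bool canUseFMA() const {
    return HasVFP4Base && !IsTargetDarwin && !SlowFPVFMx;
  }
  // Assumption: FullFP16 implies at least VFP4, so reusing the base
  // condition does not reject any valid f16 configuration.
  bool canUseFMA16() const { return canUseFMA() && HasFullFP16; }
  bool canUseFMA64() const { return canUseFMA() && HasFP64; }
};

enum class ToyVT { f16, f32, f64, v4f32, v8f16, Other };

// The lowering hook then reduces to a per-type dispatch on the helpers.
bool isFMAFaster(const ToySubtarget &ST, ToyVT VT) {
  switch (VT) {
  case ToyVT::v4f32:
  case ToyVT::v8f16:
    return ST.HasMVEFloatOps;
  case ToyVT::f16:
    return ST.canUseFMA16();
  case ToyVT::f32:
    return ST.canUseFMA();
  case ToyVT::f64:
    return ST.canUseFMA64();
  default:
    return false;
  }
}
```

Keeping the base condition in one helper means a tablegen predicate and the C++ hook could both call it, instead of each spelling out the VFP4, Darwin and slow-FMA checks separately.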

LGTM

This revision is now accepted and ready to land. Jan 3 2020, 9:21 AM

Closed by commit rGfb8c9a339a9d: [ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors (authored by dmgreen). Jan 5 2020, 4:01 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

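Path                                                  Size
llvm/lib/Target/ARM/ARM.td                            17 lines
llvm/lib/Target/ARM/ARMISelLowering.cpp               14 lines
llvm/lib/Target/ARM/ARMPredicates.td                  2 lines
llvm/lib/Target/ARM/ARMSubtarget.h                    5 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.h          18 lines
llvm/test/CodeGen/ARM/cortex-a57-misched-vfma.ll      6 lines
llvm/test/CodeGen/ARM/fp16-fullfp16.ll                2 lines
llvm/test/CodeGen/ARM/fp16-fusedMAC.ll                2 lines
llvm/test/CodeGen/Thumb2/float-intrinsics-double.ll   2 lines
llvm/test/CodeGen/Thumb2/float-intrinsics-float.ll    2 lines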

Diff 236017

llvm/lib/Target/ARM/ARM.td

@@ (297 lines hidden) @@ def FeatureNonpipelinedVFP : SubtargetFeature<"nonpipelined-vfp",
      "VFP instructions are not pipelined">;

  // Some processors have FP multiply-accumulate instructions that don't
  // play nicely with other VFP / NEON instructions, and it's generally better
  // to just not use them.
  def FeatureHasSlowFPVMLx : SubtargetFeature<"slowfpvmlx", "SlowFPVMLx", "true",
      "Disable VFP / NEON MAC instructions">;

+ // VFPv4 added VFMA instructions that can similar be fast or slow.
+ def FeatureHasSlowFPVFMx : SubtargetFeature<"slowfpvfmx", "SlowFPVFMx", "true",
+     "Disable VFP / NEON FMA instructions">;
+
  // Cortex-A8 / A9 Advanced SIMD has multiplier accumulator forwarding.
  def FeatureVMLxForwarding : SubtargetFeature<"vmlx-forwarding",
      "HasVMLxForwarding", "true",
      "Has multiplier accumulator forwarding">;

  // Disable 32-bit to 16-bit narrowing for experimentation.
  def FeaturePref32BitThumb : SubtargetFeature<"32bit", "Pref32BitThumb", "true",
      "Prefer 32-bit Thumb instrs">;
@@ (269 lines hidden) @@ def ProcExynos : SubtargetFeature<"exynos", "ARMProcFamily", "Exynos",
      FeatureSplatVFPToNeon,
      FeatureSlowVGETLNi32,
      FeatureSlowVDUP32,
      FeatureSlowFPBrcc,
      FeatureProfUnpredicate,
      FeatureHWDivThumb,
      FeatureHWDivARM,
      FeatureHasSlowFPVMLx,
+     FeatureHasSlowFPVFMx,
      FeatureHasRetAddrStack,
      FeatureFuseLiterals,
      FeatureFuseAES,
      FeatureExpandMLx,
      FeatureCrypto,
      FeatureCRC]>;

  def ProcR4 : SubtargetFeature<"r4", "ARMProcFamily", "CortexR4",
@@ (314 lines hidden) @@ def : Processor<"arm1156t2f-s", ARMV6Itineraries, [ARMv6t2,
      FeatureVFP2,
      FeatureHasSlowFPVMLx]>;

  def : ProcessorModel<"cortex-a5", CortexA8Model, [ARMv7a, ProcA5,
      FeatureHasRetAddrStack,
      FeatureTrustZone,
      FeatureSlowFPBrcc,
      FeatureHasSlowFPVMLx,
+     FeatureHasSlowFPVFMx,
      FeatureVMLxForwarding,
      FeatureMP,
      FeatureVFP4]>;

  def : ProcessorModel<"cortex-a7", CortexA8Model, [ARMv7a, ProcA7,
      FeatureHasRetAddrStack,
      FeatureTrustZone,
      FeatureSlowFPBrcc,
      FeatureHasVMLxHazards,
      FeatureHasSlowFPVMLx,
+     FeatureHasSlowFPVFMx,
      FeatureVMLxForwarding,
      FeatureMP,
      FeatureVFP4,
      FeatureVirtualization]>;

  def : ProcessorModel<"cortex-a8", CortexA8Model, [ARMv7a, ProcA8,
      FeatureHasRetAddrStack,
      FeatureNonpipelinedVFP,
      FeatureTrustZone,
      FeatureSlowFPBrcc,
      FeatureHasVMLxHazards,
      FeatureHasSlowFPVMLx,
+     FeatureHasSlowFPVFMx,
      FeatureVMLxForwarding]>;

  def : ProcessorModel<"cortex-a9", CortexA9Model, [ARMv7a, ProcA9,
      FeatureHasRetAddrStack,
      FeatureTrustZone,
      FeatureHasVMLxHazards,
      FeatureVMLxForwarding,
      FeatureFP16,
@@ (53 lines hidden) @@ def : ProcessorModel<"swift", SwiftModel, [ARMv7a, ProcSwift,
      FeatureVFP4,
      FeatureUseWideStrideVFP,
      FeatureMP,
      FeatureHWDivThumb,
      FeatureHWDivARM,
      FeatureAvoidPartialCPSR,
      FeatureAvoidMOVsShOp,
      FeatureHasSlowFPVMLx,
+     FeatureHasSlowFPVFMx,
      FeatureHasVMLxHazards,
      FeatureProfUnpredicate,
      FeaturePrefISHSTBarrier,
      FeatureSlowOddRegister,
      FeatureSlowLoadDSubreg,
      FeatureSlowVGETLNi32,
      FeatureSlowVDUP32,
      FeatureUseMISched,
      FeatureNoPostRASched]>;

  def : ProcessorModel<"cortex-r4", CortexA8Model, [ARMv7r, ProcR4,
      FeatureHasRetAddrStack,
      FeatureAvoidPartialCPSR]>;

  def : ProcessorModel<"cortex-r4f", CortexA8Model, [ARMv7r, ProcR4,
      FeatureHasRetAddrStack,
      FeatureSlowFPBrcc,
      FeatureHasSlowFPVMLx,
+     FeatureHasSlowFPVFMx,
      FeatureVFP3_D16,
      FeatureAvoidPartialCPSR]>;

  def : ProcessorModel<"cortex-r5", CortexA8Model, [ARMv7r, ProcR5,
      FeatureHasRetAddrStack,
      FeatureVFP3_D16,
      FeatureSlowFPBrcc,
      FeatureHWDivARM,
      FeatureHasSlowFPVMLx,
+     FeatureHasSlowFPVFMx,
      FeatureAvoidPartialCPSR]>;

  def : ProcessorModel<"cortex-r7", CortexA8Model, [ARMv7r, ProcR7,
      FeatureHasRetAddrStack,
      FeatureVFP3_D16,
      FeatureFP16,
      FeatureMP,
      FeatureSlowFPBrcc,
      FeatureHWDivARM,
      FeatureHasSlowFPVMLx,
+     FeatureHasSlowFPVFMx,
      FeatureAvoidPartialCPSR]>;

  def : ProcessorModel<"cortex-r8", CortexA8Model, [ARMv7r,
      FeatureHasRetAddrStack,
      FeatureVFP3_D16,
      FeatureFP16,
      FeatureMP,
      FeatureSlowFPBrcc,
      FeatureHWDivARM,
      FeatureHasSlowFPVMLx,
+     FeatureHasSlowFPVFMx,
      FeatureAvoidPartialCPSR]>;

  def : ProcessorModel<"cortex-m3", CortexM4Model, [ARMv7m,
      ProcM3,
      FeaturePrefLoopAlign32,
      FeatureUseMISched,
      FeatureHasNoBranchPredictor]>;

  def : ProcessorModel<"sc300", CortexM4Model, [ARMv7m,
      ProcM3,
      FeatureUseMISched,
      FeatureHasNoBranchPredictor]>;

  def : ProcessorModel<"cortex-m4", CortexM4Model, [ARMv7em,
      FeatureVFP4_D16_SP,
      FeaturePrefLoopAlign32,
      FeatureHasSlowFPVMLx,
+     FeatureHasSlowFPVFMx,
      FeatureUseMISched,
      FeatureHasNoBranchPredictor]>;

  def : ProcNoItin<"cortex-m7", [ARMv7em,
      FeatureFPARMv8_D16]>;

  def : ProcNoItin<"cortex-m23", [ARMv8mBaseline,
      FeatureNoMovt]>;

  def : ProcessorModel<"cortex-m33", CortexM4Model, [ARMv8mMainline,
      FeatureDSP,
      FeatureFPARMv8_D16_SP,
      FeaturePrefLoopAlign32,
      FeatureHasSlowFPVMLx,
+     FeatureHasSlowFPVFMx,
      FeatureUseMISched,
      FeatureHasNoBranchPredictor]>;

  def : ProcessorModel<"cortex-m35p", CortexM4Model, [ARMv8mMainline,
      FeatureDSP,
      FeatureFPARMv8_D16_SP,
      FeaturePrefLoopAlign32,
      FeatureHasSlowFPVMLx,
+     FeatureHasSlowFPVFMx,
      FeatureUseMISched,
      FeatureHasNoBranchPredictor]>;


  def : ProcNoItin<"cortex-a32", [ARMv8a,
      FeatureHWDivThumb,
      FeatureHWDivARM,
      FeatureCrypto,
@@ (71 lines hidden) @@ def : ProcessorModel<"cyclone", SwiftModel, [ARMv8a, ProcSwift,
      FeatureNEONForFP,
      FeatureVFP4,
      FeatureMP,
      FeatureHWDivThumb,
      FeatureHWDivARM,
      FeatureAvoidPartialCPSR,
      FeatureAvoidMOVsShOp,
      FeatureHasSlowFPVMLx,
+     FeatureHasSlowFPVFMx,
      FeatureCrypto,
      FeatureUseMISched,
      FeatureZCZeroing,
      FeatureNoPostRASched]>;

  def : ProcNoItin<"exynos-m3", [ARMv8a, ProcExynos]>;
  def : ProcNoItin<"exynos-m4", [ARMv82a, ProcExynos,
      FeatureFullFP16,
@@ (59 lines hidden) @@

llvm/lib/Target/ARM/ARMISelLowering.cpp

@@ (15,007 lines hidden) @@
  /// lower a pair of fmul and fadd to the latter so it's not clear that there
  /// would be a gain or that the gain would be worthwhile enough to risk
  /// correctness bugs.
  ///
  /// For MVE, we set this to true as it helps simplify the need for some
  /// patterns (and we don't have the non-fused floating point instruction).
  bool ARMTargetLowering::isFMAFasterThanFMulAndFAdd(const MachineFunction &MF,
                                                     EVT VT) const {
-   if (!Subtarget->hasMVEFloatOps())
-     return false;
-
    if (!VT.isSimple())
      return false;

    switch (VT.getSimpleVT().SimpleTy) {
    case MVT::v4f32:
    case MVT::v8f16:
-     return true;
+     return Subtarget->hasMVEFloatOps();
+   case MVT::f16:
+     return Subtarget->hasFullFP16() && !Subtarget->isTargetDarwin() &&
+            Subtarget->useFPVFMx();
+   case MVT::f32:
+     return Subtarget->hasVFP4Base() && !Subtarget->isTargetDarwin() &&
+            Subtarget->useFPVFMx();
+   case MVT::f64:
+     return Subtarget->hasVFP4Base() && Subtarget->hasFP64() &&
+            !Subtarget->isTargetDarwin() && Subtarget->useFPVFMx();
    default:
      break;
    }

    return false;
  }

  static bool isLegalT1AddressImmediate(int64_t V, EVT VT) {
@@ (2,376 lines hidden) @@

llvm/lib/Target/ARM/ARMPredicates.td

@@ (180 lines hidden) @@
  // Prefer fused MAC for fp mul + add over fp VMLA / VMLS if they are available.
  // But only select them if more precision in FP computation is allowed, and when
  // they are not slower than a mul + add sequence.
  // Do not use them for Darwin platforms.
  def UseFusedMAC : Predicate<"(TM.Options.AllowFPOpFusion =="
                              " FPOpFusion::Fast && "
                              " Subtarget->hasVFP4Base()) && "
                              "!Subtarget->isTargetDarwin() &&"
-                             "Subtarget->useFPVMLx()">;
+                             "Subtarget->useFPVFMx()">;

  def HasFastVGETLNi32 : Predicate<"!Subtarget->hasSlowVGETLNi32()">;
  def HasSlowVGETLNi32 : Predicate<"Subtarget->hasSlowVGETLNi32()">;

  def HasFastVDUP32 : Predicate<"!Subtarget->hasSlowVDUP32()">;
  def HasSlowVDUP32 : Predicate<"Subtarget->hasSlowVDUP32()">;

  def UseVMOVSR : Predicate<"Subtarget->preferVMOVSR() ||"
@@ (14 lines hidden) @@

llvm/lib/Target/ARM/ARMSubtarget.h

@@ (197 lines hidden) @@ protected:
    /// UseMulOps - True if non-microcoded fused integer multiply-add and
    /// multiply-subtract instructions should be used.
    bool UseMulOps = false;

    /// SlowFPVMLx - If the VFP2 / NEON instructions are available, indicates
    /// whether the FP VML[AS] instructions are slow (if so, don't use them).
    bool SlowFPVMLx = false;

+   /// SlowFPVFMx - If the VFP4 / NEON instructions are available, indicates
+   /// whether the FP VFM[AS] instructions are slow (if so, don't use them).
+   bool SlowFPVFMx = false;
+
    /// HasVMLxForwarding - If true, NEON has special multiplier accumulator
    /// forwarding to allow mul + mla being issued back to back.
    bool HasVMLxForwarding = false;

    /// SlowFPBrcc - True if floating point compare + branch is slow.
    bool SlowFPBrcc = false;

    /// InThumbMode - True if compiling for Thumb, false for ARM.
@@ (413 lines hidden) @@ public:
    bool hasAcquireRelease() const { return HasAcquireRelease; }

    bool hasAnyDataBarrier() const {
      return HasDataBarrier || (hasV6Ops() && !isThumb());
    }

    bool useMulOps() const { return UseMulOps; }
    bool useFPVMLx() const { return !SlowFPVMLx; }
+   bool useFPVFMx() const { return !SlowFPVFMx; }
    bool hasVMLxForwarding() const { return HasVMLxForwarding; }
    bool isFPBrccSlow() const { return SlowFPBrcc; }
    bool hasFP64() const { return HasFP64; }
    bool hasPerfMon() const { return HasPerfMon; }
    bool hasTrustZone() const { return HasTrustZone; }
    bool has8MSecExt() const { return Has8MSecExt; }
    bool hasZeroCycleZeroing() const { return HasZeroCycleZeroing; }
    bool hasFPAO() const { return HasFPAO; }
@@ (238 lines hidden) @@
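
The effect of splitting the flag in two can be illustrated with a small stand-alone model. `ToyFPFeatures` and `applyFeatures` are hypothetical names, and the string matching is a deliberate simplification of how subtarget features are really parsed. Only the feature strings `slowfpvmlx` and `slowfpvfmx` and the two accessors come from the diff.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the two independent "slow" flags and their accessors.
struct ToyFPFeatures {
  bool SlowFPVMLx = false;
  bool SlowFPVFMx = false;
  bool useFPVMLx() const { return !SlowFPVMLx; }
  bool useFPVFMx() const { return !SlowFPVFMx; }
};

// Applies "+name" feature strings to a default-initialised feature set.
// Strings this toy model does not know about are ignored.
ToyFPFeatures applyFeatures(const std::vector<std::string> &Attrs) {
  ToyFPFeatures F;
  for (const std::string &A : Attrs) {
    if (A == "+slowfpvmlx")
      F.SlowFPVMLx = true;
    else if (A == "+slowfpvfmx")
      F.SlowFPVFMx = true;
  }
  return F;
}
```

Under this model `+slowfpvmlx` on its own leaves `useFPVFMx()` true, which is consistent with the fp16-fusedMAC.ll RUN line switching to `+slowfpvfmx` for its DONT-FUSE check.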

llvm/lib/Target/ARM/ARMTargetTransformInfo.h

@@ (63 lines hidden) @@ const FeatureBitset InlineFeatureWhitelist = {
        ARM::FeatureFPAO, ARM::FeatureFuseAES, ARM::FeatureZCZeroing,
        ARM::FeatureProfUnpredicate, ARM::FeatureSlowVGETLNi32,
        ARM::FeatureSlowVDUP32, ARM::FeaturePreferVMOVSR,
        ARM::FeaturePrefISHSTBarrier, ARM::FeatureMuxedUnits,
        ARM::FeatureSlowOddRegister, ARM::FeatureSlowLoadDSubreg,
        ARM::FeatureDontWidenVMOVS, ARM::FeatureExpandMLx,
        ARM::FeatureHasVMLxHazards, ARM::FeatureNEONForFPMovs,
        ARM::FeatureNEONForFP, ARM::FeatureCheckVLDnAlign,
-       ARM::FeatureHasSlowFPVMLx, ARM::FeatureVMLxForwarding,
-       ARM::FeaturePref32BitThumb, ARM::FeatureAvoidPartialCPSR,
-       ARM::FeatureCheapPredicableCPSR, ARM::FeatureAvoidMOVsShOp,
-       ARM::FeatureHasRetAddrStack, ARM::FeatureHasNoBranchPredictor,
-       ARM::FeatureDSP, ARM::FeatureMP, ARM::FeatureVirtualization,
-       ARM::FeatureMClass, ARM::FeatureRClass, ARM::FeatureAClass,
-       ARM::FeatureNaClTrap, ARM::FeatureStrictAlign, ARM::FeatureLongCalls,
-       ARM::FeatureExecuteOnly, ARM::FeatureReserveR9, ARM::FeatureNoMovt,
-       ARM::FeatureNoNegativeImmediates
+       ARM::FeatureHasSlowFPVMLx, ARM::FeatureHasSlowFPVFMx,
+       ARM::FeatureVMLxForwarding, ARM::FeaturePref32BitThumb,
+       ARM::FeatureAvoidPartialCPSR, ARM::FeatureCheapPredicableCPSR,
+       ARM::FeatureAvoidMOVsShOp, ARM::FeatureHasRetAddrStack,
+       ARM::FeatureHasNoBranchPredictor, ARM::FeatureDSP, ARM::FeatureMP,
+       ARM::FeatureVirtualization, ARM::FeatureMClass, ARM::FeatureRClass,
+       ARM::FeatureAClass, ARM::FeatureNaClTrap, ARM::FeatureStrictAlign,
+       ARM::FeatureLongCalls, ARM::FeatureExecuteOnly, ARM::FeatureReserveR9,
+       ARM::FeatureNoMovt, ARM::FeatureNoNegativeImmediates
    };

    const ARMSubtarget *getST() const { return ST; }
    const ARMTargetLowering *getTLI() const { return TLI; }

  public:
    explicit ARMTTIImpl(const ARMBaseTargetMachine *TM, const Function &F)
        : BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl(F)),
@@ (146 lines hidden) @@

llvm/test/CodeGen/ARM/cortex-a57-misched-vfma.ll

@@ (87 lines hidden) @@
  ; > VMULS common latency = 5
  ; CHECK: Latency : 5
  ; CHECK: Successors:
  ; CHECK: Data
  ; > VMULS read-advanced latency to VMLSS = 0
  ; CHECK-SAME: Latency=0

  ; CHECK-DEFAULT: VMLSS
- ; CHECK-FAST: VFMSS
- ; > VMLSS common latency = 9
+ ; CHECK-FAST: VFNMSS
+ ; > VFNMSS common latency = 9
  ; CHECK: Latency : 9
  ; CHECK: Successors:
  ; CHECK: Data
- ; > VMLSS read-advanced latency to the next VMLSS = 4
+ ; > VFNMSS read-advanced latency to the next VMLSS = 4
  ; CHECK-SAME: Latency=4

  ; CHECK-DEFAULT: VMLSS
  ; CHECK-FAST: VFMSS
  ; CHECK: Latency : 9
  ; CHECK: Successors:
  ; CHECK: Data
  ; > VMLSS not-optimized latency to VMOVRS = 9
@@ (87 lines hidden) @@

llvm/test/CodeGen/ARM/fp16-fullfp16.ll

@@ (565 lines hidden) @@ ; CHECK-NEXT: bx lr
    ret void
  }

  define void @test_fmuladd(half* %p, half* %q, half* %r) {
  ; CHECK-LABEL: test_fmuladd:
  ; CHECK: vldr.16 s0, [r1]
  ; CHECK-NEXT: vldr.16 s2, [r0]
  ; CHECK-NEXT: vldr.16 s4, [r2]
- ; CHECK-NEXT: vmla.f16 s4, s2, s0
+ ; CHECK-NEXT: vfma.f16 s4, s2, s0
  ; CHECK-NEXT: vstr.16 s4, [r0]
  ; CHECK-NEXT: bx lr
    %a = load half, half* %p, align 2
    %b = load half, half* %q, align 2
    %c = load half, half* %r, align 2
    %v = call half @llvm.fmuladd.f16(half %a, half %b, half %c)
    store half %v, half* %p
    ret void
@@ (24 lines hidden) @@

llvm/test/CodeGen/ARM/fp16-fusedMAC.ll

  ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
  ; RUN: llc < %s -mtriple=thumbv8.1-m-none-eabi -mattr=+fullfp16 -fp-contract=fast | FileCheck %s
- ; RUN: llc < %s -mtriple=thumbv8.1-m-none-eabi -mattr=+fullfp16,+slowfpvmlx -fp-contract=fast | FileCheck %s -check-prefix=DONT-FUSE
+ ; RUN: llc < %s -mtriple=thumbv8.1-m-none-eabi -mattr=+fullfp16,+slowfpvfmx -fp-contract=fast | FileCheck %s -check-prefix=DONT-FUSE

  ; Check generated fp16 fused MAC and MLS.

  define arm_aapcs_vfpcc void @fusedMACTest2(half %a1, half %a2, half *%a3) {
  ; CHECK-LABEL: fusedMACTest2:
  ; CHECK: @ %bb.0:
  ; CHECK-NEXT: vldr.16 s0, [r1]
  ; CHECK-NEXT: vldr.16 s2, [r0]
@@ (418 lines hidden) @@

llvm/test/CodeGen/Thumb2/float-intrinsics-double.ll

@@ (195 lines hidden) @@

  declare double @llvm.fmuladd.f64(double %a, double %b, double %c)
  define double @fmuladd_d(double %a, double %b, double %c) {
  ; CHECK-LABEL: fmuladd_d:
  ; SOFT: bl __aeabi_dmul
  ; SOFT: bl __aeabi_dadd
  ; VFP4: vmul.f64
  ; VFP4: vadd.f64
- ; FP-ARMv8: vmla.f64
+ ; FP-ARMv8: vfma.f64
    %1 = call double @llvm.fmuladd.f64(double %a, double %b, double %c)
    ret double %1
  }

  declare i16 @llvm.convert.to.fp16.f64(double %a)
  define i16 @d_to_h(double %a) {
  ; CHECK-LABEL: d_to_h:
  ; SOFT: bl __aeabi_d2h
@@ (19 lines hidden) @@

llvm/test/CodeGen/Thumb2/float-intrinsics-float.ll

@@ (188 lines hidden) @@ ; FP-ARMv8: vrinta.f32
    ret float %1
  }

  declare float @llvm.fmuladd.f32(float %a, float %b, float %c)
  define float @fmuladd_f(float %a, float %b, float %c) {
  ; CHECK-LABEL: fmuladd_f:
  ; SOFT: bl __aeabi_fmul
  ; SOFT: bl __aeabi_fadd
- ; VMLA: vmla.f32
+ ; VMLA: vfma.f32
  ; NO-VMLA: vmul.f32
  ; NO-VMLA: vadd.f32
    %1 = call float @llvm.fmuladd.f32(float %a, float %b, float %c)
    ret float %1
  }

  declare i16 @llvm.convert.to.fp16.f32(float %a)
  define i16 @f_to_h(float %a) {
@@ (15 lines hidden) @@
