This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add some missing fusion subtarget features
Closed · Public

Authored by porglezomp on Jan 11 2021, 2:55 PM.


Details

Reviewers

t.p.northover
SjoerdMeijer

Commits

rG815dd4b29208: [AArch64] Add Cortex CPU subtarget features for instruction fusion.

Summary

Referencing ARM's software optimization guides:

- A65 - 4.8 Instruction fusion
  - Address, AES, and MOVZ/MOVK literals
- A72 - 4.11 Fast literal generation; 4.12 PC-relative address calculation
- A76 - 4.6 AES Encryption/Decryption
- A77/A78/A78C/X1 - 4.13 Instruction fusion
  - CMP/CMN, TST, BICS + B.cond fusion
  - AES fusion
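The "MOVZ/MOVK literals" and "Fast literal generation" entries refer to constants built by a MOVZ followed by one or more MOVKs. Each MOVK reads the register the previous instruction wrote, and that back-to-back dependency is presumably what the fusion hardware exploits. The following is a minimal C++ model of the two instructions' results only; `movz` and `movk` are hypothetical helper names, not LLVM APIs, and the sketch says nothing about how LLVM detects the pair.

```cpp
#include <cstdint>

// Illustrative model of A64 MOVZ/MOVK results; movz/movk are hypothetical
// helper names, not LLVM APIs. `shift` is 0, 16, 32 or 48.

// MOVZ: place imm16 at bit `shift` and zero every other bit.
constexpr std::uint64_t movz(std::uint16_t imm16, unsigned shift) {
  return static_cast<std::uint64_t>(imm16) << shift;
}

// MOVK: replace bits [shift, shift+16) with imm16 and keep the rest of the
// previous value. Reading the previous value is the dependency that ties a
// MOVK to the MOVZ (or MOVK) before it.
constexpr std::uint64_t movk(std::uint64_t prev, std::uint16_t imm16,
                             unsigned shift) {
  return (prev & ~(std::uint64_t{0xFFFF} << shift)) |
         (static_cast<std::uint64_t>(imm16) << shift);
}

// movz w0, #0x5678 ; movk w0, #0x1234, lsl #16  builds 0x12345678.
static_assert(movk(movz(0x5678, 0), 0x1234, 16) == 0x12345678, "");
```

Because MOVK preserves all bits outside its 16-bit slot, a longer literal is simply a chain of such read-modify-write steps starting from one MOVZ.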

[AArch64] Make Cortex B.cc fusions more precise

The ArithmeticBccFusion feature expects to be able to fuse general
flag-updating arithmetic with a B.cc, for example an arbitrary SUBS
instruction, and not just a CMP.

Since the Cortex cores are documented as fusing CMP/CMN/TST, and the A77
optimization guide specifies that BICS fusion must have a destination of
XZR or WZR, these cores should use a separate subtarget feature that
fuses only comparisons with B.cc.
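The distinction can be modeled in a few lines. CMP, CMN and TST are aliases of SUBS, ADDS and ANDS that write the zero register, so "comparison only" reduces to "the flag-setting instruction's destination is XZR or WZR", which is the guard the patch adds to isArithmeticBccPair. The sketch below is a simplified, hypothetical model: the `Reg` enum and `FlagSettingInstr` struct are illustrative stand-ins for LLVM's MachineInstr operands, and the opcode switch the real function also performs is left out.

```cpp
#include <cstdint>

// Hypothetical stand-ins for LLVM's MachineInstr register operands.
enum class Reg : std::uint8_t { W0, X0, X1, WZR, XZR };

struct FlagSettingInstr {
  Reg dest;  // destination of e.g. SUBS/ADDS/ANDS/BICS
};

// CMP/CMN/TST are aliases of SUBS/ADDS/ANDS that write the zero register.
constexpr bool discardsResult(FlagSettingInstr mi) {
  return mi.dest == Reg::XZR || mi.dest == Reg::WZR;
}

// Mirrors only the CmpOnly guard in isArithmeticBccPair; the real function
// additionally checks the opcode of the first instruction.
constexpr bool passesCmpOnlyGuard(FlagSettingInstr first, bool cmpOnly) {
  return !cmpOnly || discardsResult(first);
}

// cmp x0, x1 (== subs xzr, x0, x1): allowed in both modes.
static_assert(passesCmpOnlyGuard({Reg::XZR}, /*cmpOnly=*/true), "");
// subs x0, x0, x1: rejected in CmpOnly mode, allowed with arith-bcc-fusion.
static_assert(!passesCmpOnlyGuard({Reg::X0}, /*cmpOnly=*/true), "");
static_assert(passesCmpOnlyGuard({Reg::X0}, /*cmpOnly=*/false), "");
```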

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

porglezomp created this revision. · Jan 11 2021, 2:55 PM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls. · Jan 11 2021, 2:55 PM

porglezomp requested review of this revision. · Jan 11 2021, 2:55 PM

Herald added a project: Restricted Project. · Jan 11 2021, 2:55 PM

Herald added a subscriber: llvm-commits.

The existing fusion tests seem a little bit weak: with the base Cortex-A57 machine model that most of them use, the machine scheduler seemed to pass the address and literal fusion tests on its own (by coincidence?), even without the features explicitly requested. I was unable to strengthen the test case so that it shows the A57 machine model not exploiting those fusion pairs when the feature is disabled. Is that worth worrying about?

Harbormaster completed remote builds in B84768: Diff 315945. · Jan 11 2021, 3:46 PM

Thanks, LGTM.

In D94457#2491639, @porglezomp wrote:

The existing fusion tests seem a little bit weak: with the base Cortex-A57 machine model that most of them use, the machine scheduler seemed to pass the address and literal fusion tests on its own (by coincidence?), even without the features explicitly requested. I was unable to strengthen the test case so that it shows the A57 machine model not exploiting those fusion pairs when the feature is disabled. Is that worth worrying about?

Yeah, perhaps there's room for improvement, but it's not something that worries me very much.

llvm/lib/Target/AArch64/AArch64.td (inline comments by SjoerdMeijer)
Line 221: Yep, makes sense I think.
Line 624: Check
Line 637: Check
Line 669: Check
Line 683: Check
Line 694: Check
Line 711: Check
Line 739: Check

This revision is now accepted and ready to land. · Jan 21 2021, 6:10 AM

I don't have commit access, so could someone commit the change for me? Thanks.

Closed by commit rG815dd4b29208: [AArch64] Add Cortex CPU subtarget features for instruction fusion. (authored by SjoerdMeijer). · Jan 25 2021, 1:12 AM

This revision was automatically updated to reflect the committed changes.

SjoerdMeijer added a commit: rG815dd4b29208: [AArch64] Add Cortex CPU subtarget features for instruction fusion.

Thanks for contributing this. I have committed this on your behalf in rG815dd4b29208.
If you plan to do more LLVM work, you can always consider requesting an account so you can commit patches yourself (but I am of course happy to do it on your behalf).

@SjoerdMeijer, @porglezomp: many thanks in advance.
I'd like to clarify one aspect of this diff: is it intentional that currently only cmp + bcc pairs get fused, and not, e.g., cmp + cbz or cmp + cbnz (unless I'm missing something)?
Another question: after looking at https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/aarch64.cc#L1326, would it make sense to enable CmpBccFusion by default?

cc: @dmgreen

Herald added a project: Restricted Project. · Aug 12 2022, 4:57 PM

I have thought for a while that we are missing a number of tuning features for AArch64 cores, but the (very brief) look into enabling more fusion on more cores ended up showing worse performance. The changes were not very large IIRC, but I worry that the more aggressive fusion was forcing instructions into worse positions. It may also just have been unlucky noise. We get a lot of fusion naturally from the way the scheduler positions cmp/br and cmp/csel.

The optimization guides usually list the instructions that can be merged. If you have some evidence that fusing any of them improves performance, then that sounds like a useful patch. The performance numbers I got were gathered too quickly to draw any strong conclusions from.

As for the questions you actually asked: cbz and cbnz take a register operand on AArch64 and test it against zero themselves, so there is no separate cmp to fuse. As for adding it to -mcpu=generic, the general principle is that it needs to improve performance overall without hurting any other cores, especially the big.LITTLE CPUs found in Android phones (or it has to enable other optimizations, like the linker relaxation from FeatureFuseAdrpAdd). GCC believes it is a benefit or benign, but GCC may use a slightly different algorithm for deciding when to fuse; I'm not sure. So long as we have some results showing that it improves things, or is flat, for a selection of cores (big and little), it should be OK.
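The ADRP + ADD pair that FeatureFuseAdrpAdd presumably targets (as its name suggests), and that the "PC-relative address calculation" entry in the summary refers to, is a two-step address computation: ADRP produces the 4 KiB page of the target relative to the PC's page, and ADD :lo12: supplies the low 12 bits. The following is a minimal sketch of that arithmetic with hypothetical helper names (`page`, `adrp`, `addLo12`); it models the instructions' results only, not LLVM's fusion logic or any linker relaxation.

```cpp
#include <cstdint>

// Illustrative ADRP + ADD :lo12: arithmetic; helper names are hypothetical.

// The 4 KiB page containing an address.
constexpr std::uint64_t page(std::uint64_t addr) {
  return addr & ~std::uint64_t{0xFFF};
}

// ADRP: page(PC) plus a signed page delta (the instruction's immediate)
// scaled by 4 KiB. Unsigned arithmetic wraps modulo 2^64, so negative
// deltas work without undefined behaviour.
constexpr std::uint64_t adrp(std::uint64_t pc, std::int64_t pageDelta) {
  return page(pc) + static_cast<std::uint64_t>(pageDelta) * 0x1000;
}

// ADD Xd, Xd, #:lo12:sym adds the low 12 bits of the symbol address.
// It consumes ADRP's result, so the two form a dependent pair.
constexpr std::uint64_t addLo12(std::uint64_t base, std::uint64_t sym) {
  return base + (sym & 0xFFF);
}

// Code at 0x400123 materialising 0x412abc:
// page(0x412abc) - page(0x400123) = 0x12000, i.e. a delta of 0x12 pages.
static_assert(addLo12(adrp(0x400123, 0x12), 0x412abc) == 0x412abc, "");
```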

Revision Contents

Path                                                  Size
llvm/lib/Target/AArch64/AArch64.td                    14 lines
llvm/lib/Target/AArch64/AArch64MacroFusion.cpp        16 lines
llvm/lib/Target/AArch64/AArch64Subtarget.h            2 lines
llvm/test/CodeGen/AArch64/misched-fusion-addr.ll      1 line
llvm/test/CodeGen/AArch64/misched-fusion-aes.ll       6 lines
llvm/test/CodeGen/AArch64/misched-fusion-lit.ll       2 lines

Diff 318918

llvm/lib/Target/AArch64/AArch64.td

... (212 lines hidden) ...
 def FeatureArithmeticBccFusion : SubtargetFeature<
     "arith-bcc-fusion", "HasArithmeticBccFusion", "true",
     "CPU fuses arithmetic+bcc operations">;

 def FeatureArithmeticCbzFusion : SubtargetFeature<
     "arith-cbz-fusion", "HasArithmeticCbzFusion", "true",
     "CPU fuses arithmetic + cbz/cbnz operations">;

+def FeatureCmpBccFusion : SubtargetFeature<
+    "cmp-bcc-fusion", "HasCmpBccFusion", "true",
+    "CPU fuses cmp+bcc operations">;
+
 def FeatureFuseAddress : SubtargetFeature<
     "fuse-address", "HasFuseAddress", "true",
     "CPU fuses address generation and memory operations">;

 def FeatureFuseAES : SubtargetFeature<
     "fuse-aes", "HasFuseAES", "true",
     "CPU fuses AES crypto operations">;

... (381 lines hidden) ...

 def ProcA65 : SubtargetFeature<"a65", "ARMProcFamily", "CortexA65",
     "Cortex-A65 ARM processors", [
     HasV8_2aOps,
     FeatureCrypto,
     FeatureDotProd,
     FeatureFPARMv8,
     FeatureFullFP16,
+    FeatureFuseAddress,
+    FeatureFuseAES,
+    FeatureFuseLiterals,
     FeatureNEON,
     FeatureRAS,
     FeatureRCPC,
     FeatureSSBS,
     ]>;

 def ProcA72 : SubtargetFeature<"a72", "ARMProcFamily", "CortexA72",
     "Cortex-A72 ARM processors", [
     FeatureCRC,
     FeatureCrypto,
     FeatureFPARMv8,
     FeatureFuseAES,
+    FeatureFuseLiterals,
     FeatureNEON,
     FeaturePerfMon
     ]>;

 def ProcA73 : SubtargetFeature<"a73", "ARMProcFamily", "CortexA73",
     "Cortex-A73 ARM processors", [
     FeatureCRC,
     FeatureCrypto,
... (15 lines hidden, inside def ProcA75) ...
     FeatureRCPC,
     FeaturePerfMon
     ]>;

 def ProcA76 : SubtargetFeature<"a76", "ARMProcFamily", "CortexA76",
     "Cortex-A76 ARM processors", [
     HasV8_2aOps,
     FeatureFPARMv8,
+    FeatureFuseAES,
     FeatureNEON,
     FeatureRCPC,
     FeatureCrypto,
     FeatureFullFP16,
     FeatureDotProd,
     FeatureSSBS
     ]>;

 def ProcA77 : SubtargetFeature<"a77", "ARMProcFamily", "CortexA77",
     "Cortex-A77 ARM processors", [
     HasV8_2aOps,
+    FeatureCmpBccFusion,
     FeatureFPARMv8,
+    FeatureFuseAES,
     FeatureNEON, FeatureRCPC,
     FeatureCrypto,
     FeatureFullFP16,
     FeatureDotProd
     ]>;

 def ProcA78 : SubtargetFeature<"cortex-a78", "ARMProcFamily",
     "CortexA78",
     "Cortex-A78 ARM processors", [
     HasV8_2aOps,
+    FeatureCmpBccFusion,
     FeatureCrypto,
     FeatureFPARMv8,
     FeatureFuseAES,
     FeatureNEON,
     FeatureRCPC,
     FeaturePerfMon,
     FeaturePostRAScheduler,
     FeatureSPE,
     FeatureFullFP16,
     FeatureSSBS,
     FeatureDotProd]>;

 def ProcA78C : SubtargetFeature<"cortex-a78c", "ARMProcFamily",
     "CortexA78C",
     "Cortex-A78C ARM processors", [
     HasV8_2aOps,
+    FeatureCmpBccFusion,
     FeatureCrypto,
     FeatureDotProd,
     FeatureFlagM,
     FeatureFP16FML,
     FeatureFPARMv8,
     FeatureFullFP16,
     FeatureFuseAES,
     FeatureNEON,
... (11 lines hidden, inside def ProcR82) ...
     // TODO: crypto and FuseAES
     // All other features are implied by v8_0r ops:
     HasV8_0rOps,
     ]>;

 def ProcX1 : SubtargetFeature<"cortex-x1", "ARMProcFamily", "CortexX1",
     "Cortex-X1 ARM processors", [
     HasV8_2aOps,
+    FeatureCmpBccFusion,
     FeatureCrypto,
     FeatureFPARMv8,
     FeatureFuseAES,
     FeatureNEON,
     FeatureRCPC,
     FeaturePerfMon,
     FeaturePostRAScheduler,
     FeatureSPE,
... (516 lines hidden) ...

llvm/lib/Target/AArch64/AArch64MacroFusion.cpp

... (15 lines hidden) ...
 #include "llvm/CodeGen/TargetInstrInfo.h"

 using namespace llvm;

 namespace {

 /// CMN, CMP, TST followed by Bcc
 static bool isArithmeticBccPair(const MachineInstr *FirstMI,
-                                const MachineInstr &SecondMI) {
+                                const MachineInstr &SecondMI, bool CmpOnly) {
   if (SecondMI.getOpcode() != AArch64::Bcc)
     return false;

   // Assume the 1st instr to be a wildcard if it is unspecified.
   if (FirstMI == nullptr)
     return true;

+  // If we're in CmpOnly mode, we only fuse arithmetic instructions that
+  // discard their result.
+  if (CmpOnly && !(FirstMI->getOperand(0).getReg() == AArch64::XZR ||
+                   FirstMI->getOperand(0).getReg() == AArch64::WZR)) {
+    return false;
+  }
+
   switch (FirstMI->getOpcode()) {
   case AArch64::ADDSWri:
   case AArch64::ADDSWrr:
   case AArch64::ADDSXri:
   case AArch64::ADDSXrr:
   case AArch64::ANDSWri:
   case AArch64::ANDSWrr:
   case AArch64::ANDSXri:
... (335 lines hidden) ...
 static bool shouldScheduleAdjacent(const TargetInstrInfo &TII,
                                    const TargetSubtargetInfo &TSI,
                                    const MachineInstr *FirstMI,
                                    const MachineInstr &SecondMI) {
   const AArch64Subtarget &ST = static_cast<const AArch64Subtarget&>(TSI);

   // All checking functions assume that the 1st instr is a wildcard if it is
   // unspecified.
-  if (ST.hasArithmeticBccFusion() && isArithmeticBccPair(FirstMI, SecondMI))
-    return true;
+  if (ST.hasCmpBccFusion() || ST.hasArithmeticBccFusion()) {
+    bool CmpOnly = !ST.hasArithmeticBccFusion();
+    if (isArithmeticBccPair(FirstMI, SecondMI, CmpOnly))
+      return true;
+  }
   if (ST.hasArithmeticCbzFusion() && isArithmeticCbzPair(FirstMI, SecondMI))
     return true;
   if (ST.hasFuseAES() && isAESPair(FirstMI, SecondMI))
     return true;
   if (ST.hasFuseCryptoEOR() && isCryptoEORPair(FirstMI, SecondMI))
     return true;
   if (ST.hasFuseLiterals() && isLiteralsPair(FirstMI, SecondMI))
     return true;
... (20 lines hidden) ...
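The new gate in shouldScheduleAdjacent has three outcomes depending on which of the two subtarget features a CPU enables. The following sketch restates it as a pure function; the `BccFusionMode` enum and `bccMode` function are hypothetical names, not part of LLVM. Note that when both features are enabled, arith-bcc-fusion takes precedence and CmpOnly is false.

```cpp
// Hypothetical restatement of the B.cc gate this patch adds to
// shouldScheduleAdjacent (names are illustrative, not LLVM's).
enum class BccFusionMode {
  None,          // neither feature: no flag-setting + B.cc fusion
  CmpOnly,       // cmp-bcc-fusion only: first instr must write XZR/WZR
  AnyArithmetic  // arith-bcc-fusion: any listed flag-setting instr
};

constexpr BccFusionMode bccMode(bool hasCmpBccFusion,
                                bool hasArithmeticBccFusion) {
  // Mirrors: if (hasCmpBcc || hasArithBcc) { CmpOnly = !hasArithBcc; ... }
  return !(hasCmpBccFusion || hasArithmeticBccFusion) ? BccFusionMode::None
         : !hasArithmeticBccFusion ? BccFusionMode::CmpOnly
                                   : BccFusionMode::AnyArithmetic;
}

static_assert(bccMode(false, false) == BccFusionMode::None, "");
static_assert(bccMode(true, false) == BccFusionMode::CmpOnly, "");
static_assert(bccMode(false, true) == BccFusionMode::AnyArithmetic, "");
static_assert(bccMode(true, true) == BccFusionMode::AnyArithmetic, "");
```

Deriving CmpOnly from the absence of the broader feature, rather than from the presence of the narrower one, means a core that enables both never ends up with the more restrictive behaviour.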

llvm/lib/Target/AArch64/AArch64Subtarget.h

... (215 lines hidden, in the protected: section) ...
   bool ExynosAsCheapAsMove = false;
   bool UsePostRAScheduler = false;
   bool Misaligned128StoreIsSlow = false;
   bool Paired128IsSlow = false;
   bool STRQroIsSlow = false;
   bool UseAlternateSExtLoadCVTF32Pattern = false;
   bool HasArithmeticBccFusion = false;
   bool HasArithmeticCbzFusion = false;
+  bool HasCmpBccFusion = false;
   bool HasFuseAddress = false;
   bool HasFuseAES = false;
   bool HasFuseArithmeticLogic = false;
   bool HasFuseCCSelect = false;
   bool HasFuseCryptoEOR = false;
   bool HasFuseLiterals = false;
   bool DisableLatencySchedHeuristic = false;
   bool UseRSqrt = false;
... (140 lines hidden, in the public: section) ...
   bool isMisaligned128StoreSlow() const { return Misaligned128StoreIsSlow; }
   bool isPaired128Slow() const { return Paired128IsSlow; }
   bool isSTRQroSlow() const { return STRQroIsSlow; }
   bool useAlternateSExtLoadCVTF32Pattern() const {
     return UseAlternateSExtLoadCVTF32Pattern;
   }
   bool hasArithmeticBccFusion() const { return HasArithmeticBccFusion; }
   bool hasArithmeticCbzFusion() const { return HasArithmeticCbzFusion; }
+  bool hasCmpBccFusion() const { return HasCmpBccFusion; }
   bool hasFuseAddress() const { return HasFuseAddress; }
   bool hasFuseAES() const { return HasFuseAES; }
   bool hasFuseArithmeticLogic() const { return HasFuseArithmeticLogic; }
   bool hasFuseCCSelect() const { return HasFuseCCSelect; }
   bool hasFuseCryptoEOR() const { return HasFuseCryptoEOR; }
   bool hasFuseLiterals() const { return HasFuseLiterals; }

   /// Return true if the CPU supports any kind of instruction fusion.
... (199 lines hidden) ...

llvm/test/CodeGen/AArch64/misched-fusion-addr.ll

 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mattr=fuse-address | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a65 | FileCheck %s
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=exynos-m3 | FileCheck %s
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=exynos-m4 | FileCheck %s
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=exynos-m5 | FileCheck %s

 target triple = "aarch64-unknown"

 @var_8bit = dso_local global i8 0
 @var_16bit = dso_local global i16 0
... (107 lines hidden) ...

llvm/test/CodeGen/AArch64/misched-fusion-aes.ll

 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mattr=+fuse-aes,+crypto | FileCheck %s
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=generic -mattr=+crypto | FileCheck %s
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a53 | FileCheck %s
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a57 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a65 | FileCheck %s
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a72 | FileCheck %s
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a73 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a76 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a77 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a78 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a78c| FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-x1 | FileCheck %s
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=exynos-m3 | FileCheck %s
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=exynos-m4 | FileCheck %s
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=exynos-m5 | FileCheck %s

 declare <16 x i8> @llvm.aarch64.crypto.aese(<16 x i8> %d, <16 x i8> %k)
 declare <16 x i8> @llvm.aarch64.crypto.aesmc(<16 x i8> %d)
 declare <16 x i8> @llvm.aarch64.crypto.aesd(<16 x i8> %d, <16 x i8> %k)
 declare <16 x i8> @llvm.aarch64.crypto.aesimc(<16 x i8> %d)
... (198 lines hidden) ...
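The AES RUN lines above exercise the intrinsics declared at the top of the test: AESE followed by AESMC for an encryption round, and AESD followed by AESIMC for a decryption round. A simplified, hypothetical model of that pairing rule follows; `Op`, `Instr` and `isAESPairModel` are illustrative names, not LLVM's isAESPair. As an assumption of this sketch, it also requires the second instruction to read the first one's result, since the pair is a producer/consumer chain; where LLVM enforces that dependency is not shown in this diff.

```cpp
// Simplified, hypothetical model of AES fusion pairs (not LLVM's isAESPair).
enum class Op { AESE, AESMC, AESD, AESIMC, Other };

struct Instr {
  Op op;
  int dst;  // destination vector register number
  int src;  // register read by the instruction (simplified to one)
};

constexpr bool isAESPairModel(Instr first, Instr second) {
  // Encryption round: AESE then AESMC; decryption round: AESD then AESIMC.
  // Assumption for this sketch: the second instruction must consume the
  // first one's result.
  return ((first.op == Op::AESE && second.op == Op::AESMC) ||
          (first.op == Op::AESD && second.op == Op::AESIMC)) &&
         second.src == first.dst;
}

// aese v0.16b, v1.16b ; aesmc v0.16b, v0.16b  -> candidate pair
static_assert(isAESPairModel({Op::AESE, 0, 1}, {Op::AESMC, 0, 0}), "");
// AESE followed by AESIMC mixes encryption and decryption: not a pair.
static_assert(!isAESPairModel({Op::AESE, 0, 1}, {Op::AESIMC, 0, 0}), "");
```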

llvm/test/CodeGen/AArch64/misched-fusion-lit.ll

 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mattr=-fuse-literals | FileCheck %s --check-prefix=CHECK --check-prefix=CHECKDONT
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mattr=+fuse-literals | FileCheck %s --check-prefix=CHECK --check-prefix=CHECKFUSE
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a57 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECKFUSE
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a65 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECKFUSE
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a72 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECKFUSE
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=exynos-m3 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECKFUSE
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=exynos-m4 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECKFUSE
 ; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=exynos-m5 | FileCheck %s --check-prefix=CHECK --check-prefix=CHECKFUSE

 @g = common local_unnamed_addr global i8* null, align 8

 define dso_local i8* @litp(i32 %a, i32 %b) {
 entry:
... (53 lines hidden) ...