This is an archive of the discontinued LLVM Phabricator instance.

Differential D98781

[AArch64] Enable UseAA globally in the AArch64 backend
Closed, Public

Authored by dmgreen on Mar 17 2021, 7:21 AM.

Details

Reviewers

t.p.northover
fhahn
SjoerdMeijer
david-arm
sdesmalen
efriedma
hfinkel

Commits

rGaf342f724004: [AArch64] Enable UseAA globally in the AArch64 backend

Summary

This is similar to D69796 from the ARM backend. We remove the UseAA feature, enabling it globally in the AArch64 backend. This should generally be an improvement, allowing the backend to reorder more instructions during scheduling and codegen. Enabling it by default also improves test coverage of the feature, since it is no longer CPU-specific. A debugging option (-aarch64-use-aa) is added instead for testing.
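To make the change concrete, here is a small standalone C++ model of how "use AA during codegen?" is decided before and after this patch. It is a sketch, not LLVM code: the processor set is copied from the FeatureUseAA removals in the AArch64.td diff of this revision, while `useAABefore`, `useAAAfter` and `CodegenOptions` are hypothetical names standing in for the old subtarget feature bit and the new `-aarch64-use-aa` option.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Processor-feature names (as spelled in AArch64.td) whose definitions listed
// FeatureUseAA before this patch. Any other CPU defaulted to no AA.
const std::set<std::string> kProcsWithUseAA = {
    "a53",        "a55",        "cortex-r82", "neoversee1",
    "neoversen1", "neoversen2", "neoversev1", "thunderx3t110"};

// Before (modelled): the CPU's feature list decides, and an explicit
// -mattr=+use-aa / -mattr=-use-aa overrides it.
bool useAABefore(const std::string &procFeature,
                 std::optional<bool> mattrUseAA) {
  if (mattrUseAA.has_value())
    return *mattrUseAA;
  return kProcsWithUseAA.count(procFeature) != 0;
}

// After (modelled): one global switch standing in for
//   static cl::opt<bool> UseAA("aarch64-use-aa", cl::init(true), ...);
// The CPU no longer matters.
struct CodegenOptions {
  bool aarch64UseAA = true;  // mirrors cl::init(true)
};

bool useAAAfter(const CodegenOptions &opts) { return opts.aarch64UseAA; }
```

For example, a `cyclone` compile that previously got no AA now gets it by default, which matches the gep-opt test moving its `-mcpu=cyclone` RUN line from CHECK-NoAA to CHECK-UseAA.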

Event Timeline

dmgreen created this revision. Mar 17 2021, 7:21 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls. Mar 17 2021, 7:21 AM

dmgreen requested review of this revision. Mar 17 2021, 7:21 AM

Herald added a project: Restricted Project. Mar 17 2021, 7:21 AM

Harbormaster completed remote builds in B94242: Diff 331260. Mar 17 2021, 7:59 AM

david-arm added inline comments. Mar 19 2021, 2:23 AM

llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll, line 135:
Is this a potential regression? We've gone from an `ldp` to two `ldr` instructions here.

dmgreen added inline comments. Apr 10 2021, 6:16 AM

llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll, line 135:
There are already a lot of command-line options added to this test, presumably in an attempt to keep it behaving the same way as it did in the past: `-pre-RA-sched=linearize -enable-misched=false -disable-post-ra`. Without those flags, the output is the same with and without UseAA. In general, UseAA can give more scheduling freedom, and the compiler is perfectly able to shoot itself in the foot with that extra freedom. The opposite can be true too, though, where it does help things, and it should help more than it hinders. It is also already enabled for certain CPUs; this just makes it global.
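One way to picture the "more scheduling freedom" argument is to count the memory-ordering constraints the scheduler has to respect. This is a toy model, not LLVM's ScheduleDAG: it assumes that without AA every pair of accesses involving a store stays ordered, and that with AA, accesses to different objects (or to disjoint bytes of one object) are independent. `MemOp`, `mayAliasNoAA`, `mayAliasWithAA`, `countMemoryEdges` and `vaCopySequence` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct MemOp {
  bool isStore;
  int object;  // id of the underlying object (e.g. a distinct global)
  int offset;  // byte offset into that object
  int size;    // access size in bytes
};

// Conservative rule assumed when AA is off: anything may alias.
bool mayAliasNoAA(const MemOp &, const MemOp &) { return true; }

// Simple AA rule assumed here: different identified objects never alias;
// within one object, only overlapping byte ranges alias.
bool mayAliasWithAA(const MemOp &a, const MemOp &b) {
  if (a.object != b.object)
    return false;
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

// Counts ordering edges between pairs (i < j) where at least one access is a
// store and the alias query cannot rule out a conflict. Data dependences
// (a store using a loaded value) are not counted here.
template <typename AliasFn>
std::size_t countMemoryEdges(const std::vector<MemOp> &ops, AliasFn mayAlias) {
  std::size_t edges = 0;
  for (std::size_t i = 0; i < ops.size(); ++i)
    for (std::size_t j = i + 1; j < ops.size(); ++j)
      if ((ops[i].isStore || ops[j].isStore) && mayAlias(ops[i], ops[j]))
        ++edges;
  return edges;
}

// The va_copy in arm64-variadic-aapcs.ll as four 16-byte accesses: two loads
// from @var (object 0), then two stores to @second_list (object 1).
std::vector<MemOp> vaCopySequence() {
  return {{false, 0, 0, 16}, {false, 0, 16, 16},
          {true, 1, 0, 16},  {true, 1, 16, 16}};
}
```

With AA, none of the four accesses constrain each other, so the scheduler is free to move the two loads apart. One plausible consequence is that they no longer end up adjacent for pairing, which is the kind of `ldp` to two `ldr` change raised in the review.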

Matt added a subscriber: Matt. Apr 13 2021, 7:23 AM

LGTM! Thanks for explaining. I think the change looks sensible to me.

This revision is now accepted and ready to land. Apr 16 2021, 6:07 AM

Thanks. Let's see if the buildbots and everyone else agree.

Closed by commit rGaf342f724004: [AArch64] Enable UseAA globally in the AArch64 backend (authored by dmgreen). Apr 24 2021, 9:52 AM

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rGaf342f724004: [AArch64] Enable UseAA globally in the AArch64 backend.

Herald added a subscriber: tmatheson. Apr 24 2021, 9:52 AM

asb mentioned this in D157250: [RISCV] Enable alias analysis by default. Aug 8 2023, 11:33 AM

Revision Contents

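Path                                               Size
llvm/lib/Target/AArch64/AArch64.td                 11 lines
llvm/lib/Target/AArch64/AArch64Subtarget.h         3 lines
llvm/lib/Target/AArch64/AArch64Subtarget.cpp       5 lines
llvm/test/CodeGen/AArch64/aarch64-gep-opt.ll       10 lines
llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll  5 lines
llvm/test/CodeGen/AArch64/arm64-virtual_base.ll    6 lines
llvm/test/CodeGen/AArch64/ilp32-va.ll              6 lines
llvm/test/CodeGen/AArch64/misched-stp.ll           2 lines
llvm/test/CodeGen/AArch64/seh-finally.ll           8 lines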

Diff 331260

llvm/lib/Target/AArch64/AArch64.td

@@ ... @@ foreach i = {1-7,9-15,18,20-28,30} in
 def FeatureReserveX#i : SubtargetFeature<"reserve-x"#i, "ReserveXRegister["#i#"]", "true",
 "Reserve X"#i#", making it unavailable "
 "as a GPR">;

 foreach i = {8-15,18} in
 def FeatureCallSavedX#i : SubtargetFeature<"call-saved-x"#i,
 "CustomCallSavedXRegs["#i#"]", "true", "Make X"#i#" callee saved.">;

-def FeatureUseAA : SubtargetFeature<"use-aa", "UseAA", "true",
-"Use alias analysis during codegen">;
-
 def FeatureBalanceFPOps : SubtargetFeature<"balance-fp-ops", "BalanceFPOps",
 "true",
 "balance mix of odd and even D-registers for fp multiply(-accumulate) ops">;

 def FeaturePredictableSelectIsExpensive : SubtargetFeature<
 "predictable-select-expensive", "PredictableSelectIsExpensive", "true",
 "Prefer likely predicted branches over selects">;

@@ ... @@ def ProcA53 : SubtargetFeature<"a53", "ARMProcFamily", "CortexA53",
 FeatureCRC,
 FeatureCrypto,
 FeatureCustomCheapAsMoveHandling,
 FeatureFPARMv8,
 FeatureFuseAES,
 FeatureNEON,
 FeaturePerfMon,
 FeaturePostRAScheduler,
-FeatureUseAA
 ]>;

 def ProcA55 : SubtargetFeature<"a55", "ARMProcFamily", "CortexA55",
 "Cortex-A55 ARM processors", [
 HasV8_2aOps,
 FeatureCrypto,
 FeatureFPARMv8,
 FeatureFuseAES,
 FeatureNEON,
 FeatureFullFP16,
 FeatureDotProd,
 FeatureRCPC,
 FeaturePerfMon,
 FeaturePostRAScheduler,
-FeatureUseAA
 ]>;

 def ProcA57 : SubtargetFeature<"a57", "ARMProcFamily", "CortexA57",
 "Cortex-A57 ARM processors", [
 FeatureBalanceFPOps,
 FeatureCRC,
 FeatureCrypto,
 FeatureCustomCheapAsMoveHandling,
@@ ... @@ def ProcA78C : SubtargetFeature<"cortex-a78c", "ARMProcFamily",
 FeatureRCPC,
 FeatureSPE,
 FeatureSSBS]>;

 def ProcR82 : SubtargetFeature<"cortex-r82", "ARMProcFamily",
 "CortexR82",
 "Cortex-R82 ARM Processors", [
 FeaturePostRAScheduler,
-FeatureUseAA,
 // All other features are implied by v8_0r ops:
 HasV8_0rOps,
 ]>;

 def ProcX1 : SubtargetFeature<"cortex-x1", "ARMProcFamily", "CortexX1",
 "Cortex-X1 ARM processors", [
 HasV8_2aOps,
 FeatureCmpBccFusion,
@@ ... @@ def ProcNeoverseE1 : SubtargetFeature<"neoversee1", "ARMProcFamily",
 FeatureCrypto,
 FeatureDotProd,
 FeatureFPARMv8,
 FeatureFullFP16,
 FeatureNEON,
 FeatureRCPC,
 FeatureSSBS,
 FeaturePostRAScheduler,
-FeatureUseAA,
 FeatureFuseAES,
 ]>;

 def ProcNeoverseN1 : SubtargetFeature<"neoversen1", "ARMProcFamily",
 "NeoverseN1",
 "Neoverse N1 ARM processors", [
 HasV8_2aOps,
 FeatureCrypto,
 FeatureDotProd,
 FeatureFPARMv8,
 FeatureFullFP16,
 FeatureNEON,
 FeatureRCPC,
 FeatureSPE,
 FeatureSSBS,
 FeaturePostRAScheduler,
-FeatureUseAA,
 FeatureFuseAES,
 ]>;

 def ProcNeoverseN2 : SubtargetFeature<"neoversen2", "ARMProcFamily",
 "NeoverseN2",
 "Neoverse N2 ARM processors", [
 HasV8_5aOps,
 FeatureBF16,
 FeatureETE,
 FeatureMatMulInt8,
 FeatureMTE,
 FeatureSVE2,
 FeatureSVE2BitPerm,
 FeatureTRBE,
 FeaturePostRAScheduler,
-FeatureUseAA,
 FeatureCrypto,
 FeatureFuseAES,
 ]>;

 def ProcNeoverseV1 : SubtargetFeature<"neoversev1", "ARMProcFamily",
 "NeoverseV1",
 "Neoverse V1 ARM processors", [
 HasV8_4aOps,
 FeatureBF16,
 FeatureCacheDeepPersist,
 FeatureCrypto,
 FeatureFPARMv8,
 FeatureFP16FML,
 FeatureFullFP16,
 FeatureFuseAES,
 FeatureMatMulInt8,
 FeatureNEON,
 FeaturePerfMon,
 FeaturePostRAScheduler,
-FeatureUseAA,
 FeatureRandGen,
 FeatureSPE,
 FeatureSSBS,
 FeatureSVE]>;

 def ProcSaphira : SubtargetFeature<"saphira", "ARMProcFamily", "Saphira",
 "Qualcomm Saphira processors", [
 FeatureCrypto,
@@ ... @@ def ProcThunderX3T110 : SubtargetFeature<"thunderx3t110", "ARMProcFamily",
 FeatureCrypto,
 FeatureFPARMv8,
 FeatureArithmeticBccFusion,
 FeatureNEON,
 FeaturePostRAScheduler,
 FeaturePredictableSelectIsExpensive,
 FeatureLSE,
 FeaturePAuth,
-FeatureUseAA,
 FeatureBalanceFPOps,
 FeaturePerfMon,
 FeatureStrictAlign,
 HasV8_3aOps]>;

 def ProcThunderX : SubtargetFeature<"thunderx", "ARMProcFamily", "ThunderX",
 "Cavium ThunderX processors", [
 FeatureCRC,

llvm/lib/Target/AArch64/AArch64Subtarget.h

@@ ... @@ protected:

 // NegativeImmediates - transform instructions with negative immediates
 bool NegativeImmediates = true;

 // Enable 64-bit vectorization in SLP.
 unsigned MinVectorRegisterBitWidth = 64;

 bool OutlineAtomics = false;
-bool UseAA = false;
 bool PredictableSelectIsExpensive = false;
 bool BalanceFPOps = false;
 bool CustomAsCheapAsMove = false;
 bool ExynosAsCheapAsMove = false;
 bool UsePostRAScheduler = false;
 bool Misaligned128StoreIsSlow = false;
 bool Paired128IsSlow = false;
 bool STRQroIsSlow = false;
@@ ... @@ public:
 bool isTargetELF() const { return TargetTriple.isOSBinFormatELF(); }
 bool isTargetMachO() const { return TargetTriple.isOSBinFormatMachO(); }

 bool isTargetILP32() const {
 return TargetTriple.isArch32Bit() ||
 TargetTriple.getEnvironment() == Triple::GNUILP32;
 }

-bool useAA() const override { return UseAA; }
+bool useAA() const override;

 bool outlineAtomics() const { return OutlineAtomics; }

 bool hasVH() const { return HasVH; }
 bool hasPAN() const { return HasPAN; }
 bool hasLOR() const { return HasLOR; }

 bool hasPsUAO() const { return HasPsUAO; }

llvm/lib/Target/AArch64/AArch64Subtarget.cpp

@@ ... @@ static cl::opt<unsigned> SVEVectorBitsMax(
 cl::init(0), cl::Hidden);

 static cl::opt<unsigned> SVEVectorBitsMin(
 "aarch64-sve-vector-bits-min",
 cl::desc("Assume SVE vector registers are at least this big, "
 "with zero meaning no minimum size is assumed."),
 cl::init(0), cl::Hidden);

+static cl::opt<bool> UseAA("aarch64-use-aa", cl::init(true),
+cl::desc("Enable the use of AA during codegen."));
+
 AArch64Subtarget &
 AArch64Subtarget::initializeSubtargetDependencies(StringRef FS,
 StringRef CPUString) {
 // Determine default and user-specified characteristics

 if (CPUString.empty())
 CPUString = "generic";

@@ ... @@ if (SVEVectorBitsMax == 0)
 return (SVEVectorBitsMin / 128) * 128;
 return (std::min(SVEVectorBitsMin, SVEVectorBitsMax) / 128) * 128;
 }

 bool AArch64Subtarget::useSVEForFixedLengthVectors() const {
 // Prefer NEON unless larger SVE registers are available.
 return hasSVE() && getMinSVEVectorSizeInBits() >= 256;
 }
+
+bool AArch64Subtarget::useAA() const { return UseAA; }

llvm/test/CodeGen/AArch64/aarch64-gep-opt.ll

 ; RUN: llc -O3 -aarch64-enable-gep-opt=true -verify-machineinstrs %s -o - | FileCheck %s
-; RUN: llc -O3 -aarch64-enable-gep-opt=true -mattr=-use-aa -print-after=codegenprepare < %s >%t 2>&1 && FileCheck --check-prefix=CHECK-NoAA <%t %s
-; RUN: llc -O3 -aarch64-enable-gep-opt=true -mattr=+use-aa -print-after=codegenprepare < %s >%t 2>&1 && FileCheck --check-prefix=CHECK-UseAA <%t %s
-; RUN: llc -O3 -aarch64-enable-gep-opt=true -print-after=codegenprepare -mcpu=cyclone < %s >%t 2>&1 && FileCheck --check-prefix=CHECK-NoAA <%t %s
-; RUN: llc -O3 -aarch64-enable-gep-opt=true -print-after=codegenprepare -mcpu=cortex-a53 < %s >%t 2>&1 && FileCheck --check-prefix=CHECK-UseAA <%t %s
+; RUN: llc -O3 -aarch64-enable-gep-opt=true -print-after=codegenprepare < %s 2>&1 | FileCheck --check-prefix=CHECK-UseAA %s
+; RUN: llc -O3 -aarch64-enable-gep-opt=true -aarch64-use-aa=false -print-after=codegenprepare < %s 2>&1 | FileCheck --check-prefix=CHECK-NoAA %s
+; RUN: llc -O3 -aarch64-enable-gep-opt=true -print-after=codegenprepare -mcpu=cyclone < %s 2>&1 | FileCheck --check-prefix=CHECK-UseAA %s
+; RUN: llc -O3 -aarch64-enable-gep-opt=true -print-after=codegenprepare -mcpu=cortex-a53 < %s 2>&1 | FileCheck --check-prefix=CHECK-UseAA %s

 target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
 target triple = "aarch64-linux-gnueabi"

 ; Following test cases test enabling SeparateConstOffsetFromGEP pass in AArch64
 ; backend. If useAA() returns true, it will lower a GEP with multiple indices
 ; into GEPs with a single index, otherwise it will lower it into a
 ; "ptrtoint+arithmetics+inttoptr" form.

 %struct = type { i32, i32, i32, i32, [20 x i32] }

 ; Check that when two complex GEPs are used in two basic blocks, LLVM can
-; elimilate the common subexpression for the second use.
+; eliminate the common subexpression for the second use.
 define void @test_GEP_CSE([240 x %struct]* %string, i32* %adj, i32 %lib, i64 %idxprom) {
 %liberties = getelementptr [240 x %struct], [240 x %struct]* %string, i64 1, i64 %idxprom, i32 3
 %1 = load i32, i32* %liberties, align 4
 %cmp = icmp eq i32 %1, %lib
 br i1 %cmp, label %if.then, label %if.end

 if.then: ; preds = %entry
 %origin = getelementptr [240 x %struct], [240 x %struct]* %string, i64 1, i64 %idxprom, i32 2
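As a worked example of what the test comment describes, the byte offsets of the two GEPs in @test_GEP_CSE can be computed by hand. This C++ sketch assumes the test's datalayout gives i32 a size and alignment of 4 bytes, so %struct has no padding. `gepOffset`, `variablePart` and `constantPart` are hypothetical helpers, and the split shown illustrates the general idea behind SeparateConstOffsetFromGEP rather than its exact output.

```cpp
#include <cassert>
#include <cstdint>

// Sizes under the test's datalayout (assumed: i32 is 4 bytes, 4-byte aligned).
constexpr std::int64_t kI32 = 4;
// %struct = type { i32, i32, i32, i32, [20 x i32] } -> 16 + 80 = 96 bytes.
constexpr std::int64_t kStructSize = 4 * kI32 + 20 * kI32;
// [240 x %struct] -> 240 * 96 = 23040 bytes.
constexpr std::int64_t kArraySize = 240 * kStructSize;

// Byte offset for GEP indices (outer, idx, field) on a [240 x %struct]*.
// Only valid for the four leading i32 fields (field 0..3).
constexpr std::int64_t gepOffset(std::int64_t outer, std::int64_t idx,
                                 std::int64_t field) {
  return outer * kArraySize + idx * kStructSize + field * kI32;
}

// The same offset split into a part that depends on the run-time index
// (%idxprom) and a compile-time constant.
constexpr std::int64_t variablePart(std::int64_t idx) {
  return idx * kStructSize;
}
constexpr std::int64_t constantPart(std::int64_t outer, std::int64_t field) {
  return outer * kArraySize + field * kI32;
}
```

So %liberties is %string + 96 * %idxprom + 23052 and %origin is %string + 96 * %idxprom + 23048. Once the constants are peeled off, both addresses share %string + 96 * %idxprom, which is the common subexpression the second block can reuse.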

llvm/test/CodeGen/AArch64/arm64-variadic-aapcs.ll

@@ ... @@
 define dso_local void @test_va_copy() {
 ; CHECK-LABEL: test_va_copy:
 %srcaddr = bitcast %va_list* @var to i8*
 %dstaddr = bitcast %va_list* @second_list to i8*
 call void @llvm.va_copy(i8* %dstaddr, i8* %srcaddr)

 ; CHECK: add x[[SRC:[0-9]+]], {{x[0-9]+}}, :lo12:var

-; CHECK: ldp [[BLOCK:q[0-9]+]], [[BLOCK:q[0-9]+]], [x[[SRC]]]
+; CHECK: ldr [[BLOCKB:q[0-9]+]], [x[[SRC]], #16]
 ; CHECK: add x[[DST:[0-9]+]], {{x[0-9]+}}, :lo12:second_list
-; CHECK: stp [[BLOCK:q[0-9]+]], [[BLOCK:q[0-9]+]], [x[[DST]]]
+; CHECK: ldr [[BLOCKA:q[0-9]+]], [x[[SRC]]]
+; CHECK: stp [[BLOCKA]], [[BLOCKB]], [x[[DST]]]
 ret void
 ; CHECK: ret
 }

llvm/test/CodeGen/AArch64/arm64-virtual_base.ll

@@ ... @@
 %struct.Ray_Struct = type { [3 x double], [3 x double], i32, [100 x %struct.Interior_Struct*] }
 %struct.istack_struct = type { %struct.istack_struct*, %struct.istk_entry*, i32 }
 %struct.istk_entry = type { double, [3 x double], [3 x double], %struct.Object_Struct*, i32, i32, double, double, i8* }
 %struct.Transform_Struct = type { [4 x [4 x double]], [4 x [4 x double]] }
 %struct.Bezier_Node_Struct = type { i32, [3 x double], double, i32, i8* }

 define void @Precompute_Patch_Values(%struct.Bicubic_Patch_Struct* %Shape) {
 ; CHECK: Precompute_Patch_Values
-; CHECK: ldr [[VAL:x[0-9]+]], [x0, #288]
-; CHECK-NEXT: ldr [[VAL2:q[0-9]+]], [x0, #272]
-; CHECK-NEXT: str [[VAL]], [sp, #232]
+; CHECK: ldr [[VAL2:q[0-9]+]], [x0, #272]
+; CHECK-NEXT: ldr [[VAL:x[0-9]+]], [x0, #288]
 ; CHECK-NEXT: stur [[VAL2]], {{\[}}sp, #216]
+; CHECK-NEXT: str [[VAL]], [sp, #232]
 entry:
 %Control_Points = alloca [16 x [3 x double]], align 8
 %arraydecay5.3.1 = getelementptr inbounds [16 x [3 x double]], [16 x [3 x double]]* %Control_Points, i64 0, i64 9, i64 0
 %tmp14 = bitcast double* %arraydecay5.3.1 to i8*
 %arraydecay11.3.1 = getelementptr inbounds %struct.Bicubic_Patch_Struct, %struct.Bicubic_Patch_Struct* %Shape, i64 0, i32 12, i64 1, i64 3, i64 0
 %tmp15 = bitcast double* %arraydecay11.3.1 to i8*
 call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp14, i8* %tmp15, i64 24, i1 false)
 ret void
 }

 ; Function Attrs: nounwind
 declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1)
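The reordered CHECK lines in arm64-virtual_base.ll still describe the same two accesses: the 24-byte memcpy is done as one 16-byte q-register access plus one 8-byte x-register access, and only their order changes. This sketch reproduces the offsets under a simplified greedy rule (largest power of two up to 16 bytes). `splitCopy` is a hypothetical helper, not LLVM's memcpy lowering, and the destination offset assumes %Control_Points starts at sp+0, which is consistent with the #216 and #232 in the CHECK lines.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Splits a fixed-size copy starting at `offset` into (offset, size) chunks,
// always taking the largest power of two that fits, capped at 16 bytes
// (one q register). Real lowering also weighs alignment and target hooks.
std::vector<std::pair<std::int64_t, std::int64_t>> splitCopy(std::int64_t offset,
                                                             std::int64_t size) {
  std::vector<std::pair<std::int64_t, std::int64_t>> chunks;
  while (size > 0) {
    std::int64_t chunk = 16;
    while (chunk > size)
      chunk /= 2;
    chunks.emplace_back(offset, chunk);
    offset += chunk;
    size -= chunk;
  }
  return chunks;
}

// Byte offset of %Control_Points[9][0] in [16 x [3 x double]]: 9 * 3 * 8.
constexpr std::int64_t kDstOffset = 9 * 3 * 8;  // 216
```

On the destination side this yields a 16-byte chunk at 216 (the `stur` of the q register) and an 8-byte chunk at 232 (the `str` of the x register); the source side, starting at 272, gives 272 and 288 in the same way.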

llvm/test/CodeGen/AArch64/ilp32-va.ll

@@ ... @@
 define dso_local void @test_va_copy() {
 ; CHECK-LABEL: test_va_copy:
 %srcaddr = bitcast %va_list* @var to i8*
 %dstaddr = bitcast %va_list* @second_list to i8*
 call void @llvm.va_copy(i8* %dstaddr, i8* %srcaddr)

 ; CHECK: add x[[SRC:[0-9]+]], {{x[0-9]+}}, :lo12:var

-; CHECK: ldr [[BLOCK:q[0-9]+]], [x[[SRC]]]
-; CHECK: add x[[DST:[0-9]+]], {{x[0-9]+}}, :lo12:second_list
 ; CHECK: ldr [[BLOCK:w[0-9]+]], [x[[SRC]], #16]
-; CHECK: str [[BLOCK:q[0-9]+]], [x[[DST]]]
+; CHECK: add x[[DST:[0-9]+]], {{x[0-9]+}}, :lo12:second_list
 ; CHECK: str [[BLOCK:w[0-9]+]], [x[[DST]], #16]
+; CHECK: ldr [[BLOCK:q[0-9]+]], [x[[SRC]]]
+; CHECK: str [[BLOCK:q[0-9]+]], [x[[DST]]]
 ret void
 ; CHECK: ret
 }
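The shapes of the two va_copy tests follow from the size of the AAPCS64 va_list: three pointers (`__stack`, `__gr_top`, `__vr_top`) followed by two ints (`__gr_offs`, `__vr_offs`). On LP64 that is 32 bytes, copied as two 16-byte q-register accesses (arm64-variadic-aapcs.ll); on ILP32 the pointers are 4 bytes, giving 20 bytes, copied as a q access at offset 0 plus a w access at offset 16 (ilp32-va.ll). This sketch computes the layout by hand for a given pointer size; `VaListLayout`, `vaListLayout`, `qChunks` and `tailBytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Byte offsets of each AAPCS64 va_list field plus the total size, for a given
// pointer size (8 for LP64, 4 for ILP32). Computed by hand so the result does
// not depend on the ABI of the machine running this sketch.
struct VaListLayout {
  std::size_t stack;   // void *__stack
  std::size_t grTop;   // void *__gr_top
  std::size_t vrTop;   // void *__vr_top
  std::size_t grOffs;  // int __gr_offs
  std::size_t vrOffs;  // int __vr_offs
  std::size_t size;    // total size in bytes
};

constexpr VaListLayout vaListLayout(std::size_t ptrSize) {
  // Three pointers, then two 4-byte ints. With 8- or 4-byte pointers the
  // total is already a multiple of the pointer alignment: no tail padding.
  return VaListLayout{0,           ptrSize,         2 * ptrSize,
                      3 * ptrSize, 3 * ptrSize + 4, 3 * ptrSize + 8};
}

// Whole 16-byte (q-register) chunks and leftover bytes when copying `bytes`.
constexpr std::size_t qChunks(std::size_t bytes) { return bytes / 16; }
constexpr std::size_t tailBytes(std::size_t bytes) { return bytes % 16; }
```

For LP64 this gives two q chunks and no tail, matching the pair of q loads and the `stp`; for ILP32 it gives one q chunk and a 4-byte tail at offset 16, matching the `ldr`/`str` of a w register at #16.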

llvm/test/CodeGen/AArch64/misched-stp.ll

 ; REQUIRES: asserts
-; RUN: llc < %s -mtriple=aarch64 -mcpu=cyclone -mattr=+use-aa,+slow-misaligned-128store -enable-misched -verify-misched -o - | FileCheck %s
+; RUN: llc < %s -mtriple=aarch64 -mcpu=cyclone -mattr=+slow-misaligned-128store -enable-misched -verify-misched -o - | FileCheck %s

 ; Tests to check that the scheduler dependencies derived from alias analysis are
 ; correct when we have loads that have been split up so that they can later be
 ; merged into STP.

 ; Now that overwritten stores are elided in SelectionDAG, dependencies
 ; are resolved and removed before MISCHED. Check that we have
 ; equivalent pair of stp calls as a baseline.
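The misched-stp.ll comment relies on overwritten stores being removed before the machine scheduler runs. A minimal model of that idea, assuming a simple rule (a store is dead if a later store to the same object covers all of its bytes and no load in between reads any of them), might look like this. It is not the SelectionDAG implementation, and `Access`, `overlaps`, `covers` and `elideOverwrittenStores` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Access {
  bool isStore;
  int object;  // id of the underlying object
  int offset;  // byte offset into the object
  int size;    // access size in bytes
};

// True if the two accesses touch at least one common byte.
static bool overlaps(const Access &a, const Access &b) {
  return a.object == b.object && a.offset < b.offset + b.size &&
         b.offset < a.offset + a.size;
}

// True if `outer` writes every byte that `inner` touches.
static bool covers(const Access &outer, const Access &inner) {
  return outer.object == inner.object && outer.offset <= inner.offset &&
         inner.offset + inner.size <= outer.offset + outer.size;
}

// Drops stores whose bytes are all overwritten later before any load reads
// them; every other access is kept in its original order.
std::vector<Access> elideOverwrittenStores(const std::vector<Access> &ops) {
  std::vector<Access> kept;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    bool dead = false;
    if (ops[i].isStore) {
      for (std::size_t j = i + 1; j < ops.size(); ++j) {
        if (!ops[j].isStore && overlaps(ops[j], ops[i]))
          break;  // a load observes the stored bytes: keep the store
        if (ops[j].isStore && covers(ops[j], ops[i])) {
          dead = true;  // fully overwritten before anyone reads it
          break;
        }
      }
    }
    if (!dead)
      kept.push_back(ops[i]);
  }
  return kept;
}
```

Once such stores are gone, there are no store-to-store ordering edges left for the scheduler to get wrong, which is why the test now treats the resulting `stp` pair as a baseline.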

llvm/test/CodeGen/AArch64/seh-finally.ll

@@ ... @@ ehcleanup: ; preds = %entry
 cleanupret from %2 unwind to caller
 }

 define void @fin_simple_seh(i8 %abnormal_termination, i8* %frame_pointer) {
 entry:
 ; CHECK-LABEL: fin_simple_seh
 ; CHECK: movz x8, #:abs_g1_s:.Lsimple_seh$frame_escape_0
 ; CHECK: movk x8, #:abs_g0_nc:.Lsimple_seh$frame_escape_0
+; CHECK: ldr w8, [x1, x8]
 ; CHECK: strb w0, [sp, #15]
-; CHECK: ldr w0, [x1, x8]
 ; CHECK: bl foo

 %frame_pointer.addr = alloca i8*, align 8
 %abnormal_termination.addr = alloca i8, align 1
 %0 = call i8* @llvm.localrecover(i8* bitcast (void ()* @simple_seh to i8*), i8* %frame_pointer, i32 0)
 %o = bitcast i8* %0 to %struct.S*
 store i8* %frame_pointer, i8** %frame_pointer.addr, align 8
 store i8 %abnormal_termination, i8* %abnormal_termination.addr, align 1
@@ ... @@ ehcleanup: ; preds = %entry
 cleanupret from %2 unwind to caller
 }

 define void @fin_stack_realign(i8 %abnormal_termination, i8* %frame_pointer) {
 entry:
 ; CHECK-LABEL: fin_stack_realign
 ; CHECK: movz x8, #:abs_g1_s:.Lstack_realign$frame_escape_0
 ; CHECK: movk x8, #:abs_g0_nc:.Lstack_realign$frame_escape_0
+; CHECK: ldr w8, [x1, x8]
 ; CHECK: strb w0, [sp, #15]
-; CHECK: ldr w0, [x1, x8]
 ; CHECK: bl foo

 %frame_pointer.addr = alloca i8*, align 8
 %abnormal_termination.addr = alloca i8, align 1
 %0 = call i8* @llvm.localrecover(i8* bitcast (void ()* @stack_realign to i8*), i8* %frame_pointer, i32 0)
 %o = bitcast i8* %0 to %struct.S*
 store i8* %frame_pointer, i8** %frame_pointer.addr, align 8
 store i8 %abnormal_termination, i8* %abnormal_termination.addr, align 1
@@ ... @@ ehcleanup: ; preds = %entry
 cleanupret from %6 unwind to caller
 }

 define void @fin_vla_present(i8 %abnormal_termination, i8* %frame_pointer) {
 entry:
 ; CHECK-LABEL: fin_vla_present
 ; CHECK: movz x8, #:abs_g1_s:.Lvla_present$frame_escape_0
 ; CHECK: movk x8, #:abs_g0_nc:.Lvla_present$frame_escape_0
+; CHECK: ldr w8, [x1, x8]
 ; CHECK: strb w0, [sp, #15]
-; CHECK: ldr w0, [x1, x8]
 ; CHECK: bl foo

 %frame_pointer.addr = alloca i8*, align 8
 %abnormal_termination.addr = alloca i8, align 1
 %0 = call i8* @llvm.localrecover(i8* bitcast (void (i32)* @vla_present to i8*), i8* %frame_pointer, i32 0)
 %n.addr = bitcast i8* %0 to i32*
 store i8* %frame_pointer, i8** %frame_pointer.addr, align 8
 store i8 %abnormal_termination, i8* %abnormal_termination.addr, align 1
@@ ... @@ ehcleanup: ; preds = %entry
 cleanupret from %6 unwind to caller
 }

 define void @fin_vla_and_realign(i8 %abnormal_termination, i8* %frame_pointer) {
 entry:
 ; CHECK-LABEL: fin_vla_and_realign
 ; CHECK: movz x8, #:abs_g1_s:.Lvla_and_realign$frame_escape_0
 ; CHECK: movk x8, #:abs_g0_nc:.Lvla_and_realign$frame_escape_0
+; CHECK: ldr w8, [x1, x8]
 ; CHECK: strb w0, [sp, #15]
-; CHECK: ldr w0, [x1, x8]
 ; CHECK: bl foo

 %frame_pointer.addr = alloca i8*, align 8
 %abnormal_termination.addr = alloca i8, align 1
 %0 = call i8* @llvm.localrecover(i8* bitcast (void (i32)* @vla_and_realign to i8*), i8* %frame_pointer, i32 0)
 %o = bitcast i8* %0 to %struct.S*
 store i8* %frame_pointer, i8** %frame_pointer.addr, align 8
 store i8 %abnormal_termination, i8* %abnormal_termination.addr, align 1