This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Set MaxBytesForLoopAlignment for more targets
Closed, Public

Authored by NickGuy on Mar 28 2022, 2:29 AM.

Details

Reviewers: dmgreen, samtebbs

Commits: rG7d676714fbf2: [AArch64] Set MaxBytesForLoopAlignment for more targets

Summary

Further implementation of D114590 for the AArch64 backend. Specifying the max padding allowed for loop alignment for further AArch64 targets.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

NickGuy created this revision. Mar 28 2022, 2:29 AM

Herald added a project: Restricted Project. Mar 28 2022, 2:29 AM

Herald added subscribers: hiraditya, kristof.beyls.

NickGuy requested review of this revision. Mar 28 2022, 2:29 AM

Harbormaster completed remote builds in B156512: Diff 418524. Mar 28 2022, 2:29 AM

Adding a MaxBytesForLoopAlignment without a PrefLoopLogAlignment doesn't seem to make a lot of sense. I don't think it would do much on its own. Can this add sensible values for PrefLoopLogAlignment at the same time?

It could then extend the test in D114879 for all the CPUs added, to show it's tested.
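To make the point about the cap concrete, here is a minimal C++ sketch of how a directive of the form `.p2align N, fill, max` decides on padding: if reaching the next 2^N boundary would take more than `max` bytes, no padding is emitted at all. `paddingFor` is a hypothetical helper written for illustration, not LLVM code, and it assumes that a cap of 0 means "no limit".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of LLVM: number of padding bytes that a
// directive like `.p2align logAlign, fill, maxBytes` would insert when the
// current position is `offset`. Assumption: maxBytes == 0 means "no cap".
std::uint64_t paddingFor(std::uint64_t offset, unsigned logAlign,
                         std::uint64_t maxBytes) {
  assert(logAlign < 64 && "alignment exponent out of range");
  const std::uint64_t align = std::uint64_t{1} << logAlign;
  // Bytes needed to reach the next multiple of `align` (0 if already there).
  const std::uint64_t needed = (align - (offset & (align - 1))) & (align - 1);
  // If the boundary is further away than the cap, skip alignment entirely.
  if (maxBytes != 0 && needed > maxBytes)
    return 0;
  return needed;
}
```

With `logAlign` equal to 0 the function always returns 0, whatever the cap, which is one way to see why a maximum on its own has nothing to limit. For example, `paddingFor(0x1c, 5, 16)` pads by 4 bytes, while `paddingFor(0x04, 5, 16)` would need 28 bytes and therefore pads by nothing.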

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
16–1	These should be added too if they can be. They may need to be split into separate case blocks, if the A510 is different to the others now.

NickGuy updated this revision to Diff 418569. Mar 28 2022, 6:45 AM

NickGuy marked an inline comment as done.

Harbormaster completed remote builds in B156550: Diff 418569. Mar 28 2022, 6:46 AM

dmgreen added inline comments. Mar 29 2022, 2:27 AM

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
17	This needn't be set if there is no PrefLoopLogAlignment set too. Either that, or it can be treated like a CortexA53/A55 below by adding it to the same case block.
llvm/test/CodeGen/AArch64/aarch64-p2align-max-bytes-neoverse.ll
21	Should this be checking for 4, 8? Can we add some of the other cpus like A53 and A55 too?

NickGuy updated this revision to Diff 418903. Mar 29 2022, 9:23 AM

NickGuy marked 2 inline comments as done.

NickGuy added inline comments.

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
17	I missed that when adding the others, now added.
llvm/test/CodeGen/AArch64/aarch64-p2align-max-bytes-neoverse.ll
21	Fixed, and added more of the affected CPUs.
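As a reading aid for the "4, 8" versus "5, 16" expectations discussed above, the following C++ sketch builds the directive text that the test's CHECK lines match from a log2 alignment and a byte cap. `loopAlignDirective` is a hypothetical formatter, not LLVM's assembly printer. The fill value is hard-coded to `0x0` because that is what the CHECK lines in this review show, and a cap of 0 is assumed to produce the short form.

```cpp
#include <cassert>
#include <string>

// Hypothetical formatter, not LLVM's AsmPrinter: produces the text form of
// the loop-alignment directive as it appears in this review's CHECK lines.
std::string loopAlignDirective(unsigned logAlign, unsigned maxBytes) {
  assert(logAlign < 64 && "alignment exponent out of range");
  std::string text = ".p2align " + std::to_string(logAlign);
  // Assumption: a zero cap means "no limit", so only the exponent is printed.
  if (maxBytes != 0)
    text += ", 0x0, " + std::to_string(maxBytes);
  return text;
}
```

Under these assumptions, `loopAlignDirective(4, 8)` gives the CHECK-8 form and `loopAlignDirective(5, 16)` gives the CHECK-16 form, while `loopAlignDirective(5, 0)` gives the CHECK-DEFAULT form.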

Harbormaster completed remote builds in B156780: Diff 418903. Mar 29 2022, 9:24 AM

Thanks. LGTM

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
17	I would fold this into the CortexA53 block. The CPUs in those blocks are similar, and I don't believe there is any reason to treat them differently for function alignment.
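Folding one CPU into another's case block relies on stacked `case` labels sharing a single body. The sketch below shows that pattern with a hypothetical, trimmed-down `Family` enum and `Tuning` struct. These are stand-ins for illustration, not the real `AArch64Subtarget` members, and only the numeric values are taken from the diff in this review.

```cpp
#include <cassert>

// Hypothetical stand-ins for the subtarget's processor enum and tuning fields.
enum class Family { Others, CortexA35, CortexA53, CortexA55, CortexA76 };

struct Tuning {
  unsigned prefFunctionLogAlignment = 0;
  unsigned prefLoopLogAlignment = 0;
  unsigned maxBytesForLoopAlignment = 0;
};

Tuning tuningFor(Family family) {
  Tuning t;
  switch (family) {
  case Family::Others:
    break;
  // Stacked labels share one body, so CortexA35 gets the A53/A55 values.
  case Family::CortexA35:
  case Family::CortexA53:
  case Family::CortexA55:
    t.prefFunctionLogAlignment = 4;
    t.prefLoopLogAlignment = 4;
    t.maxBytesForLoopAlignment = 8;
    break;
  case Family::CortexA76:
    t.prefFunctionLogAlignment = 4;
    t.prefLoopLogAlignment = 5;
    t.maxBytesForLoopAlignment = 16;
    break;
  }
  return t;
}
```

Removing the `break;` that used to follow a label is all it takes to merge that CPU into the block below it, which keeps the settings for similar cores in one place.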

This revision is now accepted and ready to land. Mar 31 2022, 12:25 AM

This revision was landed with ongoing or failed builds. Mar 31 2022, 3:37 AM

Closed by commit rG7d676714fbf2: [AArch64] Set MaxBytesForLoopAlignment for more targets (authored by NickGuy).

This revision was automatically updated to reflect the committed changes.

NickGuy added a commit: rG7d676714fbf2: [AArch64] Set MaxBytesForLoopAlignment for more targets.

Revision Contents

Path: Size

llvm/lib/Target/AArch64/AArch64Subtarget.cpp: 18 lines
llvm/test/CodeGen/AArch64/aarch64-p2align-max-bytes-neoverse.ll: 21 lines
llvm/test/CodeGen/AArch64/merge-store-dependency.ll: 2 lines
llvm/test/CodeGen/AArch64/preferred-function-alignment.ll: 2 lines

Diff 419386

llvm/lib/Target/AArch64/AArch64Subtarget.cpp

 //===-- AArch64Subtarget.cpp - AArch64 Subtarget Information ----- C++ --===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file implements the AArch64 specific subclass of TargetSubtarget.
 //
 //===----------------------------------------------------------------------===//

 #include "AArch64Subtarget.h"

 #include "AArch64.h"
 #include "AArch64InstrInfo.h"
 #include "AArch64PBQPRegAlloc.h"
 #include "AArch64TargetMachine.h"
 #include "GISel/AArch64CallLowering.h"
 #include "GISel/AArch64LegalizerInfo.h"
 #include "GISel/AArch64RegisterBankInfo.h"
 #include "MCTargetDesc/AArch64AddressingModes.h"
 #include "llvm/CodeGen/GlobalISel/InstructionSelect.h"
 #include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineScheduler.h"
[48 lines not shown; hunk context: void AArch64Subtarget::initializeProperties() {]
   // features.
   switch (ARMProcFamily) {
   case Others:
     break;
   case Carmel:
     CacheLineSize = 64;
     break;
   case CortexA35:
-    break;
   case CortexA53:
   case CortexA55:
     PrefFunctionLogAlignment = 4;
+    PrefLoopLogAlignment = 4;
+    MaxBytesForLoopAlignment = 8;
     break;
   case CortexA57:
     MaxInterleaveFactor = 4;
     PrefFunctionLogAlignment = 4;
+    PrefLoopLogAlignment = 4;
+    MaxBytesForLoopAlignment = 8;
     break;
   case CortexA65:
     PrefFunctionLogAlignment = 3;
     break;
   case CortexA72:
   case CortexA73:
   case CortexA75:
+    PrefFunctionLogAlignment = 4;
+    PrefLoopLogAlignment = 4;
+    MaxBytesForLoopAlignment = 8;
+    break;
   case CortexA76:
   case CortexA77:
   case CortexA78:
   case CortexA78C:
   case CortexR82:
   case CortexX1:
   case CortexX1C:
     PrefFunctionLogAlignment = 4;
+    PrefLoopLogAlignment = 5;
+    MaxBytesForLoopAlignment = 16;
     break;
   case CortexA510:
+    PrefFunctionLogAlignment = 4;
+    VScaleForTuning = 1;
+    PrefLoopLogAlignment = 4;
+    MaxBytesForLoopAlignment = 8;
+    break;
   case CortexA710:
   case CortexX2:
     PrefFunctionLogAlignment = 4;
     VScaleForTuning = 1;
+    PrefLoopLogAlignment = 5;
+    MaxBytesForLoopAlignment = 16;
     break;
   case A64FX:
     CacheLineSize = 256;
     PrefFunctionLogAlignment = 3;
     PrefLoopLogAlignment = 2;
     MaxInterleaveFactor = 4;
     PrefetchDistance = 128;
     MinPrefetchStride = 1024;
[262 lines not shown]

llvm/test/CodeGen/AArch64/aarch64-p2align-max-bytes-neoverse.ll

 ; RUN: llc -mtriple=aarch64-none-linux-gnu -align-loops=32 < %s -o -| FileCheck %s --check-prefixes=CHECK,CHECK-DEFAULT
-; RUN: llc -mtriple=aarch64-none-linux-gnu -mcpu=neoverse-n1 < %s -o -| FileCheck %s --check-prefixes=CHECK,CHECK-N1
+; RUN: llc -mtriple=aarch64-none-linux-gnu -mcpu=neoverse-n1 < %s -o -| FileCheck %s --check-prefixes=CHECK,CHECK-16
+; RUN: llc -mtriple=aarch64-none-linux-gnu -mcpu=neoverse-n2 < %s -o -| FileCheck %s --check-prefixes=CHECK,CHECK-16
+; RUN: llc -mtriple=aarch64-none-linux-gnu -mcpu=neoverse-v1 < %s -o -| FileCheck %s --check-prefixes=CHECK,CHECK-16
+; RUN: llc -mtriple=aarch64-none-linux-gnu -mcpu=cortex-x1 < %s -o -| FileCheck %s --check-prefixes=CHECK,CHECK-16
+; RUN: llc -mtriple=aarch64-none-linux-gnu -mcpu=cortex-x2 < %s -o -| FileCheck %s --check-prefixes=CHECK,CHECK-16
+; RUN: llc -mtriple=aarch64-none-linux-gnu -mcpu=cortex-a35 < %s -o -| FileCheck %s --check-prefixes=CHECK,CHECK-8
+; RUN: llc -mtriple=aarch64-none-linux-gnu -mcpu=cortex-a53 < %s -o -| FileCheck %s --check-prefixes=CHECK,CHECK-8
+; RUN: llc -mtriple=aarch64-none-linux-gnu -mcpu=cortex-a55 < %s -o -| FileCheck %s --check-prefixes=CHECK,CHECK-8
+; RUN: llc -mtriple=aarch64-none-linux-gnu -mcpu=cortex-a57 < %s -o -| FileCheck %s --check-prefixes=CHECK,CHECK-8
+; RUN: llc -mtriple=aarch64-none-linux-gnu -mcpu=cortex-a510 < %s -o -| FileCheck %s --check-prefixes=CHECK,CHECK-8
+; RUN: llc -mtriple=aarch64-none-linux-gnu -mcpu=cortex-a75 < %s -o -| FileCheck %s --check-prefixes=CHECK,CHECK-8
+; RUN: llc -mtriple=aarch64-none-linux-gnu -mcpu=cortex-a710 < %s -o -| FileCheck %s --check-prefixes=CHECK,CHECK-16

 define i32 @a(i32 %x, i32* nocapture readonly %y, i32* nocapture readonly %z) {
 ; CHECK-DEFAULT: .p2align 5
-; CHECK-N1: .p2align 5, 0x0, 16
+; CHECK-8: .p2align 4, 0x0, 8
+; CHECK-16: .p2align 5, 0x0, 16
 ; CHECK-NEXT: .LBB0_5: // %vector.body
 ; CHECK-DEFAULT: .p2align 5
-; CHECK-N1: .p2align 5, 0x0, 16
+; CHECK-8: .p2align 4, 0x0, 8
+; CHECK-16: .p2align 5, 0x0, 16
 ; CHECK-NEXT: .LBB0_8: // %for.body
 entry:
   %cmp10 = icmp sgt i32 %x, 0
   br i1 %cmp10, label %for.body.preheader, label %for.cond.cleanup

 for.body.preheader: ; preds = %entry
   %wide.trip.count = zext i32 %x to i64
   %min.iters.check = icmp ult i32 %x, 8
[60 lines not shown]

llvm/test/CodeGen/AArch64/merge-store-dependency.ll

[29 lines not shown]
 ; A53-NEXT: bl fcntl
 ; A53-NEXT: adrp x9, gv0
 ; A53-NEXT: add x9, x9, :lo12:gv0
 ; A53-NEXT: cmp x19, x9
 ; A53-NEXT: b.eq .LBB0_4
 ; A53-NEXT: // %bb.1:
 ; A53-NEXT: ldr w8, [x19]
 ; A53-NEXT: ldr w9, [x9]
+; A53-NEXT: .p2align 4, 0x0, 8
 ; A53-NEXT: .LBB0_2: // %while.body.i.split.ver.us
 ; A53-NEXT: // =>This Inner Loop Header: Depth=1
 ; A53-NEXT: lsl w9, w9, #1
 ; A53-NEXT: cmp w9, w8
 ; A53-NEXT: b.le .LBB0_2
 ; A53-NEXT: // %bb.3: // %while.end.i
 ; A53-NEXT: bl foo
 ; A53-NEXT: adrp x8, gv1
 ; A53-NEXT: str x0, [x8, :lo12:gv1]
 ; A53-NEXT: ldp x30, x19, [sp], #16 // 16-byte Folded Reload
 ; A53-NEXT: ret
+; A53-NEXT: .p2align 4, 0x0, 8
 ; A53-NEXT: .LBB0_4: // %while.body.i.split
 ; A53-NEXT: // =>This Inner Loop Header: Depth=1
 ; A53-NEXT: b .LBB0_4
 entry:
   %0 = bitcast %struct1* %fde to i8*
   tail call void @llvm.memset.p0i8.i64(i8* align 8 %0, i8 0, i64 40, i1 false)
   %state = getelementptr inbounds %struct1, %struct1* %fde, i64 0, i32 4
   store i16 256, i16* %state, align 8
[154 lines not shown]

llvm/test/CodeGen/AArch64/preferred-function-alignment.ll

 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=generic < %s | FileCheck --check-prefixes=ALIGN2,CHECK %s
-; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a35 < %s | FileCheck --check-prefixes=ALIGN2,CHECK %s
+; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a35 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a53 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a55 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a57 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a65 < %s | FileCheck --check-prefixes=ALIGN3,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a65ae < %s | FileCheck --check-prefixes=ALIGN3,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a72 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a73 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a75 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
[32 lines not shown]