This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Set preferred function alignment to 16 bytes on Neoverse N1
Closed, Public

Authored by pbarrio on Aug 2 2019, 6:25 AM.

Details

Reviewers

fhahn
greened
samparker
dmgreen

Commits

rGa8426b43f8b9: [AArch64] Set preferred function alignment to 16 bytes on Neoverse N1
rL367894: [AArch64] Set preferred function alignment to 16 bytes on Neoverse N1

Summary

The Arm Neoverse N1 Software Optimization Guide [1], Section "4.8 Branch
instruction alignment" states:

"Consider aligning subroutine entry points and branch targets to 32B
boundaries, within the bounds of the code-density requirements of the
program."

This patch sets the preferred function alignment on Neoverse N1 to 2^4 = 16B.
Some of the latest Cortex-A CPUs already use this value: benchmarking on
earlier Cortex-A CPUs suggested that 16B alignment is better than the
default. See commit d04ee305.

We don't set it to 32B right now (as the optimisation guide suggests)
because that would increase code size and might hurt instruction cache
performance. We need benchmark numbers first.
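
To make the code-size trade-off concrete, here is a minimal sketch, not code from this patch. It assumes the alignment is stored as a log2 value (as in the "2^4=16B" above); the helper names alignmentBytes and paddingFor are hypothetical and do not exist in LLVM.

```cpp
#include <cstdint>

// Hypothetical helper: convert a log2 alignment (assumed to be how the
// preferred function alignment is stored) into a byte count.
constexpr std::uint64_t alignmentBytes(unsigned log2Align) {
  return std::uint64_t{1} << log2Align;
}

// Hypothetical helper: number of padding bytes needed to move a function
// that would otherwise start at `offset` up to the next multiple of
// 2^log2Align. Returns 0 when `offset` is already aligned.
constexpr std::uint64_t paddingFor(std::uint64_t offset, unsigned log2Align) {
  return (alignmentBytes(log2Align) - offset % alignmentBytes(log2Align)) %
         alignmentBytes(log2Align);
}
```

For a function that would otherwise start at offset 36, paddingFor(36, 4) is 12 bytes while paddingFor(36, 5) is 28 bytes. In the worst case each function costs up to 12 bytes of padding at 16B alignment and up to 28 bytes at 32B (AArch64 instructions are 4 bytes, so offsets are multiples of 4). That growth is why benchmark numbers are wanted before moving to 32B.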

I have also added the tests for A75 and A76 that we were missing.

[1] https://developer.arm.com/docs/swog309707/latest

Diff Detail

Repository: rL LLVM

Event Timeline

pbarrio created this revision. Aug 2 2019, 6:25 AM

Herald added a project: Restricted Project. Aug 2 2019, 6:25 AM

Herald added subscribers: hiraditya, kristof.beyls, javed.absar.

Harbormaster completed remote builds in B36022: Diff 213035. Aug 2 2019, 6:26 AM

Looks very sensible to me! Thanks

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
Line 82 (on Diff #213035): Whilst you are here, can you set this to 3 too, like the A53?
Line 127 (on Diff #213035): This one can be 3 too, I think, like the A53.

This revision is now accepted and ready to land. Aug 2 2019, 7:40 AM

pbarrio marked 2 inline comments as done. Aug 2 2019, 8:30 AM

pbarrio added inline comments.

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
Line 82 (on Diff #213035): I agree, I'll do that as a separate patch.
Line 127 (on Diff #213035): I agree, I'll do that as a separate patch.

Closed by commit rL367894: [AArch64] Set preferred function alignment to 16 bytes on Neoverse N1 (authored by pabbar01). Aug 5 2019, 10:38 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/AArch64/AArch64Subtarget.cpp	2 lines
llvm/trunk/test/CodeGen/AArch64/preferred-function-alignment.ll	11 lines

Diff 213401

llvm/trunk/lib/Target/AArch64/AArch64Subtarget.cpp

(119 preceding lines not shown)
   case Kryo:
(20 lines not shown)
     CacheLineSize = 128;
     PrefetchDistance = 740;
     MinPrefetchStride = 1024;
     MaxPrefetchIterationsAhead = 11;
     // FIXME: remove this to enable 64-bit SLP if performance looks good.
     MinVectorRegisterBitWidth = 128;
     break;
   case NeoverseE1:
+    break;
   case NeoverseN1:
+    PrefFunctionAlignment = 4;
     break;
   case Saphira:
     MaxInterleaveFactor = 4;
     // FIXME: remove this to enable 64-bit SLP if performance looks good.
     MinVectorRegisterBitWidth = 128;
     break;
   case ThunderX2T99:
     CacheLineSize = 64;
(165 following lines not shown)

llvm/trunk/test/CodeGen/AArch64/preferred-function-alignment.ll

 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=generic < %s | FileCheck --check-prefixes=ALIGN2,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a35 < %s | FileCheck --check-prefixes=ALIGN2,CHECK %s
+; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a53 < %s | FileCheck --check-prefixes=ALIGN3,CHECK %s
+; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a57 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
+; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a72 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
+; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a73 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
+; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a75 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
+; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a76 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cyclone < %s | FileCheck --check-prefixes=ALIGN2,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=falkor < %s | FileCheck --check-prefixes=ALIGN2,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=kryo < %s | FileCheck --check-prefixes=ALIGN2,CHECK %s
-; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a53 < %s | FileCheck --check-prefixes=ALIGN3,CHECK %s
+; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=neoverse-n1 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=thunderx < %s | FileCheck --check-prefixes=ALIGN3,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=thunderxt81 < %s | FileCheck --check-prefixes=ALIGN3,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=thunderxt83 < %s | FileCheck --check-prefixes=ALIGN3,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=thunderxt88 < %s | FileCheck --check-prefixes=ALIGN3,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=thunderx2t99 < %s | FileCheck --check-prefixes=ALIGN3,CHECK %s
-; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a57 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
-; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a72 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
-; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=cortex-a73 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=exynos-m1 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=exynos-m2 < %s | FileCheck --check-prefixes=ALIGN4,CHECK %s
 ; RUN: llc -mtriple=aarch64-unknown-linux -mcpu=exynos-m3 < %s | FileCheck --check-prefixes=ALIGN5,CHECK %s

 define void @test() {
   ret void
 }

(remaining 12 lines not shown)