This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Use the Cortex-A57 sched model for Cortex-A72
Closed (Public)

Authored by samparker on Oct 23 2018, 3:43 AM.


Details

Reviewers

john.brawn
dmgreen
fhahn
javed.absar

Commits

rGa16667e79be9: [ARM] Use Cortex-A57 sched model for Cortex-A72
rL345272: [ARM] Use Cortex-A57 sched model for Cortex-A72

Summary

This mirrors what we already do for AArch64, and LNT scores improve by a geomean of 1.57%.

Diff Detail

Repository: rL LLVM

Event Timeline

samparker created this revision. Oct 23 2018, 3:43 AM

Herald added subscribers: chrib, kristof.beyls, javed.absar. Oct 23 2018, 3:43 AM

LGTM. AFAIK cortex-a57 and cortex-a72 are close enough for this to be beneficial. AArch64 re-uses the A57 model for A72 too.

This revision is now accepted and ready to land. Oct 23 2018, 7:27 AM

Hey Florian,

It was brought to my attention that the scheduler still wasn't enabled because of the missing feature. I've now added this and the geomean improvement is 2.23%. I will shortly add a couple of tests too.

Added the a72 to a couple of scheduling tests, as well as the basic unroll one.

Herald added a subscriber: zzheng. Oct 24 2018, 12:43 AM

In D53562#1273812, @samparker wrote:

Hey Florian,

It was brought to my attention that the scheduler still wasn't enabled because of the missing feature. I've now added this and the geomean improvement is 2.23%. I will shortly add a couple of tests too.

So yes, to enable the machine scheduler for a core on ARM, FeatureUseMISched is required. I think it is worth splitting this into two changes: using the Cortex-A57 model for Cortex-A72, and enabling the machine scheduler for Cortex-A72. The A57 model should be more accurate for A72 than no model.

As for switching to the MachineScheduler: the last time I looked at it (about a year ago), I remember seeing some relatively big regressions on some benchmarks, so we might want to be a bit more cautious there, as there might be potential to tweak the scheduling heuristics for ARM. Also, I think it would be good to have numbers for a large set of benchmarks (test-suite + commercial ones).
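
For illustration, enabling the machine scheduler in the way described above would amount to adding FeatureUseMISched to the core's feature list. The following is a minimal sketch, assuming the cortex-a72 definition shown in this revision's ARM.td hunk. It is a fragment of ARM.td rather than a standalone file, and it is not the change that was committed, since the feature was dropped again before landing.

```tablegen
// Sketch only: the cortex-a72 entry from this revision's ARM.td hunk with
// FeatureUseMISched appended. Not the committed change.
def : ProcessorModel<"cortex-a72", CortexA57Model, [ARMv8a, ProcA72,
                                                    FeatureHWDivThumb,
                                                    FeatureHWDivARM,
                                                    FeatureCrypto,
                                                    FeatureCRC,
                                                    FeatureUseMISched]>;
```

Keeping the scheduling model and the feature as separate edits makes it possible to land the model first and to benchmark the scheduler switch on its own.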

Ok, fair enough, my LNT numbers show that the MISched results are more variable:

Regressions (%):

benchmark                   Model   Model + MISched
Stanford/FloatMM            -       40.1
McGill/queens               29.6    29.4
Stanford/Puzzle             16.05   15.6
FreeBench/mason/mason       -       9.27
Olden/power/power           -       7.61
VersaBench/ecbdes/ecbdes    -       5.25
BenchmarkGame/fannkuch      6.55    -
Fhourstones/fhourstones     5.75    -

Improvements (%):

benchmark                                             Model   Model + MISched
Stanford/Perm                                         21.7    22.53
Olden/mst/mst                                         22.05   22.21
FreeBench/fourinarow/fourinarow                       22.83   21.21
TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl    14.17   19.52
TSVC/Searching-flt/Searching-flt                      16.26   19.23
TSVC/Searching-dbl/Searching-dbl                      16.28   19.15

I will remove the use of the feature for now.

cheers,

Closed by commit rL345272: [ARM] Use Cortex-A57 sched model for Cortex-A72 (authored by sam_parker). Oct 25 2018, 8:10 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                           Size
llvm/trunk/lib/Target/ARM/ARM.td                               2 lines
llvm/trunk/test/Transforms/LoopUnroll/ARM/loop-unrolling.ll    1 line

Diff 171106

llvm/trunk/lib/Target/ARM/ARM.td

 def : ProcessorModel<"cortex-a57", CortexA57Model, [ARMv8a, ProcA57,
     FeatureHWDivThumb,
     FeatureHWDivARM,
     FeatureCrypto,
     FeatureCRC,
     FeatureFPAO,
     FeatureAvoidPartialCPSR,
     FeatureCheapPredicableCPSR]>;

-def : ProcNoItin<"cortex-a72", [ARMv8a, ProcA72,
+def : ProcessorModel<"cortex-a72", CortexA57Model, [ARMv8a, ProcA72,
     FeatureHWDivThumb,
     FeatureHWDivARM,
     FeatureCrypto,
     FeatureCRC]>;

 def : ProcNoItin<"cortex-a73", [ARMv8a, ProcA73,
     FeatureHWDivThumb,
     FeatureHWDivARM,

llvm/trunk/test/Transforms/LoopUnroll/ARM/loop-unrolling.ll

 ; RUN: opt -mtriple=armv7 -mcpu=cortex-a57 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL-A
 ; RUN: opt -mtriple=thumbv7 -mcpu=cortex-a57 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL-A
+; RUN: opt -mtriple=thumbv7 -mcpu=cortex-a72 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL-A
 ; RUN: opt -mtriple=thumbv8m -mcpu=cortex-m23 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL-T1
 ; RUN: opt -mtriple=thumbv8m.main -mcpu=cortex-m33 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL-T2
 ; RUN: opt -mtriple=thumbv7em -mcpu=cortex-m7 -loop-unroll -S %s -o - | FileCheck %s --check-prefix=CHECK-UNROLL-T2

 ; CHECK-LABEL: partial
 define arm_aapcs_vfpcc void @partial(i32* nocapture %C, i32* nocapture readonly %A, i32* nocapture readonly %B) local_unnamed_addr #0 {
 entry:
   br label %for.body