This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Enable useAA() for the in-order Cortex-R52
ClosedPublic

Authored by dmgreen on Jun 12 2018, 4:59 AM.

Details

Reviewers

efriedma
javed.absar
fhahn
t.p.northover
rengolin
hfinkel

Commits

rG21a2973cc4c2: [ARM] Enable useAA() for the in-order Cortex-R52
rL335249: [ARM] Enable useAA() for the in-order Cortex-R52

Summary

This option allows codegen (such as DAGCombine or MI scheduling) to use alias analysis information, which can help with codegen on in-order CPUs. Here I have done things the same way as AArch64, adding a subtarget feature to enable this for specific cores.

I was going to enable this for A53 too, but since we do not have an AArch32 A53 schedule, it would be less useful there than on R52.
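As a rough illustration of the mechanism described in the summary, the sketch below models a subtarget hook that codegen passes can query. The class and function names (`ToySubtargetInfo`, `ToyARMSubtarget`, `passMayUseAliasAnalysis`) are hypothetical stand-ins rather than LLVM's real classes, and selecting the feature from the CPU name alone is an assumption made to keep the example short.

```cpp
#include <string>

// Hypothetical stand-in for a generic subtarget interface; not LLVM's class.
struct ToySubtargetInfo {
  virtual ~ToySubtargetInfo() = default;
  // Conservative default: codegen does not consult alias analysis.
  virtual bool useAA() const { return false; }
};

// Hypothetical ARM-like subtarget that stores the feature as a boolean flag.
class ToyARMSubtarget : public ToySubtargetInfo {
  bool UseAA = false;

public:
  // Assumption for this sketch: the CPU name alone decides the feature set.
  explicit ToyARMSubtarget(const std::string &CPU) {
    if (CPU == "cortex-r52")
      UseAA = true;
  }

  bool useAA() const override { return UseAA; }
};

// A pass queries the hook instead of checking CPU names directly.
bool passMayUseAliasAnalysis(const ToySubtargetInfo &ST) {
  return ST.useAA();
}
```

Keeping the decision behind a virtual hook means a pass never needs to know which cores opted in; it only asks the subtarget.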

Diff Detail

Repository: rL LLVM

Event Timeline

dmgreen created this revision. Jun 12 2018, 4:59 AM

Herald added subscribers: chrib, kristof.beyls. Jun 12 2018, 4:59 AM

Requires D48029 to survive a bootstrap, but that looks like a more generic error, not one specific to this option. Otherwise I believe this is safe.

dmgreen edited reviewers, added: t.p.northover, rengolin; removed: hfinkel. Jun 12 2018, 5:09 AM

dmgreen added a reviewer: hfinkel.

LGTM, but I will wait for others to comment as well before accepting.

Thanks

javed.absar accepted this revision. Jun 14 2018, 12:45 AM

This revision is now accepted and ready to land. Jun 14 2018, 12:45 AM

I'm generally not a fan of having features like this, which have widespread implications, turned on only for certain target CPUs; it tends to make it much harder to find bugs, since the code gets little testing. But I guess this is okay for now.

Yes, I can see that. I would have liked to turn this on for more in-order cores, but without enough scheduling information to at least say that a load takes multiple cycles, I didn't feel I had a great justification. For the record, these were the changes I saw on an A53 with useAA returning true (units are time, so lower is better; only changes of more than 2% are listed):

SingleSource/Benchmarks/BenchmarkGame/n-body -14.38%
SingleSource/Benchmarks/Shootout/Shootout-lists -6.40%
SingleSource/Benchmarks/Misc-C++/Large/ray -6.20%
MultiSource/Applications/ALAC/encode/alacconvert-encode -5.44%
MultiSource/Benchmarks/McCat/17-bintr/bintr -3.27%
SingleSource/Benchmarks/CoyoteBench/huffbench -3.15%
MultiSource/Benchmarks/SciMark2-C/scimark2 -2.97%
MultiSource/Benchmarks/Bullet/bullet -2.50%
MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt -2.33%
SingleSource/Benchmarks/Misc/richards_benchmark -2.20%
MultiSource/Benchmarks/Ptrdist/yacr2/yacr2 +4.60%
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1 +9.66%

They don't look too bad, but there are some regressions. enc-pc1 is genuinely worse; yacr2 might be noise. And without instruction scheduling, they may be getting lucky. The compile time increase was roughly 0.25% on CT-mark (it may not be statistically significant, but there were enough alternating runs to make me think it's probably close).

I tried it on the A72 too, on both T32 and A64, with more varied results, both showing several large increases in places (including a memcpy benchmark). This option, as far as I can tell, should give more freedom to the DAG, but that may not be used in the best way all the time.
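The sign convention used for the benchmark figures above (time-based, so lower is better) can be sketched as follows. The helper names `percentChange` and `isRegression` are hypothetical, and the formula is the usual relative-change definition rather than something stated in this review.

```cpp
// Relative change in execution time, in percent.
// Negative means the new build ran faster; positive means it ran slower.
double percentChange(double BaselineTime, double NewTime) {
  return (NewTime - BaselineTime) / BaselineTime * 100.0;
}

// With time as the unit, any positive change is a regression.
bool isRegression(double ChangePercent) {
  return ChangePercent > 0.0;
}
```

Under this convention, an entry such as n-body at -14.38% is an improvement, while enc-pc1 at +9.66% is a slowdown.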

Closed by commit rL335249: [ARM] Enable useAA() for the in-order Cortex-R52 (authored by dmgreen). Jun 21 2018, 8:52 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                         Size
llvm/trunk/lib/Target/ARM/ARM.td             7 lines
llvm/trunk/lib/Target/ARM/ARMSubtarget.h     7 lines
llvm/trunk/test/CodeGen/ARM/useaa.ll         26 lines

Diff 152316

llvm/trunk/lib/Target/ARM/ARM.td

  ...
  // Use the MachineScheduler for instruction scheduling for the subtarget.
  def FeatureUseMISched: SubtargetFeature<"use-misched", "UseMISched", "true",
      "Use the MachineScheduler">;

  def FeatureNoPostRASched : SubtargetFeature<"disable-postra-scheduler",
      "DisablePostRAScheduler", "true",
      "Don't schedule again after register allocation">;

+ // Enable use of alias analysis during code generation
+ def FeatureUseAA : SubtargetFeature<"use-aa", "UseAA", "true",
+     "Use alias analysis during codegen">;
+
  //===----------------------------------------------------------------------===//
  // ARM architecture class
  //

  // A-series ISA
  def FeatureAClass : SubtargetFeature<"aclass", "ARMProcClass", "AClass",
      "Is application profile ('A' series)">;

  ...
  def : ProcNoItin<"kryo", [ARMv8a, ProcKryo,
      FeatureHWDivThumb,
      FeatureHWDivARM,
      FeatureCrypto,
      FeatureCRC]>;

  def : ProcessorModel<"cortex-r52", CortexR52Model, [ARMv8r, ProcR52,
      FeatureUseMISched,
-     FeatureFPAO]>;
+     FeatureFPAO,
+     FeatureUseAA]>;

  //===----------------------------------------------------------------------===//
  // Register File Description
  //===----------------------------------------------------------------------===//

  include "ARMRegisterInfo.td"
  include "ARMRegisterBanks.td"
  include "ARMCallingConv.td"
  ...

llvm/trunk/lib/Target/ARM/ARMSubtarget.h

  ...
  protected:

    /// UseMISched - True if MachineScheduler should be used for this subtarget.
    bool UseMISched = false;

    /// DisablePostRAScheduler - False if scheduling should happen again after
    /// register allocation.
    bool DisablePostRAScheduler = false;

+   /// UseAA - True if using AA during codegen (DAGCombine, MISched, etc)
+   bool UseAA = false;
+
    /// HasThumb2 - True if Thumb2 instructions are supported.
    bool HasThumb2 = false;

    /// NoARM - True if subtarget does not support ARM mode execution.
    bool NoARM = false;

    /// ReserveR9 - True if R9 is not available as a general purpose register.
    bool ReserveR9 = false;
  ...
  public:
    unsigned getMispredictionPenalty() const;

    /// Returns true if machine scheduler should be enabled.
    bool enableMachineScheduler() const override;

    /// True for some subtargets at > -O0.
    bool enablePostRAScheduler() const override;

+   /// Enable use of alias analysis during code generation (during MI
+   /// scheduling, DAGCombine, etc.).
+   bool useAA() const override { return UseAA; }
+
    // enableAtomicExpand- True if we need to expand our atomics.
    bool enableAtomicExpand() const override;

    /// getInstrItins - Return the instruction itineraries based on subtarget
    /// selection.
    const InstrItineraryData *getInstrItineraryData() const override {
      return &InstrItins;
    }
  ...

llvm/trunk/test/CodeGen/ARM/useaa.ll

; RUN: llc < %s -mtriple=armv8r-eabi -mcpu=cortex-r52 | FileCheck %s --check-prefix=CHECK --check-prefix=USEAA
; RUN: llc < %s -mtriple=armv8r-eabi -mcpu=generic | FileCheck %s --check-prefix=CHECK --check-prefix=GENERIC

; Check we use AA during codegen, so can interleave these loads/stores.

; CHECK-LABEL: test
; GENERIC: ldr
; GENERIC: str
; GENERIC: ldr
; GENERIC: str
; USEAA: ldr
; USEAA: ldr
; USEAA: str
; USEAA: str

define void @test(i32* nocapture %a, i32* noalias nocapture %b) {
entry:
  %0 = load i32, i32* %a, align 4
  %add = add nsw i32 %0, 10
  store i32 %add, i32* %a, align 4
  %1 = load i32, i32* %b, align 4
  %add2 = add nsw i32 %1, 20
  store i32 %add2, i32* %b, align 4
  ret void
}
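
To see why the `noalias` attribute on `%b` matters for the interleaving this test checks, the following C++ sketch mirrors the two orderings by hand. The function names (`inOrder`, `interleaved`, `runDistinct`, `runAliased`) are hypothetical and written only for illustration; the code models the ordering at the source level and does not invoke the compiler's alias analysis.

```cpp
#include <utility>

// Program order, as written in @test: load a, store a, load b, store b.
void inOrder(int *A, int *B) {
  int X = *A;
  *A = X + 10;
  int Y = *B;
  *B = Y + 20;
}

// The ordering the USEAA check lines expect: both loads, then both stores.
// The load from B has moved above the store to A, which preserves behaviour
// only when A and B refer to different objects.
void interleaved(int *A, int *B) {
  int X = *A;
  int Y = *B;
  *A = X + 10;
  *B = Y + 20;
}

using Kernel = void (*)(int *, int *);

// Applies K to two distinct objects and returns their final values.
std::pair<int, int> runDistinct(Kernel K, int A, int B) {
  K(&A, &B);
  return {A, B};
}

// Applies K with both pointers referring to the same object.
int runAliased(Kernel K, int V) {
  K(&V, &V);
  return V;
}
```

With distinct objects both orderings give the same result, but when the pointers alias they diverge (starting from 1, `inOrder` ends at 31 and `interleaved` at 21). That divergence is why codegen needs alias information before it can reorder the accesses.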