This is an archive of the discontinued LLVM Phabricator instance.

Differential D36866

[ARM] Add PostRAScheduling option
ClosedPublic

Authored by samparker on Aug 18 2017, 2:46 AM.

Details

Reviewers

fhahn

Commits

rG04a7db5915c9: [ARM] Add PostRAScheduler option
rL311162: [ARM] Add PostRAScheduler option

Summary

D35935 added an option for selecting the machine scheduler for ARM. This patch adds an option to also enable the post-RA scheduler, which brings the ARM backend in line with AArch64. The SchedModel can also set 'PostRAScheduler', as the R52 model does, so the overridden function queries that property as well.

Diff Detail

Repository: rL LLVM

Event Timeline

samparker created this revision. Aug 18 2017, 2:46 AM

Herald added subscribers: kristof.beyls, javed.absar, aemerson. Aug 18 2017, 2:46 AM

I think this change makes sense and will be convenient when we move ARM codegen to the MachineScheduler. I just have a quick question about adding FeaturePostRAScheduler to some processors.

lib/Target/ARM/ARM.td
876	(On Diff #111642)	What's the reason for adding `FeaturePostRAScheduler` to some processors but not others? It seems to be a non-functional change, because all changed processors seem to have FeatureThumb2, so `enablePostRAScheduler` would return true anyway.

This revision is now accepted and ready to land. Aug 18 2017, 3:18 AM

Please clarify the question about adding FeaturePostRAScheduler to some processors and maybe hold off a bit with committing, in case @t.p.northover has any comments.

fhahn added a subscriber: t.p.northover. Aug 18 2017, 3:28 AM

samparker added inline comments. Aug 18 2017, 3:29 AM

lib/Target/ARM/ARM.td
876	(On Diff #111642)	I've added them to the processors that my team is currently tracking the performance of. If we want to develop some downstream schedulers and use the machine scheduler, it easily allows us to use post RA as well. Even though this can be defined in the SchedModel, I wanted to keep some consistency with your previous patch and the approach of AArch64.
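
The exchange above can be illustrated with a small standalone model. This is a sketch, not LLVM code: the `Config` struct and both functions are hypothetical, and the SchedModel bit is left out for brevity. It shows that the feature makes no difference on a Thumb2 core that does not use the MachineScheduler, and only matters once `use-misched` is set.

```cpp
#include <cassert>

// Hypothetical flat model of the relevant subtarget state (not an LLVM type).
struct Config {
  bool PostRAFeature; // FeaturePostRAScheduler is set on the processor
  bool UseMISched;    // use-misched is set
  bool InThumbMode;
  bool HasThumb2;
};

// Behaviour before the patch, as on the left-hand side of the diff.
bool enabledBefore(const Config &C) {
  if (C.UseMISched)
    return false;
  return !C.InThumbMode || C.HasThumb2;
}

// Behaviour after the patch, ignoring the SchedModel bit for brevity.
bool enabledAfter(const Config &C) {
  if (C.PostRAFeature)
    return true;
  return enabledBefore(C);
}

// Usage: compare a Thumb2 core with and without use-misched.
void demoFeatureEffect() {
  Config Thumb2WithFeature{true, false, true, true};
  // Without use-misched the feature changes nothing.
  assert(enabledBefore(Thumb2WithFeature) == enabledAfter(Thumb2WithFeature));

  Config MISchedWithFeature{true, true, true, true};
  // With use-misched the feature is what turns post-RA scheduling back on.
  assert(!enabledBefore(MISchedWithFeature));
  assert(enabledAfter(MISchedWithFeature));
}
```

Under these assumptions, adding the feature to the listed processors is a no-op today and becomes functional only if those processors later switch to the MachineScheduler.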

Thanks for clarifying that! Sounds and looks good to me.

Closed by commit rL311162: [ARM] Add PostRAScheduler option (authored by sam_parker). Aug 18 2017, 7:28 AM

This revision was automatically updated to reflect the committed changes.

MatzeB added a subscriber: MatzeB. Aug 21 2017, 1:54 PM

MatzeB added inline comments.

llvm/trunk/lib/Target/ARM/ARMSubtarget.cpp
360–368	We now have 3 different ways in which the PostRA scheduler could be enabled on ARM:
- A subtarget feature.
- In the scheduling model.
- With the `if (!useMachineSched) && (!Thumb || Thumb2)` criterion.

I think just one of those options (preferably the first) would be enough.

Hi Matthias,

I agree that this would be nice, so I've had a play around, but it seems non-trivial to get a default configuration without breaking some tests. It is simple enough to predicate the PostRA scheduler on the availability of Thumb2, but this causes breakages for Swift. Do you know why we don't want to enable PostRA for MIScheduled ARM cores? PostRA is used on almost all of the AArch64 cores along with the MIScheduler, so I'm wondering why the two approaches are different.

Thanks,
sam

In D36866#848488, @samparker wrote:

Hi Matthias,

I agree that this would be nice, so I've had a play around, but it seems non-trivial to get a default configuration without breaking some tests. It is simple enough to predicate the PostRA scheduler on the availability of Thumb2, but this causes breakages for Swift. Do you know why we don't want to enable PostRA for MIScheduled ARM cores? PostRA is used on almost all of the AArch64 cores along with the MIScheduler, so I'm wondering why the two approaches are different.

I would skip the check of the SchedModel; you should be able to set the subtarget feature on all CPUs using such a SchedModel instead.
Similarly, I thought it should be possible to check, for each CPU, whether it uses the machine scheduler today or matches (!Thumb || Thumb2), and then add or remove the PostRA subtarget feature accordingly.
AFAIK we disabled the post-RA scheduler on Swift/Cyclone because the out-of-order execution hides instruction latencies in nearly all cases, so spending time on the post-RA scheduler didn't seem like a good use of compile time.
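
The migration suggested above can be sketched as follows. This is a hypothetical model rather than LLVM code: `CpuConfig` and all three functions are invented for illustration, and the sketch assumes Thumb mode is fixed per configuration, which is a simplification. The idea is to compute, per CPU, what the existing criteria return today, record that as the subtarget feature, and then let the feature be the only criterion.

```cpp
#include <cassert>

// Hypothetical snapshot of one CPU configuration (not an LLVM type).
struct CpuConfig {
  bool SchedModelPostRA; // the SchedModel sets PostRAScheduler
  bool UseMISched;       // use-misched is set
  bool InThumbMode;      // assumed fixed per configuration in this sketch
  bool HasThumb2;
};

// The two non-feature criteria as they stand after this patch.
bool enabledToday(const CpuConfig &C) {
  if (C.SchedModelPostRA)
    return true;
  if (C.UseMISched)
    return false;
  return !C.InThumbMode || C.HasThumb2;
}

// Migration step: add the feature exactly where post-RA is enabled today.
bool shouldAddPostRAFeature(const CpuConfig &C) { return enabledToday(C); }

// Proposed end state: the subtarget feature is the only criterion.
bool enablePostRASimplified(bool HasPostRAFeature) { return HasPostRAFeature; }

// Usage: a Swift-like configuration, assumed here to use the MachineScheduler
// with no PostRAScheduler bit in its SchedModel, stays without post-RA.
void demoMigration() {
  CpuConfig SwiftLike{false, true, false, true};
  assert(!enablePostRASimplified(shouldAddPostRAFeature(SwiftLike)));
}
```

With the feature assigned this way, the simplified check returns the same answer as the current logic for every configuration in this model, so the SchedModel and Thumb checks could be dropped without changing behaviour.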

Revision Contents

Path	Size
llvm/trunk/lib/Target/ARM/ARM.td	15 lines
llvm/trunk/lib/Target/ARM/ARMSubtarget.h	5 lines
llvm/trunk/lib/Target/ARM/ARMSubtarget.cpp	4 lines

Diff 111672

llvm/trunk/lib/Target/ARM/ARM.td

(317 lines not shown; context: def FeatureNoNegativeImmediates)
     "to their negated or complemented "
     "equivalent when the immediate does "
     "not fit in the encoding.">;

 // Use the MachineScheduler for instruction scheduling for the subtarget.
 def FeatureUseMISched: SubtargetFeature<"use-misched", "UseMISched", "true",
     "Use the MachineScheduler">;

+def FeaturePostRAScheduler : SubtargetFeature<"use-postra-scheduler",
+    "UsePostRAScheduler", "true", "Schedule again after register allocation">;
+
 //===----------------------------------------------------------------------===//
 // ARM architecture class
 //

 // A-series ISA
 def FeatureAClass : SubtargetFeature<"aclass", "ARMProcClass", "AClass",
     "Is application profile ('A' series)">;

(530 lines not shown; context: def : ProcessorModel<"cortex-r8", CortexA8Model, [ARMv7r,)
     FeatureMP,
     FeatureSlowFPBrcc,
     FeatureHWDivARM,
     FeatureHasSlowFPVMLx,
     FeatureAvoidPartialCPSR]>;

 def : ProcessorModel<"cortex-m3", CortexM3Model, [ARMv7m,
     ProcM3,
-    FeatureHasNoBranchPredictor]>;
+    FeatureHasNoBranchPredictor,
+    FeaturePostRAScheduler]>;

 def : ProcessorModel<"sc300", CortexM3Model, [ARMv7m,
     ProcM3,
     FeatureHasNoBranchPredictor]>;

 def : ProcessorModel<"cortex-m4", CortexM3Model, [ARMv7em,
     FeatureVFP4,
     FeatureVFPOnlySP,
     FeatureD16,
-    FeatureHasNoBranchPredictor]>;
+    FeatureHasNoBranchPredictor,
+    FeaturePostRAScheduler]>;

 def : ProcNoItin<"cortex-m7", [ARMv7em,
     FeatureFPARMv8,
-    FeatureD16]>;
+    FeatureD16,
+    FeaturePostRAScheduler]>;

 def : ProcNoItin<"cortex-m23", [ARMv8mBaseline,
     FeatureNoMovt]>;

 def : ProcessorModel<"cortex-m33", CortexM3Model, [ARMv8mMainline,
     FeatureDSP,
     FeatureFPARMv8,
     FeatureD16,
     FeatureVFPOnlySP,
-    FeatureHasNoBranchPredictor]>;
+    FeatureHasNoBranchPredictor,
+    FeaturePostRAScheduler]>;

 def : ProcNoItin<"cortex-a32", [ARMv8a,
     FeatureHWDivThumb,
     FeatureHWDivARM,
     FeatureCrypto,
     FeatureCRC]>;

 def : ProcNoItin<"cortex-a35", [ARMv8a, ProcA35,
(112 lines not shown)

llvm/trunk/lib/Target/ARM/ARMSubtarget.h

(185 lines not shown; context: protected:)
   bool InThumbMode = false;

   /// UseSoftFloat - True if we're using software floating point features.
   bool UseSoftFloat = false;

   /// UseMISched - True if MachineScheduler should be used for this subtarget.
   bool UseMISched = false;

+  /// UsePostRAScheduler - True if scheduling should happen again after
+  /// register allocation.
+  bool UsePostRAScheduler = false;
+
   /// HasThumb2 - True if Thumb2 instructions are supported.
   bool HasThumb2 = false;

   /// NoARM - True if subtarget does not support ARM mode execution.
   bool NoARM = false;

   /// ReserveR9 - True if R9 is not available as a general purpose register.
   bool ReserveR9 = false;
(453 lines not shown; context: public:)
   bool isAPCS_ABI() const;
   bool isAAPCS_ABI() const;
   bool isAAPCS16_ABI() const;

   bool isROPI() const;
   bool isRWPI() const;

   bool useMachineScheduler() const { return UseMISched; }
+  bool usePostRAScheduler() const { return UsePostRAScheduler; }
   bool useSoftFloat() const { return UseSoftFloat; }
   bool isThumb() const { return InThumbMode; }
   bool isThumb1Only() const { return InThumbMode && !HasThumb2; }
   bool isThumb2() const { return InThumbMode && HasThumb2; }
   bool hasThumb2() const { return HasThumb2; }
   bool isMClass() const { return ARMProcClass == MClass; }
   bool isRClass() const { return ARMProcClass == RClass; }
   bool isAClass() const { return ARMProcClass == AClass; }
(82 lines not shown)

llvm/trunk/lib/Target/ARM/ARMSubtarget.cpp

(351 lines not shown)
 bool ARMSubtarget::enableMachineScheduler() const {
   // Enable the MachineScheduler before register allocation for subtargets
   // with the use-misched feature.
   return useMachineScheduler();
 }

 // This overrides the PostRAScheduler bit in the SchedModel for any CPU.
 bool ARMSubtarget::enablePostRAScheduler() const {
+  if (usePostRAScheduler())
+    return true;
+  if (SchedModel.PostRAScheduler)
+    return true;
   // No need for PostRA scheduling on subtargets where we use the
   // MachineScheduler.
   if (useMachineScheduler())
     return false;
   return (!isThumb() || hasThumb2());
 }

 bool ARMSubtarget::enableAtomicExpand() const { return hasAnyDataBarrier(); }

 bool ARMSubtarget::useStride4VFPs(const MachineFunction &MF) const {
   // For general targets, the prologue can grow when VFPs are allocated with
   // stride 4 (more vpush instructions). But WatchOS uses a compact unwind
   // format which it's more important to get right.
(25 lines not shown)