This is an archive of the discontinued LLVM Phabricator instance.

MachineScheduler: Refactor setPolicy() to limit computing remaining latency
Closed · Public

Authored by tstellar on Aug 8 2018, 6:14 PM.


Details

Reviewers

atrick
MatzeB
airlied
mareko

Commits

rGecd6aa5be2e8: MachineScheduler: Refactor setPolicy() to limit computing remaining latency
rL340346: MachineScheduler: Refactor setPolicy() to limit computing remaining latency

Summary

Computing the remaining latency can be very expensive, especially
on graphs of N nodes where the number of edges approaches N^2.

This reduces the compile time of a pathological case with the
AMDGPU backend from ~7.5 seconds to ~3 seconds. This test case has
a basic block with 2655 stores, each with somewhere between 500
and 1500 successors and predecessors.
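A toy model makes the cost claim concrete. The sketch below is not LLVM's code. It uses a hypothetical Node type and computes each node's depth as a longest path over its predecessor edges, counting how many edges it examines. On a DAG where every node depends on every earlier node there are N(N-1)/2 edges, so even one full pass does quadratic work. If a pass like this runs each time the scheduler makes a decision, the total cost grows further.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy node, not LLVM's SUnit: predecessor indices plus a
// fixed latency for the node's own result.
struct Node {
  std::vector<std::size_t> Preds;
  unsigned Latency = 1;
};

// Dense DAG: node I depends on every node J < I, giving N*(N-1)/2 edges.
std::vector<Node> buildDenseDAG(std::size_t N) {
  std::vector<Node> G(N);
  for (std::size_t I = 0; I < N; ++I)
    for (std::size_t J = 0; J < I; ++J)
      G[I].Preds.push_back(J);
  return G;
}

// Longest-path depth of each node, assuming nodes are stored in
// topological order. EdgeVisits reports how many edges were examined.
std::vector<unsigned> computeDepths(const std::vector<Node> &G,
                                    std::size_t &EdgeVisits) {
  std::vector<unsigned> Depth(G.size(), 0);
  EdgeVisits = 0;
  for (std::size_t I = 0; I < G.size(); ++I) {
    for (std::size_t P : G[I].Preds) {
      assert(P < I && "predecessors must precede their users");
      ++EdgeVisits;
      Depth[I] = std::max(Depth[I], Depth[P] + G[P].Latency);
    }
  }
  return Depth;
}
```

For example, buildDenseDAG(1000) has 499,500 edges, and computeDepths touches each of them exactly once.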

Diff Detail

Repository

rL LLVM

Build Status

Buildable 21279
Build 21279: arc lint + arc unit

Event Timeline

tstellar created this revision. Aug 8 2018, 6:14 PM

Herald added subscribers: javed.absar, tpr. Aug 8 2018, 6:14 PM

Harbormaster completed remote builds in B21279: Diff 159841. Aug 8 2018, 6:14 PM

mareko accepted this revision. Aug 10 2018, 9:30 AM

This revision is now accepted and ready to land. Aug 10 2018, 9:30 AM

@atrick does this look OK to you?

Looks fine to me.

Inline comment by atrick on lib/CodeGen/MachineScheduler.cpp, line 2429:
I suggest adding a brief comment here. I think this is just a quick check to bypass computeRemLatency, and ultimately we're trying to determine whether the current cycle plus remaining latency is greater than the critical path in the scheduling region.
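The check atrick describes can be studied apart from the scheduler. This sketch follows the same order of tests as shouldReduceLatency() in the diff. It replaces SchedBoundary and Rem.CriticalPath with a hypothetical ZoneState struct and passes the expensive remaining-latency computation in as a callback, which makes it easy to see when that computation actually runs.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the scheduler state that the check reads.
struct ZoneState {
  unsigned CurrCycle;    // stands in for CurrZone.getCurrCycle()
  unsigned CriticalPath; // stands in for Rem.CriticalPath
};

// Same order of tests as shouldReduceLatency() in the diff: two cheap
// comparisons first, then the expensive computation only if still needed.
// If ComputeRemLatency is false, RemLatency must already hold a valid value.
bool shouldReduceLatencySketch(const ZoneState &Z, bool ComputeRemLatency,
                               unsigned &RemLatency,
                               const std::function<unsigned()> &ComputeRem) {
  // Already past the critical path: latency-limited whatever RemLatency is.
  if (Z.CurrCycle > Z.CriticalPath)
    return true;

  // Nothing scheduled yet, so the zone cannot be latency-limited.
  if (Z.CurrCycle == 0)
    return false;

  if (ComputeRemLatency) {
    assert(ComputeRem && "ComputeRem must be callable");
    RemLatency = ComputeRem();
  }

  return RemLatency + Z.CurrCycle > Z.CriticalPath;
}
```

With CurrCycle = 12 and CriticalPath = 10, the function returns true without invoking the callback. With CurrCycle = 0, it returns false, also without invoking it.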

Closed by commit rL340346: MachineScheduler: Refactor setPolicy() to limit computing remaining latency (authored by tstellar). Aug 21 2018, 2:49 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                       Size
include/llvm/CodeGen/MachineScheduler.h    4 lines
lib/CodeGen/MachineScheduler.cpp           84 lines

Diff 159841

include/llvm/CodeGen/MachineScheduler.h

@@ ... @@ protected:
   GenericSchedulerBase(const MachineSchedContext *C) : Context(C) {}
 
   void setPolicy(CandPolicy &Policy, bool IsPostRA, SchedBoundary &CurrZone,
                  SchedBoundary *OtherZone);
 
 #ifndef NDEBUG
   void traceCandidate(const SchedCandidate &Cand);
 #endif
+
+private:
+  bool shouldReduceLatency(const CandPolicy &Policy, SchedBoundary &CurrZone,
+                           bool ComputeRemLatency, unsigned &RemLatency) const;
 };
 
 // Utility functions used by heuristics in tryCandidate().
 bool tryLess(int TryVal, int CandVal,
              GenericSchedulerBase::SchedCandidate &TryCand,
              GenericSchedulerBase::SchedCandidate &Cand,
              GenericSchedulerBase::CandReason Reason);
 bool tryGreater(int TryVal, int CandVal,
[...]

lib/CodeGen/MachineScheduler.cpp

@@ ... @@ for (TargetSchedModel::ProcResIter
          PE = SchedModel->getWriteProcResEnd(SC); PI != PE; ++PI) {
     if (PI->ProcResourceIdx == Policy.ReduceResIdx)
       ResDelta.CritResources += PI->Cycles;
     if (PI->ProcResourceIdx == Policy.DemandResIdx)
       ResDelta.DemandedResources += PI->Cycles;
   }
 }
 
+/// Compute remaining latency. We need this both to determine whether the
+/// overall schedule has become latency-limited and whether the instructions
+/// outside this zone are resource or latency limited.
+///
+/// The "dependent" latency is updated incrementally during scheduling as the
+/// max height/depth of scheduled nodes minus the cycles since it was
+/// scheduled:
+///   DLat = max (N.depth - (CurrCycle - N.ReadyCycle) for N in Zone
+///
+/// The "independent" latency is the max ready queue depth:
+///   ILat = max N.depth for N in Available|Pending
+///
+/// RemainingLatency is the greater of independent and dependent latency.
+///
+/// These computations are expensive, especially in DAGs with many edges, so
+/// only do them if necessary.
+static unsigned computeRemLatency(SchedBoundary &CurrZone) {
+  unsigned RemLatency = CurrZone.getDependentLatency();
+  RemLatency = std::max(RemLatency,
+                        CurrZone.findMaxLatency(CurrZone.Available.elements()));
+  RemLatency = std::max(RemLatency,
+                        CurrZone.findMaxLatency(CurrZone.Pending.elements()));
+  return RemLatency;
+}
+
+bool GenericSchedulerBase::shouldReduceLatency(const CandPolicy &Policy,
+                                               SchedBoundary &CurrZone,
+                                               bool ComputeRemLatency,
+                                               unsigned &RemLatency) const {
+  if (CurrZone.getCurrCycle() > Rem.CriticalPath)
+    return true;
+
+  // If we haven't scheduled anything yet, then we aren't latency limited.
+  if (CurrZone.getCurrCycle() == 0)
+    return false;
+
+  if (ComputeRemLatency)
+    RemLatency = computeRemLatency(CurrZone);
+
+  return RemLatency + CurrZone.getCurrCycle() > Rem.CriticalPath;
+}
+
 /// Set the CandPolicy given a scheduling zone given the current resources and
 /// latencies inside and outside the zone.
 void GenericSchedulerBase::setPolicy(CandPolicy &Policy, bool IsPostRA,
                                      SchedBoundary &CurrZone,
                                      SchedBoundary *OtherZone) {
   // Apply preemptive heuristics based on the total latency and resources
   // inside and outside this zone. Potential stalls should be considered before
   // following this policy.
 
-  // Compute remaining latency. We need this both to determine whether the
-  // overall schedule has become latency-limited and whether the instructions
-  // outside this zone are resource or latency limited.
-  //
-  // The "dependent" latency is updated incrementally during scheduling as the
-  // max height/depth of scheduled nodes minus the cycles since it was
-  // scheduled:
-  //   DLat = max (N.depth - (CurrCycle - N.ReadyCycle) for N in Zone
-  //
-  // The "independent" latency is the max ready queue depth:
-  //   ILat = max N.depth for N in Available|Pending
-  //
-  // RemainingLatency is the greater of independent and dependent latency.
-  unsigned RemLatency = CurrZone.getDependentLatency();
-  RemLatency = std::max(RemLatency,
-                        CurrZone.findMaxLatency(CurrZone.Available.elements()));
-  RemLatency = std::max(RemLatency,
-                        CurrZone.findMaxLatency(CurrZone.Pending.elements()));
-
   // Compute the critical resource outside the zone.
   unsigned OtherCritIdx = 0;
   unsigned OtherCount =
     OtherZone ? OtherZone->getOtherResourceCount(OtherCritIdx) : 0;
 
   bool OtherResLimited = false;
-  if (SchedModel->hasInstrSchedModel())
+  unsigned RemLatency = 0;
+  bool RemLatencyComputed = false;
+  if (SchedModel->hasInstrSchedModel() && OtherCount != 0) {
+    RemLatency = computeRemLatency(CurrZone);
+    RemLatencyComputed = true;
     OtherResLimited = checkResourceLimit(SchedModel->getLatencyFactor(),
                                          OtherCount, RemLatency);
+  }
 
   // Schedule aggressively for latency in PostRA mode. We don't check for
   // acyclic latency during PostRA, and highly out-of-order processors will
   // skip PostRA scheduling.
-  if (!OtherResLimited) {
-    if (IsPostRA || (RemLatency + CurrZone.getCurrCycle() > Rem.CriticalPath)) {
-      Policy.ReduceLatency |= true;
-      LLVM_DEBUG(dbgs() << " " << CurrZone.Available.getName()
-                        << " RemainingLatency " << RemLatency << " + "
-                        << CurrZone.getCurrCycle() << "c > CritPath "
-                        << Rem.CriticalPath << "\n");
-    }
-  }
+  if (!OtherResLimited &&
+      (IsPostRA || shouldReduceLatency(Policy, CurrZone, !RemLatencyComputed,
+                                       RemLatency))) {
+    Policy.ReduceLatency |= true;
+    LLVM_DEBUG(dbgs() << " " << CurrZone.Available.getName()
+                      << " RemainingLatency " << RemLatency << " + "
+                      << CurrZone.getCurrCycle() << "c > CritPath "
+                      << Rem.CriticalPath << "\n");
+  }
   // If the same resource is limiting inside and outside the zone, do nothing.
   if (CurrZone.getZoneCritResIdx() == OtherCritIdx)
     return;
 
   LLVM_DEBUG(if (CurrZone.isResourceLimited()) {
     dbgs() << " " << CurrZone.Available.getName() << " ResourceLimited: "
            << SchedModel->getResourceName(CurrZone.getZoneCritResIdx()) << "\n";
   } if (OtherResLimited) dbgs()
[...]
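After this change, setPolicy() computes the remaining latency at most once, and only on paths that read it. The sketch below folds that control flow into one hypothetical function, decideReduceLatency(), which returns whether Policy.ReduceLatency would be set. PolicyInputs and both callbacks are stand-ins. ComputeRem models computeRemLatency(), and ResourceLimited models checkResourceLimit(), whose formula is not shown in this diff. Treat it as a model of the branching, not of the scheduler itself.

```cpp
#include <cassert>
#include <functional>

// Hypothetical inputs standing in for what setPolicy() reads.
struct PolicyInputs {
  bool HasInstrSchedModel; // SchedModel->hasInstrSchedModel()
  unsigned OtherCount;     // critical resource count outside the zone
  bool IsPostRA;
  unsigned CurrCycle;      // CurrZone.getCurrCycle()
  unsigned CriticalPath;   // Rem.CriticalPath
};

// Returns whether Policy.ReduceLatency would be set. ComputeRem models
// computeRemLatency(); ResourceLimited models checkResourceLimit().
bool decideReduceLatency(
    const PolicyInputs &In, const std::function<unsigned()> &ComputeRem,
    const std::function<bool(unsigned, unsigned)> &ResourceLimited) {
  assert(ComputeRem && ResourceLimited && "callbacks must be callable");

  unsigned RemLatency = 0;
  bool RemLatencyComputed = false;
  bool OtherResLimited = false;

  // Eager path: only when there is a resource outside the zone to compare.
  if (In.HasInstrSchedModel && In.OtherCount != 0) {
    RemLatency = ComputeRem();
    RemLatencyComputed = true;
    OtherResLimited = ResourceLimited(In.OtherCount, RemLatency);
  }

  if (OtherResLimited)
    return false;
  if (In.IsPostRA)
    return true; // Short-circuit: the latency check never runs.

  // Same tests as shouldReduceLatency(), reusing RemLatency if available.
  if (In.CurrCycle > In.CriticalPath)
    return true;
  if (In.CurrCycle == 0)
    return false;
  if (!RemLatencyComputed)
    RemLatency = ComputeRem();
  return RemLatency + In.CurrCycle > In.CriticalPath;
}
```

For example, with IsPostRA set and no critical resource outside the zone (OtherCount == 0), ComputeRem never runs. The old code paid for that computation unconditionally at the top of setPolicy().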