This is an archive of the discontinued LLVM Phabricator instance.

Make the insertion of predecessor deps in the schedule graph not quadratic in the number of predecessor deps.
Needs Review · Public

Authored by chandlerc on Jun 16 2016, 3:37 AM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

Before this we would do a linear scan of all existing predecessor deps to
check for overlap. When inserting N non-duplicate predecessor deps, this
is trivially O(N^2).
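
To make the shape of the problem concrete, here is a minimal, self-contained sketch of that pattern; `Dep`, `Node`, and `addPred` are hypothetical stand-ins for `SDep`/`SUnit::addPred`, not the actual LLVM code:

```cpp
#include <vector>

struct Dep {
  int FromNode; // hypothetical stand-in for the SUnit* the edge points at
  int Kind;     // hypothetical stand-in for the SDep kind/latency payload
  bool overlaps(const Dep &Other) const {
    return FromNode == Other.FromNode && Kind == Other.Kind;
  }
};

struct Node {
  std::vector<Dep> Preds;

  // Mirrors the shape of the pre-patch insertion: every call scans the
  // whole Preds vector, so inserting N distinct deps costs O(N^2) overall.
  bool addPred(const Dep &D) {
    for (const Dep &P : Preds)   // linear scan on every insertion
      if (P.overlaps(D))
        return false;            // an existing edge already covers this dep
    Preds.push_back(D);
    return true;
  }
};
```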

Unfortunately, there are two different definitions of "overlap" used, so
a single map is insufficient. Instead, we use an index map and a count
map. The count map handles the preds that aren't required if there is
an existing edge, and the index map allows us to find an exact existing
dep and update it in constant time w.r.t. the number of predecessor
deps.
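
A rough sketch of that two-map scheme, again with hypothetical stand-in types rather than the real `SDep`/`DenseMap`-based code in the patch:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Dep {
  int FromNode;               // hypothetical stand-in for the SUnit* target
  int Kind;                   // hypothetical stand-in for the SDep kind
  bool NotRequiredIfExisting; // e.g. a weak edge that any existing edge covers
};

struct Node {
  std::vector<Dep> Preds;
  // Packed (node, kind) -> slot in Preds, for exact-duplicate updates.
  std::unordered_map<int64_t, size_t> PredIndex;
  // node -> number of edges already pointing at it, for the weaker check.
  std::unordered_map<int, unsigned> PredCount;

  static int64_t key(const Dep &D) {
    return (int64_t(D.FromNode) << 32) | uint32_t(D.Kind);
  }

  bool addPred(const Dep &D) {
    auto It = PredIndex.find(key(D));
    if (It != PredIndex.end()) {
      Preds[It->second] = D;   // update the existing edge in place, O(1)
      return false;
    }
    if (D.NotRequiredIfExisting && PredCount.count(D.FromNode))
      return false;            // some edge to this node already exists
    PredIndex[key(D)] = Preds.size();
    ++PredCount[D.FromNode];
    Preds.push_back(D);
    return true;
  }
};
```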

This doesn't fix the linear scan over *successor* deps, but for whatever
reason, even with the *insane* schedule graph formed by
test/CodeGen/AMDGPU/spill-scavenge-offset.ll, that scan doesn't show up
as particularly hot in my profile. If I can find a way to make it show
up, I'll look at fixing that linear scan as well.

However, one possible reason I can't see it is that fixing this
quadratic behavior immediately uncovers a second quadratic behavior. I'm
going to try to fix that next.

This fix alone is good for a 20% to 55% speedup on the above test case
as it was prior to r272860, which somewhat avoided triggering this
quadratic behavior. A debug build for me drops from 40s to 32s for the
entire test, and an optimized build from 6s to 4.5s. This shaves about
4s off of my 'ninja check-llvm' time in debug builds, where this test is
one of the tall poles. I had really hoped for more dramatic improvements,
but there appears to be too much overhead and too many other quadratic
things going on...

Given that this requires two map data structures, one of which has
a decidedly non-trivial key, I'm not 100% certain this is the best
approach. It would be so much nicer to have a tiered structure from
SUnit to the set of deps on that edge... But that looks like a much more
invasive change. Thoughts? Personally, I still lean toward not having
a quadratic algorithm. =]
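
For comparison, a rough sketch of that tiered alternative (hypothetical types again, not the existing SUnit/SDep layout): group deps by the node they point at, so both overlap checks only walk the handful of deps on that one edge:

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

struct DepInfo { int Kind; int Latency; };

struct TieredNode {
  // First tier: which predecessor node the edge points at.
  // Second tier: the dep records carried on that edge.
  std::unordered_map<int, std::vector<DepInfo>> PredEdges;

  bool addPred(int PredNode, const DepInfo &D) {
    std::vector<DepInfo> &Edge = PredEdges[PredNode];
    for (DepInfo &Existing : Edge)   // scan only this edge's few deps
      if (Existing.Kind == D.Kind) {
        Existing.Latency = std::max(Existing.Latency, D.Latency);
        return false;
      }
    Edge.push_back(D);
    return true;
  }
};
```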

Note that these quadratic algorithms impact both the SDAG scheduling and
MI scheduling because both build the Schedule DAG. =[ =[ =[ =[

Diff Detail

Event Timeline

chandlerc updated this revision to Diff 60957.Jun 16 2016, 3:37 AM
chandlerc retitled this revision from to Make the insertion of predecessor deps in the schedule graph not quadratic in the number of predecessor deps..
chandlerc updated this object.
chandlerc added a subscriber: llvm-commits.
MatzeB added a subscriber: atrick.Jun 16 2016, 3:50 PM
arsenm added a subscriber: arsenm.Jun 16 2016, 3:55 PM

Looks reasonable to me

lib/CodeGen/ScheduleDAG.cpp
90–92

range loop?

TL;DR: The DAG builder is never supposed to form "insane" DAGs.

Why do we have a compile-time stress test checked into the unit tests if we don't want to bloat 'make check' time? This test should be attached to a bug report instead!

It's well understood that scheduling DAGs are quadratic if they naively represent all dependencies. That is a problem regardless of how the predecessor/successor edges are uniqued. We don't want to use quadratic space either.

The normal strategy for dealing with the DAG size is to pinch off the dependencies at some reasonable limit. Why isn't that happening? For example, that's why we don't include stack pointer manipulation in the DAG. Are these register or memory dependencies? Support for alias analysis was recently enabled, and I expect to see situations like this as fallout. If that's the case, the fix is to teach the DAG builder to better limit the number of memory dependencies that are being tracked.
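
To illustrate the pinching idea abstractly (this is a hypothetical sketch of the strategy, not the DAG builder's actual mechanism): track a bounded number of individual deps and, past the limit, collapse everything onto a single conservative barrier edge so the edge count stays linear:

```cpp
#include <vector>

constexpr unsigned DepLimit = 16;   // hypothetical threshold

struct MemDeps {
  std::vector<int> TrackedPreds;    // precise per-instruction deps
  bool PinchedToBarrier = false;    // past the limit: one barrier dep only

  void addMemDep(int PredNode) {
    if (PinchedToBarrier)
      return;                       // the barrier edge already orders everything
    if (TrackedPreds.size() >= DepLimit) {
      TrackedPreds.clear();         // give up precision...
      PinchedToBarrier = true;      // ...and keep a single conservative edge
      return;
    }
    TrackedPreds.push_back(PredNode);
  }
};
```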

The obvious downside to this patch is that it increases the size of the DAG and adds constant overhead. That needs to be measured. I don't think the common case should be penalized, particularly because there are other ways to prevent pathological behavior in the DAG builder.

Sorry I don't have time to investigate this test case. I wish I did.

[Note: There are a lot of inefficiencies in the scheduling DAG because it's shared with ISEL. What makes that especially painful is that it's quite silly for ISEL to build this DAG at all.]