This is an archive of the discontinued LLVM Phabricator instance.

[ScheduleDAGRRList] Recompute topological ordering on demand.
Closed, Public

Authored by fhahn on Apr 2 2019, 6:23 AM.


Details

Reviewers

MatzeB
atrick
efriedma
niravd
paquette

Commits

rG258a425c69f0: [ScheduleDAGRRList] Recompute topological ordering on demand.
rL358583: [ScheduleDAGRRList] Recompute topological ordering on demand.

Summary

Currently there is a single point in ScheduleDAGRRList where we
actually query the topological order (besides init code). At the moment we
recompute the order after adding a node (which does not have
predecessors yet) and then add its predecessors edge-by-edge, updating the
order for each edge.

We can avoid updating the order edge-by-edge after we have added a new node. In that case, we can
just rebuild the order from scratch after adding the edges to the DAG
and avoid all the incremental updates to the ordering.

Also, we can delay updating the ordering until we query it, if we keep a
list of added edges. Depending on the number of updates, we can either
apply them when needed or recompute the order from scratch.

This brings the geomean compile time of CTMark with -O1 down by 0.2% on X86,
with no regressions.
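
The idea in the summary can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the patch: the class name LazyTopoOrder, the integer node IDs, the comesBefore query and the NumRecomputes counter are assumptions made for the example. The full rebuild uses Kahn's algorithm instead of the MNR-based code in ScheduleDAG.cpp, and a queued edge that the current order does not already satisfy simply triggers a full rebuild instead of an incremental shift.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for ScheduleDAGTopologicalSort.
class LazyTopoOrder {
  std::vector<std::vector<int>> Succs;      // Succs[N]: nodes that must follow N
  std::vector<int> Node2Index;              // node number -> topological index
  std::vector<std::pair<int, int>> Updates; // queued (From, To) edges
  bool Dirty = false;                       // true: order must be rebuilt

  // Rebuild the whole order from scratch (Kahn's algorithm).
  void recompute() {
    const std::size_t N = Succs.size();
    std::vector<int> InDegree(N, 0);
    for (const auto &Out : Succs)
      for (int To : Out)
        ++InDegree[To];
    std::vector<int> WorkList;
    for (std::size_t I = 0; I != N; ++I)
      if (InDegree[I] == 0)
        WorkList.push_back(static_cast<int>(I));
    Node2Index.assign(N, -1);
    int NextIndex = 0;
    while (!WorkList.empty()) {
      int Node = WorkList.back();
      WorkList.pop_back();
      Node2Index[Node] = NextIndex++;
      for (int To : Succs[Node])
        if (--InDegree[To] == 0)
          WorkList.push_back(To);
    }
    assert(static_cast<std::size_t>(NextIndex) == N && "graph has a cycle");
    Dirty = false;
    Updates.clear();
    ++NumRecomputes;
  }

public:
  unsigned NumRecomputes = 0; // for illustration only

  // A new node invalidates the order; nothing is recomputed yet.
  int addNode() {
    Succs.emplace_back();
    Dirty = true;
    return static_cast<int>(Succs.size()) - 1;
  }

  // The edge goes into the graph right away; only the ordering update is
  // queued. Past an arbitrary cut-off we give up on queuing and rebuild.
  void addEdgeQueued(int From, int To) {
    Succs[From].push_back(To);
    Dirty |= Updates.size() > 10;
    if (Dirty)
      return;
    Updates.emplace_back(From, To);
  }

  // Bring the order up to date before it is queried.
  void fixOrder() {
    if (!Dirty)
      for (const auto &U : Updates)
        if (Node2Index[U.first] > Node2Index[U.second]) {
          Dirty = true; // simplification: rebuild instead of shifting nodes
          break;
        }
    if (Dirty)
      recompute();
    Updates.clear();
  }

  bool comesBefore(int A, int B) const {
    assert(!Dirty && Updates.empty() && "Topological order is outdated");
    return Node2Index[A] < Node2Index[B];
  }
};
```

In this sketch, adding three nodes and two edges costs a single rebuild at the first fixOrder() call, and a later edge that the existing order already satisfies costs no rebuild at all.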

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 29945
Build 29944: arc lint + arc unit

Event Timeline

fhahn created this revision. Apr 2 2019, 6:23 AM

Herald added a project: Restricted Project. Apr 2 2019, 6:23 AM

Herald added subscribers: jdoerfert, hiraditya.

Harbormaster completed remote builds in B29945: Diff 193275. Apr 2 2019, 6:25 AM

fhahn mentioned this in D59722: [ScheduleDAG] Avoid unnecessary recomputation of topological order. Apr 2 2019, 6:26 AM

efriedma added inline comments. Apr 2 2019, 12:03 PM

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
1488	Where exactly do we query the topological order relative to this call to fixOrder()? Do we need to fixOrder() inside the loop?

fhahn marked an inline comment as done. Apr 2 2019, 12:37 PM

fhahn added inline comments.

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
1488	We only query it in the loop below (WillCreateCycle call). In case we add new predecessors, we also exit the loop. And we only enter the loop, if there are some interferences. Maybe it's worth adding a comment here?

efriedma added inline comments. Apr 2 2019, 12:44 PM

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
1488	Probably worth adding a comment, yes. By the way, how frequently do we hit this case in practice?

Add comment and make even lazier.

Harbormaster completed remote builds in B29996: Diff 193451. Apr 3 2019, 2:21 AM

fhahn mentioned this in D60187: [ScheduleDAG] Add statistics for maintaining the topological order. Apr 3 2019, 2:29 AM

fhahn marked 2 inline comments as done. Apr 3 2019, 2:46 AM

fhahn added inline comments.

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
1488	> By the way, how frequently do we hit this case in practice?

I haven't looked at this call here in isolation, but I gather statistics on the impact of the number of InitDAGTopologicalSorting and AddPred calls.

Number of InitDAGTopologicalSorting calls for CTMark, X86, -O1

Test	patch	base
CTMark/kimwitu++/kc.test	25671.0	46740.0
CTMark/Bullet/bullet.test	16228.0	36873.0
CTMark/mafft/pairlocalalign.test	14290.0	28695.0
CTMark/lencod/lencod.test	19006.0	37880.0
CTMark/sqlite3/sqlite3.test	22447.0	49409.0
CTMark/ClamAV/clamscan.test	23087.0	48369.0
CTMark/7zip/7zip-benchmark.test	41045.0	98737.0
CTMark/tramp3d-v4/tramp3d-v4.test	22057.0	49214.0
CTMark/SPASS/SPASS.test	19596.0	40652.0
CTMark/consumer-typeset/consumer-typeset.test	16567.0	33208.0

Number of AddPred calls for CTMark, X86, -O1

Test	patch	base
CTMark/kimwitu++/kc.test	2645.0	2702.0
CTMark/Bullet/bullet.test	3619.0	3643.0
CTMark/mafft/pairlocalalign.test	3427.0	3455.0
CTMark/lencod/lencod.test	3902.0	4108.0
CTMark/sqlite3/sqlite3.test	3147.0	3179.0
CTMark/ClamAV/clamscan.test	3764.0	3821.0
CTMark/7zip/7zip-benchmark.test	7308.0	7393.0
CTMark/tramp3d-v4/tramp3d-v4.test	4311.0	4315.0
CTMark/SPASS/SPASS.test	4512.0	4577.0
CTMark/consumer-typeset/consumer-typeset.test	3695.0	3734.0

Also, the latest version of the patch improves compile-time on CTMark X86, -O1 a bit more: negative diff means a reduction in compile time.

Program	diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test	-0.6%
test-suite :: CTMark/Bullet/bullet.test	-0.4%
test-suite...:: CTMark/sqlite3/sqlite3.test	-0.4%
test-suite :: CTMark/kimwitu++/kc.test	-0.4%
test-suite...TMark/7zip/7zip-benchmark.test	-0.3%
test-suite :: CTMark/lencod/lencod.test	-0.3%
test-suite :: CTMark/SPASS/SPASS.test	-0.3%
test-suite...-typeset/consumer-typeset.test	-0.3%
test-suite...:: CTMark/ClamAV/clamscan.test	-0.2%
test-suite...Mark/mafft/pairlocalalign.test	0.0%
Geomean difference	-0.3%
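
As a rough cross-check of the "Geomean difference" row, the geometric mean of the per-benchmark changes can be recomputed from the diff column. This is only a sketch: the helper name geomeanPercentChange is made up for the example, the inputs are the already-rounded percentages quoted in the comment, and the tool that produced the original numbers may compute its geomean from the raw timings instead.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of relative changes given in percent (e.g. -0.4 for -0.4%),
// returned as a percentage. Assumes a non-empty input.
double geomeanPercentChange(const std::vector<double> &PercentDiffs) {
  double LogSum = 0.0;
  for (double P : PercentDiffs)
    LogSum += std::log(1.0 + P / 100.0); // log of the patch/base ratio
  return (std::exp(LogSum / PercentDiffs.size()) - 1.0) * 100.0;
}

// Per-benchmark diffs as quoted in the comment (rounded to one decimal).
const std::vector<double> ReportedDiffs = {-0.6, -0.4, -0.4, -0.4, -0.3,
                                           -0.3, -0.3, -0.3, -0.2, 0.0};
```

Calling geomeanPercentChange(ReportedDiffs) gives a value of about -0.3%, in line with the reported geomean.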

efriedma added inline comments. Apr 3 2019, 1:13 PM

llvm/lib/CodeGen/ScheduleDAG.cpp
710	How terrible would it be to just call fixOrder from here, instead of making the callers check? That makes it impossible for callers to mess up, and the check should be cheap. I guess it also might be possible to add some cheap checks here to avoid calling fixOrder; for example, if TargetSU has no successors.

fhahn marked 2 inline comments as done. Apr 3 2019, 1:23 PM

fhahn added inline comments.

llvm/lib/CodeGen/ScheduleDAG.cpp
710	I guess the impact would not be too big, in fact that was what I initially had. I'll measure it and get back. I think it would be OK to push responsibility to the caller in a way and the assertion should catch any errors (the message should probably point to fixOrder()).

Fix order whenever it is queried.

Harbormaster completed remote builds in B30062: Diff 193737. Apr 4 2019, 9:53 AM

fhahn added inline comments. Apr 4 2019, 9:58 AM

llvm/lib/CodeGen/ScheduleDAG.cpp
710	I gathered numbers with fixOrder() called from IsReachable & WillCreateCycle. The numbers shifted a bit, but we still have roughly the same overall gain. It seems one approach is slightly better in some cases and the other one in other cases. I guess we should go for the safer one for now?
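
The two options discussed in this thread (callers call fixOrder() themselves and the query asserts, versus the query fixing the order itself) can be contrasted in a small generic sketch. Everything here is hypothetical and unrelated to the real ScheduleDAG classes: SelfFixingOrder, its integer keys, rank, rankChecked and NumFixes are invented for the example, and the "order" is just a rank among keys, not a topological order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cache of a derived ordering that is expensive to maintain.
class SelfFixingOrder {
  std::vector<int> Keys; // one key per node
  std::vector<int> Rank; // cached: number of nodes with a smaller key
  bool Dirty = false;

public:
  unsigned NumFixes = 0; // for illustration only

  int addNode(int Key) {
    Keys.push_back(Key);
    Dirty = true; // invalidate, but do not recompute yet
    return static_cast<int>(Keys.size()) - 1;
  }

  void fixOrder() {
    if (!Dirty)
      return; // cheap check on the common path
    Rank.assign(Keys.size(), 0);
    for (std::size_t I = 0; I != Keys.size(); ++I)
      for (std::size_t J = 0; J != Keys.size(); ++J)
        if (Keys[J] < Keys[I])
          ++Rank[I];
    Dirty = false;
    ++NumFixes;
  }

  // Option 1: the caller is responsible for calling fixOrder() first;
  // the assertion catches callers that forget.
  int rankChecked(int Node) const {
    assert(!Dirty && "order is outdated, call fixOrder() first");
    return Rank[Node];
  }

  // Option 2: the query fixes the order itself, so callers cannot get it
  // wrong, at the cost of one flag check per query.
  int rank(int Node) {
    fixOrder();
    return Rank[Node];
  }
};
```

With option 2, several queries after a batch of insertions still trigger only one recomputation, since the flag is cleared by the first query.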

LGTM

This revision is now accepted and ready to land. Apr 4 2019, 11:57 AM

fhahn mentioned this in rL358058: [ScheduleDAG] Add statistics for maintaining the topological order. Apr 10 2019, 2:01 AM

fhahn mentioned this in rG83443c9a9ec7: [ScheduleDAG] Add statistics for maintaining the topological order. Apr 10 2019, 2:06 AM

Closed by commit rL358583: [ScheduleDAGRRList] Recompute topological ordering on demand. (authored by fhahn). Apr 17 2019, 8:03 AM

This revision was automatically updated to reflect the committed changes.

fhahn mentioned this in D60839: [ScheduleDAGInstrs] Compute topological ordering on demand. Apr 17 2019, 2:51 PM

fhahn mentioned this in rL361253: [ScheduleDAGInstrs] Compute topological ordering on demand. May 21 2019, 6:02 AM

fhahn mentioned this in rGf9b28e53c7d1: [ScheduleDAGInstrs] Compute topological ordering on demand.

sidorovd mentioned this in rG29f5deac5337: [ScheduleDAGInstrs] Compute topological ordering on demand. May 30 2019, 10:50 AM

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/ScheduleDAG.h	18 lines
llvm/lib/CodeGen/ScheduleDAG.cpp	32 lines
llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp	48 lines

Diff 193275

llvm/include/llvm/CodeGen/ScheduleDAG.h

[685 lines hidden]	#endif
/// methods for dynamically updating the ordering as new edges are added.		/// methods for dynamically updating the ordering as new edges are added.
///		///
/// This allows a very fast implementation of IsReachable, for example.		/// This allows a very fast implementation of IsReachable, for example.
class ScheduleDAGTopologicalSort {		class ScheduleDAGTopologicalSort {
/// A reference to the ScheduleDAG's SUnits.		/// A reference to the ScheduleDAG's SUnits.
std::vector<SUnit> &SUnits;		std::vector<SUnit> &SUnits;
SUnit *ExitSU;		SUnit *ExitSU;

		// Have any new nodes been added?
		bool Dirty = false;

		// Outstanding added edges, that have not been applied to the ordering.
		SmallVector<std::pair<SUnit *, SUnit *>, 16> Updates;

/// Maps topological index to the node number.		/// Maps topological index to the node number.
std::vector<int> Index2Node;		std::vector<int> Index2Node;
/// Maps the node number to its topological index.		/// Maps the node number to its topological index.
std::vector<int> Node2Index;		std::vector<int> Node2Index;
/// a set of nodes visited during a DFS traversal.		/// a set of nodes visited during a DFS traversal.
BitVector Visited;		BitVector Visited;

/// Makes a DFS traversal and mark all nodes affected by the edge insertion.		/// Makes a DFS traversal and mark all nodes affected by the edge insertion.
[27 lines hidden]	public:

/// Returns true if addPred(TargetSU, SU) creates a cycle.		/// Returns true if addPred(TargetSU, SU) creates a cycle.
bool WillCreateCycle(SUnit *TargetSU, SUnit *SU);		bool WillCreateCycle(SUnit *TargetSU, SUnit *SU);

/// Updates the topological ordering to accommodate an edge to be		/// Updates the topological ordering to accommodate an edge to be
/// added from SUnit \p X to SUnit \p Y.		/// added from SUnit \p X to SUnit \p Y.
void AddPred(SUnit *Y, SUnit *X);		void AddPred(SUnit *Y, SUnit *X);

		/// Queues an update to the topological ordering to accommodate an edge to be
		/// added from SUnit \p X to SUnit \p Y.
		void AddPredQueued(SUnit *Y, SUnit *X);

/// Updates the topological ordering to accommodate an edge to be		/// Updates the topological ordering to accommodate an edge to be
/// removed from the specified node \p N from the predecessors of the		/// removed from the specified node \p N from the predecessors of the
/// current node \p M.		/// current node \p M.
void RemovePred(SUnit *M, SUnit *N);		void RemovePred(SUnit *M, SUnit *N);

		/// Fix the ordering, by either recomputing from scratch or by applying
		/// any outstanding updates. Uses a heuristic to estimate what will be
		/// cheaper.
		void fixOrder();

		/// Mark the ordering as temporarily broken, after a new node has been added.
		void MarkDirty() { Dirty = true; }

typedef std::vector<int>::iterator iterator;		typedef std::vector<int>::iterator iterator;
typedef std::vector<int>::const_iterator const_iterator;		typedef std::vector<int>::const_iterator const_iterator;
iterator begin() { return Index2Node.begin(); }		iterator begin() { return Index2Node.begin(); }
const_iterator begin() const { return Index2Node.begin(); }		const_iterator begin() const { return Index2Node.begin(); }
iterator end() { return Index2Node.end(); }		iterator end() { return Index2Node.end(); }
const_iterator end() const { return Index2Node.end(); }		const_iterator end() const { return Index2Node.end(); }

typedef std::vector<int>::reverse_iterator reverse_iterator;		typedef std::vector<int>::reverse_iterator reverse_iterator;
[10 lines hidden]

llvm/lib/CodeGen/ScheduleDAG.cpp

[424 lines hidden]	if (isBottomUp) {
}		}
}		}
}		}
assert(!AnyNotSched);		assert(!AnyNotSched);
return SUnits.size() - DeadNodes;		return SUnits.size() - DeadNodes;
}		}
#endif		#endif


void ScheduleDAGTopologicalSort::InitDAGTopologicalSorting() {		void ScheduleDAGTopologicalSort::InitDAGTopologicalSorting() {
// The idea of the algorithm is taken from		// The idea of the algorithm is taken from
// "Online algorithms for managing the topological order of		// "Online algorithms for managing the topological order of
// a directed acyclic graph" by David J. Pearce and Paul H.J. Kelly		// a directed acyclic graph" by David J. Pearce and Paul H.J. Kelly
// This is the MNR algorithm, which was first introduced by		// This is the MNR algorithm, which was first introduced by
// A. Marchetti-Spaccamela, U. Nanni and H. Rohnert in		// A. Marchetti-Spaccamela, U. Nanni and H. Rohnert in
// "Maintaining a topological order under edge insertions".		// "Maintaining a topological order under edge insertions".
//		//
[11 lines hidden]	void ScheduleDAGTopologicalSort::InitDAGTopologicalSorting() {
//		//
// The algorithm first computes a topological ordering for the DAG by		// The algorithm first computes a topological ordering for the DAG by
// initializing the Index2Node and Node2Index arrays and then tries to keep		// initializing the Index2Node and Node2Index arrays and then tries to keep
// the ordering up-to-date after edge insertions by reordering the DAG.		// the ordering up-to-date after edge insertions by reordering the DAG.
//		//
// On insertion of the edge X->Y, the algorithm first marks by calling DFS		// On insertion of the edge X->Y, the algorithm first marks by calling DFS
// the nodes reachable from Y, and then shifts them using Shift to lie		// the nodes reachable from Y, and then shifts them using Shift to lie
// immediately after X in Index2Node.		// immediately after X in Index2Node.

		// Cancel pending updates, mark as valid.
		Dirty = false;
		Updates.clear();

unsigned DAGSize = SUnits.size();		unsigned DAGSize = SUnits.size();
std::vector<SUnit*> WorkList;		std::vector<SUnit*> WorkList;
WorkList.reserve(DAGSize);		WorkList.reserve(DAGSize);

Index2Node.resize(DAGSize);		Index2Node.resize(DAGSize);
Node2Index.resize(DAGSize);		Node2Index.resize(DAGSize);

// Initialize the data structures.		// Initialize the data structures.
[36 lines hidden]	for (SUnit &SU : SUnits) {
for (const SDep &PD : SU.Preds) {		for (const SDep &PD : SU.Preds) {
assert(Node2Index[SU.NodeNum] > Node2Index[PD.getSUnit()->NodeNum] &&		assert(Node2Index[SU.NodeNum] > Node2Index[PD.getSUnit()->NodeNum] &&
"Wrong topological sorting");		"Wrong topological sorting");
}		}
}		}
#endif		#endif
}		}

		void ScheduleDAGTopologicalSort::fixOrder() {
		// Recompute from scratch after new nodes have been added.
		if (Dirty) {
		InitDAGTopologicalSorting();
		return;
		}

		// Otherwise apply updates one-by-one.
		for (auto &U : Updates)
		AddPred(U.first, U.second);
		Updates.clear();
		}

		void ScheduleDAGTopologicalSort::AddPredQueued(SUnit *Y, SUnit *X) {
		// Recomputing the order from scratch is likely more efficient than applying
		// updates one-by-one for too many updates. The current cut-off is arbitrarily
		// chosen.
		Dirty |= Updates.size() > 10;

		if (Dirty)
		return;

		Updates.emplace_back(Y, X);
		}

void ScheduleDAGTopologicalSort::AddPred(SUnit *Y, SUnit *X) {		void ScheduleDAGTopologicalSort::AddPred(SUnit *Y, SUnit *X) {
int UpperBound, LowerBound;		int UpperBound, LowerBound;
LowerBound = Node2Index[Y->NodeNum];		LowerBound = Node2Index[Y->NodeNum];
UpperBound = Node2Index[X->NodeNum];		UpperBound = Node2Index[X->NodeNum];
bool HasLoop = false;		bool HasLoop = false;
// Is Ord(X) < Ord(Y) ?		// Is Ord(X) < Ord(Y) ?
if (LowerBound < UpperBound) {		if (LowerBound < UpperBound) {
// Update the topological order.		// Update the topological order.
[151 lines hidden]	for (const SDep &PredDep : TargetSU->Preds)
if (PredDep.isAssignedRegDep() &&		if (PredDep.isAssignedRegDep() &&
IsReachable(SU, PredDep.getSUnit()))		IsReachable(SU, PredDep.getSUnit()))
return true;		return true;
return false;		return false;
}		}

bool ScheduleDAGTopologicalSort::IsReachable(const SUnit *SU,		bool ScheduleDAGTopologicalSort::IsReachable(const SUnit *SU,
const SUnit *TargetSU) {		const SUnit *TargetSU) {
		assert(!Dirty && Updates.empty() && "Topological order is outdated");
// If insertion of the edge SU->TargetSU would create a cycle		// If insertion of the edge SU->TargetSU would create a cycle
// then there is a path from TargetSU to SU.		// then there is a path from TargetSU to SU.
int UpperBound, LowerBound;		int UpperBound, LowerBound;
LowerBound = Node2Index[TargetSU->NodeNum];		LowerBound = Node2Index[TargetSU->NodeNum];
UpperBound = Node2Index[SU->NodeNum];		UpperBound = Node2Index[SU->NodeNum];
bool HasLoop = false;		bool HasLoop = false;
// Is Ord(TargetSU) < Ord(SU) ?		// Is Ord(TargetSU) < Ord(SU) ?
if (LowerBound < UpperBound) {		if (LowerBound < UpperBound) {
[17 lines hidden]

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp

[216 lines hidden]	public:
/// create a cycle.		/// create a cycle.
bool WillCreateCycle(SUnit *SU, SUnit *TargetSU) {		bool WillCreateCycle(SUnit *SU, SUnit *TargetSU) {
return Topo.WillCreateCycle(SU, TargetSU);		return Topo.WillCreateCycle(SU, TargetSU);
}		}

/// AddPred - adds a predecessor edge to SUnit SU.		/// AddPred - adds a predecessor edge to SUnit SU.
/// This returns true if this is a new predecessor.		/// This returns true if this is a new predecessor.
/// Updates the topological ordering if required.		/// Updates the topological ordering if required.
		void AddPredQueued(SUnit *SU, const SDep &D) {
		Topo.AddPredQueued(SU, D.getSUnit());
		SU->addPred(D);
		}

void AddPred(SUnit *SU, const SDep &D) {		void AddPred(SUnit *SU, const SDep &D) {
Topo.AddPred(SU, D.getSUnit());		Topo.AddPred(SU, D.getSUnit());
SU->addPred(D);		SU->addPred(D);
}		}


/// RemovePred - removes a predecessor edge from SUnit SU.		/// RemovePred - removes a predecessor edge from SUnit SU.
/// This returns true if an edge was removed.		/// This returns true if an edge was removed.
/// Updates the topological ordering if required.		/// Updates the topological ordering if required.
void RemovePred(SUnit *SU, const SDep &D) {		void RemovePred(SUnit *SU, const SDep &D) {
Topo.RemovePred(SU, D.getSUnit());		Topo.RemovePred(SU, D.getSUnit());
SU->removePred(D);		SU->removePred(D);
}		}

[23 lines hidden]	private:
bool DelayForLiveRegsBottomUp(SUnit*, SmallVectorImpl<unsigned>&);		bool DelayForLiveRegsBottomUp(SUnit*, SmallVectorImpl<unsigned>&);

void releaseInterferences(unsigned Reg = 0);		void releaseInterferences(unsigned Reg = 0);

SUnit *PickNodeToScheduleBottomUp();		SUnit *PickNodeToScheduleBottomUp();
void ListScheduleBottomUp();		void ListScheduleBottomUp();

/// CreateNewSUnit - Creates a new SUnit and returns a pointer to it.		/// CreateNewSUnit - Creates a new SUnit and returns a pointer to it.
/// Updates the topological ordering if required.
SUnit *CreateNewSUnit(SDNode *N) {		SUnit *CreateNewSUnit(SDNode *N) {
unsigned NumSUnits = SUnits.size();		unsigned NumSUnits = SUnits.size();
SUnit *NewNode = newSUnit(N);		SUnit *NewNode = newSUnit(N);
// Update the topological ordering.		// Update the topological ordering.
if (NewNode->NodeNum >= NumSUnits)		if (NewNode->NodeNum >= NumSUnits)
Topo.InitDAGTopologicalSorting();		Topo.MarkDirty();
return NewNode;		return NewNode;
}		}

/// CreateClone - Creates a new SUnit from an existing one.		/// CreateClone - Creates a new SUnit from an existing one.
/// Updates the topological ordering if required.
SUnit *CreateClone(SUnit *N) {		SUnit *CreateClone(SUnit *N) {
unsigned NumSUnits = SUnits.size();		unsigned NumSUnits = SUnits.size();
SUnit *NewNode = Clone(N);		SUnit *NewNode = Clone(N);
// Update the topological ordering.		// Update the topological ordering.
if (NewNode->NodeNum >= NumSUnits)		if (NewNode->NodeNum >= NumSUnits)
Topo.InitDAGTopologicalSorting();		Topo.MarkDirty();
return NewNode;		return NewNode;
}		}

/// forceUnitLatencies - Register-pressure-reducing scheduling doesn't		/// forceUnitLatencies - Register-pressure-reducing scheduling doesn't
/// need actual latency information but the hybrid scheduler does.		/// need actual latency information but the hybrid scheduler does.
bool forceUnitLatencies() const override {		bool forceUnitLatencies() const override {
return !NeedLatency;		return !NeedLatency;
}		}
[717 lines hidden]	SUnit *ScheduleDAGRRList::TryUnfoldSU(SUnit *SU) {

bool isNewN = true;		bool isNewN = true;
SUnit *NewSU;		SUnit *NewSU;
// This can only happen when isNewLoad is false.		// This can only happen when isNewLoad is false.
if (N->getNodeId() != -1) {		if (N->getNodeId() != -1) {
NewSU = &SUnits[N->getNodeId()];		NewSU = &SUnits[N->getNodeId()];
// If NewSU has already been scheduled, we need to clone it, but this		// If NewSU has already been scheduled, we need to clone it, but this
// negates the benefit to unfolding so just return SU.		// negates the benefit to unfolding so just return SU.
if (NewSU->isScheduled)		if (NewSU->isScheduled) {
return SU;		return SU;
		}
isNewN = false;		isNewN = false;
} else {		} else {
NewSU = CreateNewSUnit(N);		NewSU = CreateNewSUnit(N);
N->setNodeId(NewSU->NodeNum);		N->setNodeId(NewSU->NodeNum);

const MCInstrDesc &MCID = TII->get(N->getMachineOpcode());		const MCInstrDesc &MCID = TII->get(N->getMachineOpcode());
for (unsigned i = 0; i != MCID.getNumOperands(); ++i) {		for (unsigned i = 0; i != MCID.getNumOperands(); ++i) {
if (MCID.getOperandConstraint(i, MCOI::TIED_TO) != -1) {		if (MCID.getOperandConstraint(i, MCOI::TIED_TO) != -1) {
[36 lines hidden]	for (SDep &Succ : SU->Succs) {
else		else
NodeSuccs.push_back(Succ);		NodeSuccs.push_back(Succ);
}		}

// Now assign edges to the newly-created nodes.		// Now assign edges to the newly-created nodes.
for (const SDep &Pred : ChainPreds) {		for (const SDep &Pred : ChainPreds) {
RemovePred(SU, Pred);		RemovePred(SU, Pred);
if (isNewLoad)		if (isNewLoad)
AddPred(LoadSU, Pred);		AddPredQueued(LoadSU, Pred);
}		}
for (const SDep &Pred : LoadPreds) {		for (const SDep &Pred : LoadPreds) {
RemovePred(SU, Pred);		RemovePred(SU, Pred);
if (isNewLoad)		if (isNewLoad)
AddPred(LoadSU, Pred);		AddPredQueued(LoadSU, Pred);
}		}
for (const SDep &Pred : NodePreds) {		for (const SDep &Pred : NodePreds) {
RemovePred(SU, Pred);		RemovePred(SU, Pred);
AddPred(NewSU, Pred);		AddPredQueued(NewSU, Pred);
}		}
for (SDep D : NodeSuccs) {		for (SDep D : NodeSuccs) {
SUnit *SuccDep = D.getSUnit();		SUnit *SuccDep = D.getSUnit();
D.setSUnit(SU);		D.setSUnit(SU);
RemovePred(SuccDep, D);		RemovePred(SuccDep, D);
D.setSUnit(NewSU);		D.setSUnit(NewSU);
AddPred(SuccDep, D);		AddPredQueued(SuccDep, D);
// Balance register pressure.		// Balance register pressure.
if (AvailableQueue->tracksRegPressure() && SuccDep->isScheduled &&		if (AvailableQueue->tracksRegPressure() && SuccDep->isScheduled &&
!D.isCtrl() && NewSU->NumRegDefsLeft > 0)		!D.isCtrl() && NewSU->NumRegDefsLeft > 0)
--NewSU->NumRegDefsLeft;		--NewSU->NumRegDefsLeft;
}		}
for (SDep D : ChainSuccs) {		for (SDep D : ChainSuccs) {
SUnit *SuccDep = D.getSUnit();		SUnit *SuccDep = D.getSUnit();
D.setSUnit(SU);		D.setSUnit(SU);
RemovePred(SuccDep, D);		RemovePred(SuccDep, D);
if (isNewLoad) {		if (isNewLoad) {
D.setSUnit(LoadSU);		D.setSUnit(LoadSU);
AddPred(SuccDep, D);		AddPredQueued(SuccDep, D);
}		}
}		}

// Add a data dependency to reflect that NewSU reads the value defined		// Add a data dependency to reflect that NewSU reads the value defined
// by LoadSU.		// by LoadSU.
SDep D(LoadSU, SDep::Data, 0);		SDep D(LoadSU, SDep::Data, 0);
D.setLatency(LoadSU->Latency);		D.setLatency(LoadSU->Latency);
AddPred(NewSU, D);		AddPredQueued(NewSU, D);

if (isNewLoad)		if (isNewLoad)
AvailableQueue->addNode(LoadSU);		AvailableQueue->addNode(LoadSU);
if (isNewN)		if (isNewN)
AvailableQueue->addNode(NewSU);		AvailableQueue->addNode(NewSU);

++NumUnfolds;		++NumUnfolds;

[55 lines hidden]	SUnit *ScheduleDAGRRList::CopyAndMoveSuccessors(SUnit *SU) {
}		}

LLVM_DEBUG(dbgs() << " Duplicating SU #" << SU->NodeNum << "\n");		LLVM_DEBUG(dbgs() << " Duplicating SU #" << SU->NodeNum << "\n");
NewSU = CreateClone(SU);		NewSU = CreateClone(SU);

// New SUnit has the exact same predecessors.		// New SUnit has the exact same predecessors.
for (SDep &Pred : SU->Preds)		for (SDep &Pred : SU->Preds)
if (!Pred.isArtificial())		if (!Pred.isArtificial())
AddPred(NewSU, Pred);		AddPredQueued(NewSU, Pred);

// Only copy scheduled successors. Cut them from old node's successor		// Only copy scheduled successors. Cut them from old node's successor
// list and move them over.		// list and move them over.
SmallVector<std::pair<SUnit *, SDep>, 4> DelDeps;		SmallVector<std::pair<SUnit *, SDep>, 4> DelDeps;
for (SDep &Succ : SU->Succs) {		for (SDep &Succ : SU->Succs) {
if (Succ.isArtificial())		if (Succ.isArtificial())
continue;		continue;
SUnit *SuccSU = Succ.getSUnit();		SUnit *SuccSU = Succ.getSUnit();
if (SuccSU->isScheduled) {		if (SuccSU->isScheduled) {
SDep D = Succ;		SDep D = Succ;
D.setSUnit(NewSU);		D.setSUnit(NewSU);
AddPred(SuccSU, D);		AddPredQueued(SuccSU, D);
D.setSUnit(SU);		D.setSUnit(SU);
DelDeps.push_back(std::make_pair(SuccSU, D));		DelDeps.push_back(std::make_pair(SuccSU, D));
}		}
}		}
for (auto &DelDep : DelDeps)		for (auto &DelDep : DelDeps)
RemovePred(DelDep.first, DelDep.second);		RemovePred(DelDep.first, DelDep.second);

AvailableQueue->updateNode(SU);		AvailableQueue->updateNode(SU);
[22 lines hidden]	void ScheduleDAGRRList::InsertCopiesAndMoveSuccs(SUnit *SU, unsigned Reg,
SmallVector<std::pair<SUnit *, SDep>, 4> DelDeps;		SmallVector<std::pair<SUnit *, SDep>, 4> DelDeps;
for (SDep &Succ : SU->Succs) {		for (SDep &Succ : SU->Succs) {
if (Succ.isArtificial())		if (Succ.isArtificial())
continue;		continue;
SUnit *SuccSU = Succ.getSUnit();		SUnit *SuccSU = Succ.getSUnit();
if (SuccSU->isScheduled) {		if (SuccSU->isScheduled) {
SDep D = Succ;		SDep D = Succ;
D.setSUnit(CopyToSU);		D.setSUnit(CopyToSU);
AddPred(SuccSU, D);		AddPredQueued(SuccSU, D);
DelDeps.push_back(std::make_pair(SuccSU, Succ));		DelDeps.push_back(std::make_pair(SuccSU, Succ));
}		}
else {		else {
// Avoid scheduling the def-side copy before other successors. Otherwise		// Avoid scheduling the def-side copy before other successors. Otherwise
// we could introduce another physreg interference on the copy and		// we could introduce another physreg interference on the copy and
// continue inserting copies indefinitely.		// continue inserting copies indefinitely.
AddPred(SuccSU, SDep(CopyFromSU, SDep::Artificial));		AddPredQueued(SuccSU, SDep(CopyFromSU, SDep::Artificial));
}		}
}		}
for (auto &DelDep : DelDeps)		for (auto &DelDep : DelDeps)
RemovePred(DelDep.first, DelDep.second);		RemovePred(DelDep.first, DelDep.second);

SDep FromDep(SU, SDep::Data, Reg);		SDep FromDep(SU, SDep::Data, Reg);
FromDep.setLatency(SU->Latency);		FromDep.setLatency(SU->Latency);
AddPred(CopyFromSU, FromDep);		AddPredQueued(CopyFromSU, FromDep);
SDep ToDep(CopyFromSU, SDep::Data, 0);		SDep ToDep(CopyFromSU, SDep::Data, 0);
ToDep.setLatency(CopyFromSU->Latency);		ToDep.setLatency(CopyFromSU->Latency);
AddPred(CopyToSU, ToDep);		AddPredQueued(CopyToSU, ToDep);

AvailableQueue->updateNode(SU);		AvailableQueue->updateNode(SU);
AvailableQueue->addNode(CopyFromSU);		AvailableQueue->addNode(CopyFromSU);
AvailableQueue->addNode(CopyToSU);		AvailableQueue->addNode(CopyToSU);
Copies.push_back(CopyFromSU);		Copies.push_back(CopyFromSU);
Copies.push_back(CopyToSU);		Copies.push_back(CopyToSU);

		Topo.InitDAGTopologicalSorting();
++NumPRCopies;		++NumPRCopies;
}		}

/// getPhysicalRegisterVT - Returns the ValueType of the physical register		/// getPhysicalRegisterVT - Returns the ValueType of the physical register
/// definition of the specified node.		/// definition of the specified node.
/// FIXME: Move to SelectionDAG?		/// FIXME: Move to SelectionDAG?
static MVT getPhysicalRegisterVT(SDNode *N, unsigned Reg,		static MVT getPhysicalRegisterVT(SDNode *N, unsigned Reg,
const TargetInstrInfo *TII) {		const TargetInstrInfo *TII) {
[206 lines hidden]	while (CurSU) {
}		}
CurSU = AvailableQueue->pop();		CurSU = AvailableQueue->pop();
}		}
};		};
FindAvailableNode();		FindAvailableNode();
if (CurSU)		if (CurSU)
return CurSU;		return CurSU;

		if (!Interferences.empty())
		Topo.fixOrder();
   // All candidates are delayed due to live physical reg dependencies.
   // Try backtracking, code duplication, or inserting cross class copies
   // to resolve it.
   for (SUnit *TrySU : Interferences) {
     SmallVectorImpl<unsigned> &LRegs = LRegsMap[TrySU];

     // Try unscheduling up to the point where it's safe to schedule
     // this node.
 ... 13 lines hidden (context: if (!WillCreateCycle(TrySU, BtSU)) {) ...
       // requires the physical reg dep.
       if (BtSU->isAvailable) {
         BtSU->isAvailable = false;
         if (!BtSU->isPending)
           AvailableQueue->remove(BtSU);
       }
       LLVM_DEBUG(dbgs() << "ARTIFICIAL edge from SU(" << BtSU->NodeNum
                         << ") to SU(" << TrySU->NodeNum << ")\n");
-      AddPred(TrySU, SDep(BtSU, SDep::Artificial));
+      AddPredQueued(TrySU, SDep(BtSU, SDep::Artificial));

       // If one or more successors has been unscheduled, then the current
       // node is no longer available.
       if (!TrySU->isAvailable || !TrySU->NodeQueueId) {
         LLVM_DEBUG(dbgs() << "TrySU not available; choosing node from queue\n");
         CurSU = AvailableQueue->pop();
       } else {
         LLVM_DEBUG(dbgs() << "TrySU available\n");
 ... 37 lines hidden (context: if (DestRC != RC) {) ...
         report_fatal_error("Can't handle live physical register dependency!");
     }
     if (!NewDef) {
       // Issue copies, these can be expensive cross register class copies.
       SmallVector<SUnit*, 2> Copies;
       InsertCopiesAndMoveSuccs(LRDef, Reg, DestRC, RC, Copies);
       LLVM_DEBUG(dbgs() << " Adding an edge from SU #" << TrySU->NodeNum
                         << " to SU #" << Copies.front()->NodeNum << "\n");
-      AddPred(TrySU, SDep(Copies.front(), SDep::Artificial));
+      AddPredQueued(TrySU, SDep(Copies.front(), SDep::Artificial));
       NewDef = Copies.back();
     }

     LLVM_DEBUG(dbgs() << " Adding an edge from SU #" << NewDef->NodeNum
                       << " to SU #" << TrySU->NodeNum << "\n");
     LiveRegDefs[Reg] = NewDef;
-    AddPred(NewDef, SDep(TrySU, SDep::Artificial));
+    AddPredQueued(NewDef, SDep(TrySU, SDep::Artificial));
     TrySU->isAvailable = false;
     CurSU = NewDef;
   }
   assert(CurSU && "Unable to resolve live physical register dependencies!");
   return CurSU;
 }

 /// ListScheduleBottomUp - The main loop of list scheduling for bottom-up
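
The switch from AddPred to AddPredQueued in this hunk reflects the idea from the summary: record new edges first and bring the topological order up to date only when it is queried. The following is a minimal, self-contained sketch of that idea, not LLVM's implementation. The class `LazyTopoDAG` and its members are hypothetical names, and as a simplifying assumption it always recomputes the order from scratch (using Kahn's algorithm) instead of choosing between incremental updates and a full recomputation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a scheduling DAG; these are not LLVM's classes.
class LazyTopoDAG {
public:
  explicit LazyTopoDAG(unsigned NumNodes)
      : Succs(NumNodes), Position(NumNodes, 0) {}

  // Record the edge From -> To, but do not touch the ordering yet.
  void addEdgeQueued(unsigned From, unsigned To) {
    Succs[From].push_back(To);
    Dirty = true;
  }

  // Index of Node in the topological order. This is the only place where
  // the order is brought up to date.
  unsigned position(unsigned Node) {
    fixOrder();
    return Position[Node];
  }

  // How many times the order was rebuilt (for demonstration only).
  unsigned numRecomputes() const { return Recomputes; }

private:
  // Rebuild the order only if edges were added since the last rebuild.
  void fixOrder() {
    if (Dirty)
      recompute();
  }

  // Kahn's algorithm: repeatedly emit nodes that have no unprocessed
  // predecessors.
  void recompute() {
    const unsigned N = static_cast<unsigned>(Succs.size());
    std::vector<unsigned> InDegree(N, 0);
    for (const std::vector<unsigned> &Out : Succs)
      for (unsigned To : Out)
        ++InDegree[To];

    std::vector<unsigned> Worklist;
    for (unsigned I = 0; I < N; ++I)
      if (InDegree[I] == 0)
        Worklist.push_back(I);

    unsigned Next = 0;
    while (!Worklist.empty()) {
      unsigned Node = Worklist.back();
      Worklist.pop_back();
      Position[Node] = Next++;
      for (unsigned To : Succs[Node])
        if (--InDegree[To] == 0)
          Worklist.push_back(To);
    }
    assert(Next == N && "queued edges must not form a cycle");
    Dirty = false;
    ++Recomputes;
  }

  std::vector<std::vector<unsigned>> Succs; // adjacency lists
  std::vector<unsigned> Position;           // node -> index in the order
  bool Dirty = true;                        // order is stale until first query
  unsigned Recomputes = 0;
};
```

In this sketch, queuing several edges and then querying `position()` triggers a single rebuild, whereas updating the order after every added edge would do that work once per edge. That is the kind of saving the InitDAGTopologicalSorting call counts above point to.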

Revision Contents

Diff 193275

llvm/include/llvm/CodeGen/ScheduleDAG.h

llvm/lib/CodeGen/ScheduleDAG.cpp

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
