This is an archive of the discontinued LLVM Phabricator instance.

[ScheduleDAGRRList] Recompute topological ordering on demand.
ClosedPublic

Authored by fhahn on Apr 2 2019, 6:23 AM.

Details

Reviewers

MatzeB
atrick
efriedma
niravd
paquette

Commits

rG258a425c69f0: [ScheduleDAGRRList] Recompute topological ordering on demand.
rL358583: [ScheduleDAGRRList] Recompute topological ordering on demand.

Summary

Currently there is a single point in ScheduleDAGRRList where we
actually query the topological order (besides init code). At the moment we
recompute the order after adding a node (which does not have
predecessors yet) and then add its predecessors edge-by-edge.

We can avoid updating the ordering edge-by-edge after adding a new node. In that case, we can
just rebuild the order from scratch after adding the edges to the DAG
and avoid all the incremental updates to the ordering.

Also, we can delay updating the ordering until we query it, if we keep a
list of added edges. Depending on the number of updates, we can either
apply them when needed or recompute the order from scratch.

This brings the geomean compile time of CTMark with -O1 down by 0.2% on X86,
with no regressions.
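
A minimal sketch of the on-demand idea described in this summary, assuming a toy graph class rather than the real ScheduleDAGTopologicalSort. The names LazyTopoOrder, addNode, addEdge, indexOf and rebuilds are hypothetical and exist only for illustration. The sketch always rebuilds from scratch when the order is stale; the patch's second path (applying a small number of queued edges one by one) is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lazily maintained topological order; not LLVM code.
class LazyTopoOrder {
  std::vector<std::vector<int>> Succs; // node -> successor nodes
  std::vector<int> Node2Index;         // node -> position in the order
  bool Dirty = false;                  // set when the order may be stale
  unsigned NumRebuilds = 0;            // bookkeeping for the demonstration

  // Recompute the whole order with Kahn's algorithm (assumes an acyclic graph).
  void rebuild() {
    const std::size_t N = Succs.size();
    std::vector<int> InDegree(N, 0);
    for (const auto &Out : Succs)
      for (int To : Out)
        ++InDegree[To];
    std::vector<int> Ready;
    for (std::size_t I = 0; I != N; ++I)
      if (InDegree[I] == 0)
        Ready.push_back(static_cast<int>(I));
    Node2Index.assign(N, -1);
    int Next = 0;
    while (!Ready.empty()) {
      int Node = Ready.back();
      Ready.pop_back();
      Node2Index[Node] = Next++;
      for (int To : Succs[Node])
        if (--InDegree[To] == 0)
          Ready.push_back(To);
    }
    Dirty = false;
    ++NumRebuilds;
  }

public:
  // Adding a node or an edge only marks the order stale; no sorting happens yet.
  int addNode() {
    Succs.emplace_back();
    Dirty = true;
    return static_cast<int>(Succs.size()) - 1;
  }
  void addEdge(int From, int To) {
    assert(From >= 0 && static_cast<std::size_t>(From) < Succs.size());
    assert(To >= 0 && static_cast<std::size_t>(To) < Succs.size());
    Succs[From].push_back(To);
    Dirty = true;
  }

  // Queries repair the order first, so callers never observe a stale one.
  int indexOf(int Node) {
    if (Dirty)
      rebuild();
    return Node2Index[Node];
  }
  unsigned rebuilds() const { return NumRebuilds; }
};
```

With this shape, adding a node followed by any number of edges costs a single rebuild at the next query, instead of one ordering update per edge.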

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 30062
Build 30061: arc lint + arc unit

Event Timeline

fhahn created this revision.Apr 2 2019, 6:23 AM

Herald added a project: Restricted Project. Apr 2 2019, 6:23 AM

Herald added subscribers: jdoerfert, hiraditya.

Harbormaster completed remote builds in B29945: Diff 193275.Apr 2 2019, 6:25 AM

fhahn mentioned this in D59722: [ScheduleDAG] Avoid unnecessary recomputation of topological order..Apr 2 2019, 6:26 AM

efriedma added inline comments.Apr 2 2019, 12:03 PM

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
1489	Where exactly do we query the topological order relative to this call to fixOrder()? Do we need to fixOrder() inside the loop?

fhahn marked an inline comment as done.Apr 2 2019, 12:37 PM

fhahn added inline comments.

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
1489	We only query it in the loop below (WillCreateCycle call). In case we add new predecessors, we also exit the loop. And we only enter the loop, if there are some interferences. Maybe it's worth adding a comment here?

efriedma added inline comments.Apr 2 2019, 12:44 PM

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
1489	Probably worth adding a comment, yes. By the way, how frequently do we hit this case in practice?

Add comment and make even lazier.

Harbormaster completed remote builds in B29996: Diff 193451.Apr 3 2019, 2:21 AM

fhahn mentioned this in D60187: [ScheduleDAG] Add statistics for maintaining the topological order..Apr 3 2019, 2:29 AM

fhahn marked 2 inline comments as done.Apr 3 2019, 2:46 AM

fhahn added inline comments.

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
1489	> By the way, how frequently do we hit this case in practice?

I haven't looked at this call here in isolation, but I gathered statistics on the impact on the number of InitDAGTopologicalSorting and AddPred calls.

Number of InitDAGTopologicalSorting calls for CTMark, X86, -O1

Test                                            patch      base
CTMark/kimwitu++/kc.test                        25671.0    46740.0
CTMark/Bullet/bullet.test                       16228.0    36873.0
CTMark/mafft/pairlocalalign.test                14290.0    28695.0
CTMark/lencod/lencod.test                       19006.0    37880.0
CTMark/sqlite3/sqlite3.test                     22447.0    49409.0
CTMark/ClamAV/clamscan.test                     23087.0    48369.0
CTMark/7zip/7zip-benchmark.test                 41045.0    98737.0
CTMark/tramp3d-v4/tramp3d-v4.test               22057.0    49214.0
CTMark/SPASS/SPASS.test                         19596.0    40652.0
CTMark/consumer-typeset/consumer-typeset.test   16567.0    33208.0

Number of AddPred calls for CTMark, X86, -O1

Test                                            patch      base
CTMark/kimwitu++/kc.test                        2645.0     2702.0
CTMark/Bullet/bullet.test                       3619.0     3643.0
CTMark/mafft/pairlocalalign.test                3427.0     3455.0
CTMark/lencod/lencod.test                       3902.0     4108.0
CTMark/sqlite3/sqlite3.test                     3147.0     3179.0
CTMark/ClamAV/clamscan.test                     3764.0     3821.0
CTMark/7zip/7zip-benchmark.test                 7308.0     7393.0
CTMark/tramp3d-v4/tramp3d-v4.test               4311.0     4315.0
CTMark/SPASS/SPASS.test                         4512.0     4577.0
CTMark/consumer-typeset/consumer-typeset.test   3695.0     3734.0

Also, the latest version of the patch improves compile-time on CTMark X86, -O1 a bit more: negative diff means a reduction in compile time.

Program                                           diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test       -0.6%
test-suite :: CTMark/Bullet/bullet.test           -0.4%
test-suite...:: CTMark/sqlite3/sqlite3.test       -0.4%
test-suite :: CTMark/kimwitu++/kc.test            -0.4%
test-suite...TMark/7zip/7zip-benchmark.test       -0.3%
test-suite :: CTMark/lencod/lencod.test           -0.3%
test-suite :: CTMark/SPASS/SPASS.test             -0.3%
test-suite...-typeset/consumer-typeset.test       -0.3%
test-suite...:: CTMark/ClamAV/clamscan.test       -0.2%
test-suite...Mark/mafft/pairlocalalign.test        0.0%
Geomean difference                                -0.3%
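
As a consistency check on the per-program diffs quoted in this comment, the sketch below recomputes a geometric-mean difference from the ten rounded percentages. The function name geomeanPercentChange and the constant CTMarkDiffs are hypothetical. The reported geomean was presumably computed from raw timings rather than from these rounded values, so agreement is only expected to within rounding.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark ratios (patch / base), returned as a
// percentage change. Each input is a percent diff such as -0.4 for -0.4%.
double geomeanPercentChange(const std::vector<double> &PercentDiffs) {
  assert(!PercentDiffs.empty() && "need at least one benchmark");
  double LogSum = 0.0;
  for (double D : PercentDiffs)
    LogSum += std::log(1.0 + D / 100.0); // convert diff to ratio, sum logs
  const double Ratio = std::exp(LogSum / PercentDiffs.size());
  return (Ratio - 1.0) * 100.0;
}

// The ten rounded CTMark diffs quoted in the comment (X86, -O1).
const std::vector<double> CTMarkDiffs = {-0.6, -0.4, -0.4, -0.4, -0.3,
                                         -0.3, -0.3, -0.3, -0.2, 0.0};
```

Calling geomeanPercentChange(CTMarkDiffs) gives roughly -0.3, in line with the "Geomean difference" row.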

efriedma added inline comments.Apr 3 2019, 1:13 PM

llvm/lib/CodeGen/ScheduleDAG.cpp
717	How terrible would it be to just call fixOrder from here, instead of making the callers check? That makes it impossible for callers to mess up, and the check should be cheap. I guess it also might be possible to add some cheap checks here to avoid calling fixOrder; for example, if TargetSU has no successors.

fhahn marked 2 inline comments as done.Apr 3 2019, 1:23 PM

fhahn added inline comments.

llvm/lib/CodeGen/ScheduleDAG.cpp
717	I guess the impact would not be too big, in fact that was what I initially had. I'll measure it and get back. I think it would be OK to push responsibility to the caller in a way and the assertion should catch any errors (the message should probably point to fixOrder()).

Fix order whenever it is queried.

Harbormaster completed remote builds in B30062: Diff 193737.Apr 4 2019, 9:53 AM

fhahn added inline comments.Apr 4 2019, 9:58 AM

llvm/lib/CodeGen/ScheduleDAG.cpp
717	I gathered numbers with having FixOrder() in IsReachable & WillCreateCycle. The numbers shifted a bit, but we still have roughly the same overall gain. It seems in some cases one approach is slightly better and for some cases the other one. I guess we should go for the safer one for now?

LGTM

This revision is now accepted and ready to land.Apr 4 2019, 11:57 AM

fhahn mentioned this in rL358058: [ScheduleDAG] Add statistics for maintaining the topological order..Apr 10 2019, 2:01 AM

fhahn mentioned this in rG83443c9a9ec7: [ScheduleDAG] Add statistics for maintaining the topological order..Apr 10 2019, 2:06 AM

Closed by commit rL358583: [ScheduleDAGRRList] Recompute topological ordering on demand. (authored by fhahn). Apr 17 2019, 8:03 AM

This revision was automatically updated to reflect the committed changes.

fhahn mentioned this in D60839: [ScheduleDAGInstrs] Compute topological ordering on demand..Apr 17 2019, 2:51 PM

fhahn mentioned this in rL361253: [ScheduleDAGInstrs] Compute topological ordering on demand..May 21 2019, 6:02 AM

fhahn mentioned this in rGf9b28e53c7d1: [ScheduleDAGInstrs] Compute topological ordering on demand..

sidorovd mentioned this in rG29f5deac5337: [ScheduleDAGInstrs] Compute topological ordering on demand..May 30 2019, 10:50 AM

Revision Contents

Path

Size

llvm/

include/

llvm/

CodeGen/

ScheduleDAG.h

19 lines

lib/

CodeGen/

ScheduleDAG.cpp

32 lines

SelectionDAG/

ScheduleDAGRRList.cpp

60 lines

Diff 193737

llvm/include/llvm/CodeGen/ScheduleDAG.h

(685 lines not shown)	#endif
/// methods for dynamically updating the ordering as new edges are added.		/// methods for dynamically updating the ordering as new edges are added.
///		///
/// This allows a very fast implementation of IsReachable, for example.		/// This allows a very fast implementation of IsReachable, for example.
class ScheduleDAGTopologicalSort {		class ScheduleDAGTopologicalSort {
/// A reference to the ScheduleDAG's SUnits.		/// A reference to the ScheduleDAG's SUnits.
std::vector<SUnit> &SUnits;		std::vector<SUnit> &SUnits;
SUnit *ExitSU;		SUnit *ExitSU;

		// Have any new nodes been added?
		bool Dirty = false;

		// Outstanding added edges, that have not been applied to the ordering.
		SmallVector<std::pair<SUnit *, SUnit *>, 16> Updates;

/// Maps topological index to the node number.		/// Maps topological index to the node number.
std::vector<int> Index2Node;		std::vector<int> Index2Node;
/// Maps the node number to its topological index.		/// Maps the node number to its topological index.
std::vector<int> Node2Index;		std::vector<int> Node2Index;
/// a set of nodes visited during a DFS traversal.		/// a set of nodes visited during a DFS traversal.
BitVector Visited;		BitVector Visited;

/// Makes a DFS traversal and mark all nodes affected by the edge insertion.		/// Makes a DFS traversal and mark all nodes affected by the edge insertion.
/// These nodes will later get new topological indexes by means of the Shift		/// These nodes will later get new topological indexes by means of the Shift
/// method.		/// method.
void DFS(const SUnit *SU, int UpperBound, bool& HasLoop);		void DFS(const SUnit *SU, int UpperBound, bool& HasLoop);

/// Reassigns topological indexes for the nodes in the DAG to		/// Reassigns topological indexes for the nodes in the DAG to
/// preserve the topological ordering.		/// preserve the topological ordering.
void Shift(BitVector& Visited, int LowerBound, int UpperBound);		void Shift(BitVector& Visited, int LowerBound, int UpperBound);

/// Assigns the topological index to the node n.		/// Assigns the topological index to the node n.
void Allocate(int n, int index);		void Allocate(int n, int index);

		/// Fix the ordering, by either recomputing from scratch or by applying
		/// any outstanding updates. Uses a heuristic to estimate what will be
		/// cheaper.
		void FixOrder();

public:		public:
ScheduleDAGTopologicalSort(std::vector<SUnit> &SUnits, SUnit *ExitSU);		ScheduleDAGTopologicalSort(std::vector<SUnit> &SUnits, SUnit *ExitSU);

/// Creates the initial topological ordering from the DAG to be scheduled.		/// Creates the initial topological ordering from the DAG to be scheduled.
void InitDAGTopologicalSorting();		void InitDAGTopologicalSorting();

/// Returns an array of SUs that are both in the successor		/// Returns an array of SUs that are both in the successor
/// subtree of StartSU and in the predecessor subtree of TargetSU.		/// subtree of StartSU and in the predecessor subtree of TargetSU.
/// StartSU and TargetSU are not in the array.		/// StartSU and TargetSU are not in the array.
/// Success is false if TargetSU is not in the successor subtree of		/// Success is false if TargetSU is not in the successor subtree of
/// StartSU, else it is true.		/// StartSU, else it is true.
std::vector<int> GetSubGraph(const SUnit &StartSU, const SUnit &TargetSU,		std::vector<int> GetSubGraph(const SUnit &StartSU, const SUnit &TargetSU,
bool &Success);		bool &Success);

/// Checks if \p SU is reachable from \p TargetSU.		/// Checks if \p SU is reachable from \p TargetSU.
bool IsReachable(const SUnit *SU, const SUnit *TargetSU);		bool IsReachable(const SUnit *SU, const SUnit *TargetSU);

/// Returns true if addPred(TargetSU, SU) creates a cycle.		/// Returns true if addPred(TargetSU, SU) creates a cycle.
bool WillCreateCycle(SUnit *TargetSU, SUnit *SU);		bool WillCreateCycle(SUnit *TargetSU, SUnit *SU);

/// Updates the topological ordering to accommodate an edge to be		/// Updates the topological ordering to accommodate an edge to be
/// added from SUnit \p X to SUnit \p Y.		/// added from SUnit \p X to SUnit \p Y.
void AddPred(SUnit *Y, SUnit *X);		void AddPred(SUnit *Y, SUnit *X);

		/// Queues an update to the topological ordering to accommodate an edge to
		/// be added from SUnit \p X to SUnit \p Y.
		void AddPredQueued(SUnit *Y, SUnit *X);

/// Updates the topological ordering to accommodate an an edge to be		/// Updates the topological ordering to accommodate an an edge to be
/// removed from the specified node \p N from the predecessors of the		/// removed from the specified node \p N from the predecessors of the
/// current node \p M.		/// current node \p M.
void RemovePred(SUnit *M, SUnit *N);		void RemovePred(SUnit *M, SUnit *N);

		/// Mark the ordering as temporarily broken, after a new node has been
		/// added.
		void MarkDirty() { Dirty = true; }

typedef std::vector<int>::iterator iterator;		typedef std::vector<int>::iterator iterator;
typedef std::vector<int>::const_iterator const_iterator;		typedef std::vector<int>::const_iterator const_iterator;
iterator begin() { return Index2Node.begin(); }		iterator begin() { return Index2Node.begin(); }
const_iterator begin() const { return Index2Node.begin(); }		const_iterator begin() const { return Index2Node.begin(); }
iterator end() { return Index2Node.end(); }		iterator end() { return Index2Node.end(); }
const_iterator end() const { return Index2Node.end(); }		const_iterator end() const { return Index2Node.end(); }

typedef std::vector<int>::reverse_iterator reverse_iterator;		typedef std::vector<int>::reverse_iterator reverse_iterator;
(10 lines not shown)

llvm/lib/CodeGen/ScheduleDAG.cpp

(455 lines not shown)	void ScheduleDAGTopologicalSort::InitDAGTopologicalSorting() {
//		//
// The algorithm first computes a topological ordering for the DAG by		// The algorithm first computes a topological ordering for the DAG by
// initializing the Index2Node and Node2Index arrays and then tries to keep		// initializing the Index2Node and Node2Index arrays and then tries to keep
// the ordering up-to-date after edge insertions by reordering the DAG.		// the ordering up-to-date after edge insertions by reordering the DAG.
//		//
// On insertion of the edge X->Y, the algorithm first marks by calling DFS		// On insertion of the edge X->Y, the algorithm first marks by calling DFS
// the nodes reachable from Y, and then shifts them using Shift to lie		// the nodes reachable from Y, and then shifts them using Shift to lie
// immediately after X in Index2Node.		// immediately after X in Index2Node.

		// Cancel pending updates, mark as valid.
		Dirty = false;
		Updates.clear();

unsigned DAGSize = SUnits.size();		unsigned DAGSize = SUnits.size();
std::vector<SUnit*> WorkList;		std::vector<SUnit*> WorkList;
WorkList.reserve(DAGSize);		WorkList.reserve(DAGSize);

Index2Node.resize(DAGSize);		Index2Node.resize(DAGSize);
Node2Index.resize(DAGSize);		Node2Index.resize(DAGSize);

// Initialize the data structures.		// Initialize the data structures.
(37 lines not shown)	for (SUnit &SU : SUnits) {
for (const SDep &PD : SU.Preds) {		for (const SDep &PD : SU.Preds) {
assert(Node2Index[SU.NodeNum] > Node2Index[PD.getSUnit()->NodeNum] &&		assert(Node2Index[SU.NodeNum] > Node2Index[PD.getSUnit()->NodeNum] &&
"Wrong topological sorting");		"Wrong topological sorting");
}		}
}		}
#endif		#endif
}		}

		void ScheduleDAGTopologicalSort::FixOrder() {
		// Recompute from scratch after new nodes have been added.
		if (Dirty) {
		InitDAGTopologicalSorting();
		return;
		}

		// Otherwise apply updates one-by-one.
		for (auto &U : Updates)
		AddPred(U.first, U.second);
		Updates.clear();
		}

		void ScheduleDAGTopologicalSort::AddPredQueued(SUnit *Y, SUnit *X) {
		// Recomputing the order from scratch is likely more efficient than applying
		// updates one-by-one for too many updates. The current cut-off is arbitrarily
		// chosen.
		Dirty = Dirty || Updates.size() > 10;

		if (Dirty)
		return;

		Updates.emplace_back(Y, X);
		}

void ScheduleDAGTopologicalSort::AddPred(SUnit *Y, SUnit *X) {		void ScheduleDAGTopologicalSort::AddPred(SUnit *Y, SUnit *X) {
int UpperBound, LowerBound;		int UpperBound, LowerBound;
LowerBound = Node2Index[Y->NodeNum];		LowerBound = Node2Index[Y->NodeNum];
UpperBound = Node2Index[X->NodeNum];		UpperBound = Node2Index[X->NodeNum];
bool HasLoop = false;		bool HasLoop = false;
// Is Ord(X) < Ord(Y) ?		// Is Ord(X) < Ord(Y) ?
if (LowerBound < UpperBound) {		if (LowerBound < UpperBound) {
// Update the topological order.		// Update the topological order.
(141 lines not shown)	void ScheduleDAGTopologicalSort::Shift(BitVector& Visited, int LowerBound,

for (unsigned LI : L) {		for (unsigned LI : L) {
Allocate(LI, i - shift);		Allocate(LI, i - shift);
i = i + 1;		i = i + 1;
}		}
}		}

bool ScheduleDAGTopologicalSort::WillCreateCycle(SUnit *TargetSU, SUnit *SU) {		bool ScheduleDAGTopologicalSort::WillCreateCycle(SUnit *TargetSU, SUnit *SU) {
		FixOrder();
// Is SU reachable from TargetSU via successor edges?		// Is SU reachable from TargetSU via successor edges?
if (IsReachable(SU, TargetSU))		if (IsReachable(SU, TargetSU))
return true;		return true;
for (const SDep &PredDep : TargetSU->Preds)		for (const SDep &PredDep : TargetSU->Preds)
if (PredDep.isAssignedRegDep() &&		if (PredDep.isAssignedRegDep() &&
IsReachable(SU, PredDep.getSUnit()))		IsReachable(SU, PredDep.getSUnit()))
return true;		return true;
return false;		return false;
}		}

bool ScheduleDAGTopologicalSort::IsReachable(const SUnit *SU,		bool ScheduleDAGTopologicalSort::IsReachable(const SUnit *SU,
const SUnit *TargetSU) {		const SUnit *TargetSU) {
		FixOrder();
// If insertion of the edge SU->TargetSU would create a cycle		// If insertion of the edge SU->TargetSU would create a cycle
// then there is a path from TargetSU to SU.		// then there is a path from TargetSU to SU.
int UpperBound, LowerBound;		int UpperBound, LowerBound;
LowerBound = Node2Index[TargetSU->NodeNum];		LowerBound = Node2Index[TargetSU->NodeNum];
UpperBound = Node2Index[SU->NodeNum];		UpperBound = Node2Index[SU->NodeNum];
bool HasLoop = false;		bool HasLoop = false;
// Is Ord(TargetSU) < Ord(SU) ?		// Is Ord(TargetSU) < Ord(SU) ?
if (LowerBound < UpperBound) {		if (LowerBound < UpperBound) {
(17 lines not shown)

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp

(213 lines not shown)	public:
}		}

/// WillCreateCycle - Returns true if adding an edge from SU to TargetSU will		/// WillCreateCycle - Returns true if adding an edge from SU to TargetSU will
/// create a cycle.		/// create a cycle.
bool WillCreateCycle(SUnit *SU, SUnit *TargetSU) {		bool WillCreateCycle(SUnit *SU, SUnit *TargetSU) {
return Topo.WillCreateCycle(SU, TargetSU);		return Topo.WillCreateCycle(SU, TargetSU);
}		}

		/// AddPredQueued - Queues an update to add a predecessor edge to SUnit SU.
		/// This returns true if this is a new predecessor.
		/// Does NOT update the topological ordering! It just queues an update.
		void AddPredQueued(SUnit *SU, const SDep &D) {
		Topo.AddPredQueued(SU, D.getSUnit());
		SU->addPred(D);
		}

/// AddPred - adds a predecessor edge to SUnit SU.		/// AddPred - adds a predecessor edge to SUnit SU.
/// This returns true if this is a new predecessor.		/// This returns true if this is a new predecessor.
/// Updates the topological ordering if required.		/// Updates the topological ordering if required.
void AddPred(SUnit *SU, const SDep &D) {		void AddPred(SUnit *SU, const SDep &D) {
Topo.AddPred(SU, D.getSUnit());		Topo.AddPred(SU, D.getSUnit());
SU->addPred(D);		SU->addPred(D);
}		}

(31 lines not shown)	private:
bool DelayForLiveRegsBottomUp(SUnit*, SmallVectorImpl<unsigned>&);		bool DelayForLiveRegsBottomUp(SUnit*, SmallVectorImpl<unsigned>&);

void releaseInterferences(unsigned Reg = 0);		void releaseInterferences(unsigned Reg = 0);

SUnit *PickNodeToScheduleBottomUp();		SUnit *PickNodeToScheduleBottomUp();
void ListScheduleBottomUp();		void ListScheduleBottomUp();

/// CreateNewSUnit - Creates a new SUnit and returns a pointer to it.		/// CreateNewSUnit - Creates a new SUnit and returns a pointer to it.
/// Updates the topological ordering if required.
SUnit *CreateNewSUnit(SDNode *N) {		SUnit *CreateNewSUnit(SDNode *N) {
unsigned NumSUnits = SUnits.size();		unsigned NumSUnits = SUnits.size();
SUnit *NewNode = newSUnit(N);		SUnit *NewNode = newSUnit(N);
// Update the topological ordering.		// Update the topological ordering.
if (NewNode->NodeNum >= NumSUnits)		if (NewNode->NodeNum >= NumSUnits)
Topo.InitDAGTopologicalSorting();		Topo.MarkDirty();
return NewNode;		return NewNode;
}		}

/// CreateClone - Creates a new SUnit from an existing one.		/// CreateClone - Creates a new SUnit from an existing one.
/// Updates the topological ordering if required.
SUnit *CreateClone(SUnit *N) {		SUnit *CreateClone(SUnit *N) {
unsigned NumSUnits = SUnits.size();		unsigned NumSUnits = SUnits.size();
SUnit *NewNode = Clone(N);		SUnit *NewNode = Clone(N);
// Update the topological ordering.		// Update the topological ordering.
if (NewNode->NodeNum >= NumSUnits)		if (NewNode->NodeNum >= NumSUnits)
Topo.InitDAGTopologicalSorting();		Topo.MarkDirty();
return NewNode;		return NewNode;
}		}

/// forceUnitLatencies - Register-pressure-reducing scheduling doesn't		/// forceUnitLatencies - Register-pressure-reducing scheduling doesn't
/// need actual latency information but the hybrid scheduler does.		/// need actual latency information but the hybrid scheduler does.
bool forceUnitLatencies() const override {		bool forceUnitLatencies() const override {
return !NeedLatency;		return !NeedLatency;
}		}
(65 lines not shown)	void ScheduleDAGRRList::Schedule() {
LiveRegGens.reset(new SUnit*[TRI->getNumRegs() + 1]());		LiveRegGens.reset(new SUnit*[TRI->getNumRegs() + 1]());
CallSeqEndForStart.clear();		CallSeqEndForStart.clear();
assert(Interferences.empty() && LRegsMap.empty() && "stale Interferences");		assert(Interferences.empty() && LRegsMap.empty() && "stale Interferences");

// Build the scheduling graph.		// Build the scheduling graph.
BuildSchedGraph(nullptr);		BuildSchedGraph(nullptr);

LLVM_DEBUG(dump());		LLVM_DEBUG(dump());
Topo.InitDAGTopologicalSorting();		Topo.MarkDirty();

AvailableQueue->initNodes(SUnits);		AvailableQueue->initNodes(SUnits);

HazardRec->Reset();		HazardRec->Reset();

// Execute the actual scheduling loop.		// Execute the actual scheduling loop.
ListScheduleBottomUp();		ListScheduleBottomUp();

(635 lines not shown)	SUnit *ScheduleDAGRRList::TryUnfoldSU(SUnit *SU) {

bool isNewN = true;		bool isNewN = true;
SUnit *NewSU;		SUnit *NewSU;
// This can only happen when isNewLoad is false.		// This can only happen when isNewLoad is false.
if (N->getNodeId() != -1) {		if (N->getNodeId() != -1) {
NewSU = &SUnits[N->getNodeId()];		NewSU = &SUnits[N->getNodeId()];
// If NewSU has already been scheduled, we need to clone it, but this		// If NewSU has already been scheduled, we need to clone it, but this
// negates the benefit to unfolding so just return SU.		// negates the benefit to unfolding so just return SU.
if (NewSU->isScheduled)		if (NewSU->isScheduled) {
return SU;		return SU;
		}
isNewN = false;		isNewN = false;
} else {		} else {
NewSU = CreateNewSUnit(N);		NewSU = CreateNewSUnit(N);
N->setNodeId(NewSU->NodeNum);		N->setNodeId(NewSU->NodeNum);

const MCInstrDesc &MCID = TII->get(N->getMachineOpcode());		const MCInstrDesc &MCID = TII->get(N->getMachineOpcode());
for (unsigned i = 0; i != MCID.getNumOperands(); ++i) {		for (unsigned i = 0; i != MCID.getNumOperands(); ++i) {
if (MCID.getOperandConstraint(i, MCOI::TIED_TO) != -1) {		if (MCID.getOperandConstraint(i, MCOI::TIED_TO) != -1) {
(36 lines not shown)	for (SDep &Succ : SU->Succs) {
else		else
NodeSuccs.push_back(Succ);		NodeSuccs.push_back(Succ);
}		}

// Now assign edges to the newly-created nodes.		// Now assign edges to the newly-created nodes.
for (const SDep &Pred : ChainPreds) {		for (const SDep &Pred : ChainPreds) {
RemovePred(SU, Pred);		RemovePred(SU, Pred);
if (isNewLoad)		if (isNewLoad)
AddPred(LoadSU, Pred);		AddPredQueued(LoadSU, Pred);
}		}
for (const SDep &Pred : LoadPreds) {		for (const SDep &Pred : LoadPreds) {
RemovePred(SU, Pred);		RemovePred(SU, Pred);
if (isNewLoad)		if (isNewLoad)
AddPred(LoadSU, Pred);		AddPredQueued(LoadSU, Pred);
}		}
for (const SDep &Pred : NodePreds) {		for (const SDep &Pred : NodePreds) {
RemovePred(SU, Pred);		RemovePred(SU, Pred);
AddPred(NewSU, Pred);		AddPredQueued(NewSU, Pred);
}		}
for (SDep D : NodeSuccs) {		for (SDep D : NodeSuccs) {
SUnit *SuccDep = D.getSUnit();		SUnit *SuccDep = D.getSUnit();
D.setSUnit(SU);		D.setSUnit(SU);
RemovePred(SuccDep, D);		RemovePred(SuccDep, D);
D.setSUnit(NewSU);		D.setSUnit(NewSU);
AddPred(SuccDep, D);		AddPredQueued(SuccDep, D);
// Balance register pressure.		// Balance register pressure.
if (AvailableQueue->tracksRegPressure() && SuccDep->isScheduled &&		if (AvailableQueue->tracksRegPressure() && SuccDep->isScheduled &&
!D.isCtrl() && NewSU->NumRegDefsLeft > 0)		!D.isCtrl() && NewSU->NumRegDefsLeft > 0)
--NewSU->NumRegDefsLeft;		--NewSU->NumRegDefsLeft;
}		}
for (SDep D : ChainSuccs) {		for (SDep D : ChainSuccs) {
SUnit *SuccDep = D.getSUnit();		SUnit *SuccDep = D.getSUnit();
D.setSUnit(SU);		D.setSUnit(SU);
RemovePred(SuccDep, D);		RemovePred(SuccDep, D);
if (isNewLoad) {		if (isNewLoad) {
D.setSUnit(LoadSU);		D.setSUnit(LoadSU);
AddPred(SuccDep, D);		AddPredQueued(SuccDep, D);
}		}
}		}

// Add a data dependency to reflect that NewSU reads the value defined		// Add a data dependency to reflect that NewSU reads the value defined
// by LoadSU.		// by LoadSU.
SDep D(LoadSU, SDep::Data, 0);		SDep D(LoadSU, SDep::Data, 0);
D.setLatency(LoadSU->Latency);		D.setLatency(LoadSU->Latency);
AddPred(NewSU, D);		AddPredQueued(NewSU, D);

if (isNewLoad)		if (isNewLoad)
AvailableQueue->addNode(LoadSU);		AvailableQueue->addNode(LoadSU);
if (isNewN)		if (isNewN)
AvailableQueue->addNode(NewSU);		AvailableQueue->addNode(NewSU);

++NumUnfolds;		++NumUnfolds;

(55 lines not shown)	SUnit *ScheduleDAGRRList::CopyAndMoveSuccessors(SUnit *SU) {
}		}

LLVM_DEBUG(dbgs() << " Duplicating SU #" << SU->NodeNum << "\n");		LLVM_DEBUG(dbgs() << " Duplicating SU #" << SU->NodeNum << "\n");
NewSU = CreateClone(SU);		NewSU = CreateClone(SU);

// New SUnit has the exact same predecessors.		// New SUnit has the exact same predecessors.
for (SDep &Pred : SU->Preds)		for (SDep &Pred : SU->Preds)
if (!Pred.isArtificial())		if (!Pred.isArtificial())
AddPred(NewSU, Pred);		AddPredQueued(NewSU, Pred);

// Only copy scheduled successors. Cut them from old node's successor		// Only copy scheduled successors. Cut them from old node's successor
// list and move them over.		// list and move them over.
SmallVector<std::pair<SUnit *, SDep>, 4> DelDeps;		SmallVector<std::pair<SUnit *, SDep>, 4> DelDeps;
for (SDep &Succ : SU->Succs) {		for (SDep &Succ : SU->Succs) {
if (Succ.isArtificial())		if (Succ.isArtificial())
continue;		continue;
SUnit *SuccSU = Succ.getSUnit();		SUnit *SuccSU = Succ.getSUnit();
if (SuccSU->isScheduled) {		if (SuccSU->isScheduled) {
SDep D = Succ;		SDep D = Succ;
D.setSUnit(NewSU);		D.setSUnit(NewSU);
AddPred(SuccSU, D);		AddPredQueued(SuccSU, D);
D.setSUnit(SU);		D.setSUnit(SU);
DelDeps.push_back(std::make_pair(SuccSU, D));		DelDeps.push_back(std::make_pair(SuccSU, D));
}		}
}		}
for (auto &DelDep : DelDeps)		for (auto &DelDep : DelDeps)
RemovePred(DelDep.first, DelDep.second);		RemovePred(DelDep.first, DelDep.second);

AvailableQueue->updateNode(SU);		AvailableQueue->updateNode(SU);
(22 lines not shown)	void ScheduleDAGRRList::InsertCopiesAndMoveSuccs(SUnit *SU, unsigned Reg,
SmallVector<std::pair<SUnit *, SDep>, 4> DelDeps;		SmallVector<std::pair<SUnit *, SDep>, 4> DelDeps;
for (SDep &Succ : SU->Succs) {		for (SDep &Succ : SU->Succs) {
if (Succ.isArtificial())		if (Succ.isArtificial())
continue;		continue;
SUnit *SuccSU = Succ.getSUnit();		SUnit *SuccSU = Succ.getSUnit();
if (SuccSU->isScheduled) {		if (SuccSU->isScheduled) {
SDep D = Succ;		SDep D = Succ;
D.setSUnit(CopyToSU);		D.setSUnit(CopyToSU);
AddPred(SuccSU, D);		AddPredQueued(SuccSU, D);
DelDeps.push_back(std::make_pair(SuccSU, Succ));		DelDeps.push_back(std::make_pair(SuccSU, Succ));
}		}
else {		else {
// Avoid scheduling the def-side copy before other successors. Otherwise		// Avoid scheduling the def-side copy before other successors. Otherwise
// we could introduce another physreg interference on the copy and		// we could introduce another physreg interference on the copy and
// continue inserting copies indefinitely.		// continue inserting copies indefinitely.
AddPred(SuccSU, SDep(CopyFromSU, SDep::Artificial));		AddPredQueued(SuccSU, SDep(CopyFromSU, SDep::Artificial));
}		}
}		}
for (auto &DelDep : DelDeps)		for (auto &DelDep : DelDeps)
RemovePred(DelDep.first, DelDep.second);		RemovePred(DelDep.first, DelDep.second);

SDep FromDep(SU, SDep::Data, Reg);		SDep FromDep(SU, SDep::Data, Reg);
FromDep.setLatency(SU->Latency);		FromDep.setLatency(SU->Latency);
AddPred(CopyFromSU, FromDep);		AddPredQueued(CopyFromSU, FromDep);
SDep ToDep(CopyFromSU, SDep::Data, 0);		SDep ToDep(CopyFromSU, SDep::Data, 0);
ToDep.setLatency(CopyFromSU->Latency);		ToDep.setLatency(CopyFromSU->Latency);
AddPred(CopyToSU, ToDep);		AddPredQueued(CopyToSU, ToDep);

AvailableQueue->updateNode(SU);		AvailableQueue->updateNode(SU);
AvailableQueue->addNode(CopyFromSU);		AvailableQueue->addNode(CopyFromSU);
AvailableQueue->addNode(CopyToSU);		AvailableQueue->addNode(CopyToSU);
Copies.push_back(CopyFromSU);		Copies.push_back(CopyFromSU);
Copies.push_back(CopyToSU);		Copies.push_back(CopyToSU);

++NumPRCopies;		++NumPRCopies;
(213 lines not shown)	while (CurSU) {
}		}
CurSU = AvailableQueue->pop();		CurSU = AvailableQueue->pop();
}		}
};		};
FindAvailableNode();		FindAvailableNode();
if (CurSU)		if (CurSU)
return CurSU;		return CurSU;

		// We query the topological order in the loop body, so make sure outstanding
		// updates are applied before entering it (we only enter the loop if there
		// are some interferences). If we make changes to the ordering, we exit
		// the loop.

// All candidates are delayed due to live physical reg dependencies.		// All candidates are delayed due to live physical reg dependencies.
// Try backtracking, code duplication, or inserting cross class copies		// Try backtracking, code duplication, or inserting cross class copies
// to resolve it.		// to resolve it.
for (SUnit *TrySU : Interferences) {		for (SUnit *TrySU : Interferences) {
SmallVectorImpl<unsigned> &LRegs = LRegsMap[TrySU];		SmallVectorImpl<unsigned> &LRegs = LRegsMap[TrySU];

// Try unscheduling up to the point where it's safe to schedule		// Try unscheduling up to the point where it's safe to schedule
// this node.		// this node.
Show All 13 Lines	if (!WillCreateCycle(TrySU, BtSU)) {
// requires the physical reg dep.		// requires the physical reg dep.
if (BtSU->isAvailable) {		if (BtSU->isAvailable) {
BtSU->isAvailable = false;		BtSU->isAvailable = false;
if (!BtSU->isPending)		if (!BtSU->isPending)
AvailableQueue->remove(BtSU);		AvailableQueue->remove(BtSU);
}		}
LLVM_DEBUG(dbgs() << "ARTIFICIAL edge from SU(" << BtSU->NodeNum		LLVM_DEBUG(dbgs() << "ARTIFICIAL edge from SU(" << BtSU->NodeNum
<< ") to SU(" << TrySU->NodeNum << ")\n");		<< ") to SU(" << TrySU->NodeNum << ")\n");
AddPred(TrySU, SDep(BtSU, SDep::Artificial));		AddPredQueued(TrySU, SDep(BtSU, SDep::Artificial));

// If one or more successors has been unscheduled, then the current		// If one or more successors has been unscheduled, then the current
// node is no longer available.		// node is no longer available.
if (!TrySU->isAvailable || !TrySU->NodeQueueId) {		if (!TrySU->isAvailable || !TrySU->NodeQueueId) {
LLVM_DEBUG(dbgs() << "TrySU not available; choosing node from queue\n");		LLVM_DEBUG(dbgs() << "TrySU not available; choosing node from queue\n");
CurSU = AvailableQueue->pop();		CurSU = AvailableQueue->pop();
} else {		} else {
LLVM_DEBUG(dbgs() << "TrySU available\n");		LLVM_DEBUG(dbgs() << "TrySU available\n");
Show All 37 Lines	if (DestRC != RC) {
report_fatal_error("Can't handle live physical register dependency!");		report_fatal_error("Can't handle live physical register dependency!");
}		}
if (!NewDef) {		if (!NewDef) {
// Issue copies, these can be expensive cross register class copies.		// Issue copies, these can be expensive cross register class copies.
SmallVector<SUnit*, 2> Copies;		SmallVector<SUnit*, 2> Copies;
InsertCopiesAndMoveSuccs(LRDef, Reg, DestRC, RC, Copies);		InsertCopiesAndMoveSuccs(LRDef, Reg, DestRC, RC, Copies);
LLVM_DEBUG(dbgs() << " Adding an edge from SU #" << TrySU->NodeNum		LLVM_DEBUG(dbgs() << " Adding an edge from SU #" << TrySU->NodeNum
<< " to SU #" << Copies.front()->NodeNum << "\n");		<< " to SU #" << Copies.front()->NodeNum << "\n");
AddPred(TrySU, SDep(Copies.front(), SDep::Artificial));		AddPredQueued(TrySU, SDep(Copies.front(), SDep::Artificial));
NewDef = Copies.back();		NewDef = Copies.back();
}		}

LLVM_DEBUG(dbgs() << " Adding an edge from SU #" << NewDef->NodeNum		LLVM_DEBUG(dbgs() << " Adding an edge from SU #" << NewDef->NodeNum
<< " to SU #" << TrySU->NodeNum << "\n");		<< " to SU #" << TrySU->NodeNum << "\n");
LiveRegDefs[Reg] = NewDef;		LiveRegDefs[Reg] = NewDef;
AddPred(NewDef, SDep(TrySU, SDep::Artificial));		AddPredQueued(NewDef, SDep(TrySU, SDep::Artificial));
TrySU->isAvailable = false;		TrySU->isAvailable = false;
CurSU = NewDef;		CurSU = NewDef;
}		}
assert(CurSU && "Unable to resolve live physical register dependencies!");		assert(CurSU && "Unable to resolve live physical register dependencies!");
return CurSU;		return CurSU;
}		}

/// ListScheduleBottomUp - The main loop of list scheduling for bottom-up		/// ListScheduleBottomUp - The main loop of list scheduling for bottom-up
▲ Show 20 Lines • Show All 1,432 Lines • ▼ Show 20 Lines	LLVM_DEBUG(
<< " to guide scheduling in the presence of multiple uses\n");		<< " to guide scheduling in the presence of multiple uses\n");
for (unsigned i = 0; i != PredSU->Succs.size(); ++i) {		for (unsigned i = 0; i != PredSU->Succs.size(); ++i) {
SDep Edge = PredSU->Succs[i];		SDep Edge = PredSU->Succs[i];
assert(!Edge.isAssignedRegDep());		assert(!Edge.isAssignedRegDep());
SUnit *SuccSU = Edge.getSUnit();		SUnit *SuccSU = Edge.getSUnit();
if (SuccSU != &SU) {		if (SuccSU != &SU) {
Edge.setSUnit(PredSU);		Edge.setSUnit(PredSU);
scheduleDAG->RemovePred(SuccSU, Edge);		scheduleDAG->RemovePred(SuccSU, Edge);
scheduleDAG->AddPred(&SU, Edge);		scheduleDAG->AddPredQueued(&SU, Edge);
Edge.setSUnit(&SU);		Edge.setSUnit(&SU);
scheduleDAG->AddPred(SuccSU, Edge);		scheduleDAG->AddPredQueued(SuccSU, Edge);
--i;		--i;
}		}
}		}
outer_loop_continue:;		outer_loop_continue:;
}		}
}		}

/// AddPseudoTwoAddrDeps - If two nodes share an operand and one of them uses		/// AddPseudoTwoAddrDeps - If two nodes share an operand and one of them uses
▲ Show 20 Lines • Show All 65 Lines • ▼ Show 20 Lines	for (unsigned j = 0; j != NumOps; ++j) {
if (!canClobberReachingPhysRegUse(SuccSU, &SU, scheduleDAG, TII, TRI) &&		if (!canClobberReachingPhysRegUse(SuccSU, &SU, scheduleDAG, TII, TRI) &&
(!canClobber(SuccSU, DUSU) ||		(!canClobber(SuccSU, DUSU) ||
(isLiveOut && !hasOnlyLiveOutUses(SuccSU)) ||		(isLiveOut && !hasOnlyLiveOutUses(SuccSU)) ||
(!SU.isCommutable && SuccSU->isCommutable)) &&		(!SU.isCommutable && SuccSU->isCommutable)) &&
!scheduleDAG->IsReachable(SuccSU, &SU)) {		!scheduleDAG->IsReachable(SuccSU, &SU)) {
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< " Adding a pseudo-two-addr edge from SU #"		<< " Adding a pseudo-two-addr edge from SU #"
<< SU.NodeNum << " to SU #" << SuccSU->NodeNum << "\n");		<< SU.NodeNum << " to SU #" << SuccSU->NodeNum << "\n");
scheduleDAG->AddPred(&SU, SDep(SuccSU, SDep::Artificial));		scheduleDAG->AddPredQueued(&SU, SDep(SuccSU, SDep::Artificial));
}		}
}		}
}		}
}		}
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Public Constructor Functions		// Public Constructor Functions
▲ Show 20 Lines • Show All 60 Lines • Show Last 20 Lines
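
The AddPred to AddPredQueued replacements in this diff rely on the idea from the summary: record new edges without touching the topological order, and bring the order up to date only when something queries it. The following is a minimal, self-contained sketch of that idea, not the code from this patch. The class LazyTopoDAG and its members addEdgeQueued, fixOrder, isReachable, wouldCreateCycle and numRecomputes are hypothetical names chosen for illustration. As a simplification, the sketch always rebuilds the order from scratch when it is stale, whereas the summary describes choosing between applying the queued updates and recomputing, depending on their number.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical toy graph for illustration; not LLVM's ScheduleDAGTopologicalSort.
class LazyTopoDAG {
  std::vector<std::vector<int>> Succs; // Succs[U] lists the nodes U points to.
  std::vector<int> Order;              // Order[U] is U's position in the order.
  bool Dirty = true;                   // True if edges were added since the last rebuild.
  int Recomputes = 0;                  // Counts full rebuilds, for demonstration only.

public:
  explicit LazyTopoDAG(int N) : Succs(N), Order(N, 0) {}

  // Record the edge, but leave the ordering untouched until it is queried.
  void addEdgeQueued(int From, int To) {
    Succs[From].push_back(To);
    Dirty = true;
  }

  // Rebuild the order from scratch with Kahn's algorithm, only if it is stale.
  // Assumes the graph is acyclic; callers are expected to check before adding.
  void fixOrder() {
    if (!Dirty)
      return;
    const int N = static_cast<int>(Succs.size());
    std::vector<int> InDeg(N, 0);
    for (const auto &Out : Succs)
      for (int V : Out)
        ++InDeg[V];
    std::queue<int> Ready;
    for (int U = 0; U < N; ++U)
      if (InDeg[U] == 0)
        Ready.push(U);
    int Next = 0;
    while (!Ready.empty()) {
      int U = Ready.front();
      Ready.pop();
      Order[U] = Next++;
      for (int V : Succs[U])
        if (--InDeg[V] == 0)
          Ready.push(V);
    }
    assert(Next == N && "graph must be acyclic");
    Dirty = false;
    ++Recomputes;
  }

  // Is Target reachable from Start? The order prunes the search: a node that
  // comes after Target in the order can never lead to Target.
  bool isReachable(int Start, int Target) {
    fixOrder();
    if (Order[Target] < Order[Start])
      return false;
    std::vector<bool> Seen(Succs.size(), false);
    std::vector<int> Stack{Start};
    while (!Stack.empty()) {
      int U = Stack.back();
      Stack.pop_back();
      if (U == Target)
        return true;
      if (Seen[U])
        continue;
      Seen[U] = true;
      for (int V : Succs[U])
        if (Order[V] <= Order[Target])
          Stack.push_back(V);
    }
    return false;
  }

  // Adding From -> To closes a cycle exactly when From is reachable from To.
  bool wouldCreateCycle(int From, int To) { return isReachable(To, From); }

  int numRecomputes() const { return Recomputes; }
};
```

In this sketch, adding three edges with addEdgeQueued and then issuing several reachability queries triggers a single rebuild, at the first query. That is the effect the call counts in the review discussion point at: fewer recomputations of the order for a similar number of added edges.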

Revision Contents

Diff 193737

llvm/include/llvm/CodeGen/ScheduleDAG.h

llvm/lib/CodeGen/ScheduleDAG.cpp

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
