This is an archive of the discontinued LLVM Phabricator instance.

[ScheduleDAG] Avoid unnecessary recomputation of topological order.
Closed, Public

Authored by fhahn on Mar 22 2019, 3:05 PM.

Details

Reviewers

MatzeB
atrick
efriedma
niravd
paquette

Commits

rGec25a71eb7fc: [ScheduleDAG] Avoid unnecessary recomputation of topological order.

Summary

In some cases ScheduleDAGRRList has to add new nodes to resolve problems
with interfering physical registers. When new nodes are added, it
completely re-computes the topological order, which can take a long
time, but is unnecessary. We only add nodes one by one, and initially
they do not have any predecessors. So we can just insert them at the end
of the vector. Later we add predecessors, but the helper function
properly updates the topological order much more efficiently. With this
change, the compile time for the program below drops from 300s to 30s on
my machine.

define i11129 @test1() {
  %L1 = load i11129, i11129* undef
  %B30 = ashr i11129 %L1, %L1
  store i11129 %B30, i11129* undef
  ret i11129 %L1
}
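
As a rough illustration of the append step described in the summary, here is a minimal, self-contained C++ sketch. It is not the LLVM implementation: `ToyTopoOrder` and its methods are hypothetical names, only the field names `Node2Index` and `Index2Node` are borrowed from the patch, and the sketch assumes a freshly created node has no edges at all yet, so the last position is always a valid place for it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the two maps kept by a topological sort helper.
// Hypothetical and simplified; not the LLVM class.
struct ToyTopoOrder {
  std::vector<int> Node2Index; // node number -> position in the order
  std::vector<int> Index2Node; // position in the order -> node number

  // Append a brand-new node that has no edges yet. Its number must be the
  // next unused one, so it can take the last position in constant time
  // instead of triggering a full recomputation of the order.
  void addNodeWithoutPredecessors(int NodeNum) {
    assert(NodeNum == static_cast<int>(Index2Node.size()) &&
           "new node must get the next free number");
    Node2Index.push_back(static_cast<int>(Index2Node.size()));
    Index2Node.push_back(NodeNum);
  }

  // Check that the two maps are inverse permutations of each other.
  bool isConsistent() const {
    if (Node2Index.size() != Index2Node.size())
      return false;
    for (std::size_t I = 0; I < Index2Node.size(); ++I) {
      int N = Index2Node[I];
      if (N < 0 || static_cast<std::size_t>(N) >= Node2Index.size() ||
          Node2Index[N] != static_cast<int>(I))
        return false;
    }
    return true;
  }
};
```

Appending nodes 0, 1 and 2 in turn leaves both maps as the identity permutation, and each call touches only the tail of the two vectors.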

This should be generally beneficial, as we can skip a large amount of
work. Theoretically there are some scenarios where we might not save
much, e.g. when we add a dependency between the first and last node.
Then we would have to shift all nodes. But we still do not have to spend
the time re-computing the initial order.
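
The worst case mentioned above can be sketched with a toy incremental reorder. This is a simplified illustration under stated assumptions, not the code in ScheduleDAG.cpp: `ToyShiftOrder`, `addNode`, `addEdge`, `markReachable` and `LastShiftSize` are hypothetical names, and the sketch uses the convention that an edge From -> To requires From to appear earlier in the order than To. Only the window between the two endpoints is repositioned, so an edge from the last node to the first one moves every node, while an edge that already agrees with the order moves none.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy graph with an incrementally maintained topological order.
struct ToyShiftOrder {
  std::vector<std::vector<int>> Succs; // successor lists by node number
  std::vector<int> Node2Index;         // node number -> position
  std::vector<int> Index2Node;         // position -> node number
  std::size_t LastShiftSize = 0;       // window size moved by last addEdge

  // New nodes have no edges, so they are simply appended.
  int addNode() {
    int N = static_cast<int>(Succs.size());
    Succs.emplace_back();
    Node2Index.push_back(N);
    Index2Node.push_back(N);
    return N;
  }

  // Mark every node reachable from N whose position is at most Upper.
  void markReachable(int N, int Upper, std::vector<bool> &Seen) const {
    Seen[N] = true;
    for (int S : Succs[N])
      if (!Seen[S] && Node2Index[S] <= Upper)
        markReachable(S, Upper, Seen);
  }

  // Add the edge From -> To. Returns false and changes nothing if the edge
  // would create a cycle.
  bool addEdge(int From, int To) {
    LastShiftSize = 0;
    if (From == To)
      return false;
    int Lower = Node2Index[To], Upper = Node2Index[From];
    if (Lower < Upper) { // the order disagrees with the new edge
      std::vector<bool> Seen(Succs.size(), false);
      markReachable(To, Upper, Seen);
      if (Seen[From])
        return false; // To already reaches From
      // Within the window, keep nodes not reachable from To in front and
      // move the reachable ones behind them, preserving relative order.
      std::stable_partition(Index2Node.begin() + Lower,
                            Index2Node.begin() + Upper + 1,
                            [&](int N) { return !Seen[N]; });
      for (int I = Lower; I <= Upper; ++I)
        Node2Index[Index2Node[I]] = I;
      LastShiftSize = static_cast<std::size_t>(Upper - Lower + 1);
    }
    Succs[From].push_back(To);
    return true;
  }
};
```

With four nodes A, B, C, D created in that order, adding A -> B moves nothing, while adding D -> A repositions all four nodes; a later attempt to add B -> D is rejected because it would close a cycle.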

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Mar 22 2019, 3:05 PM

Herald added a project: Restricted Project. Mar 22 2019, 3:05 PM

Herald added subscribers: jdoerfert, hiraditya.

Harbormaster completed remote builds in B29514: Diff 191956. Mar 22 2019, 3:06 PM

LGTM

Out of curiosity, do you have any numbers for how this impacts compile time on, say, CTMark? What percentage of an improvement can we expect from this?

This revision is now accepted and ready to land. Mar 22 2019, 4:22 PM

In D59722#1440346, @paquette wrote:

LGTM

Out of curiosity, do you have any numbers for how this impacts compile time on, say, CTMark? What percentage of an improvement can we expect from this?

I ran CTMark in a few configurations (-O3 X86, -O0 X86, -O3 + LTO ARM64) and overall it seems neutral: some gains and losses of up to 0.5%, but that is also the run-to-run difference I see on some runs on my machine.

After a bit more benchmarking, I think this patch makes things slightly worse in the general case. I've put up a patch that updates ScheduleDAGRRList to update the topological order on demand, D60125, which gives small, but stable improvements on CTMark. I have to revisit this patch and see how we can deal with extreme cases, without making things worse in the general case.

In D59722#1451350, @fhahn wrote:

After a bit more benchmarking, I think this patch makes things slightly worse in the general case. I've put up a patch that updates ScheduleDAGRRList to update the topological order on demand, D60125, which gives small, but stable improvements on CTMark. I have to revisit this patch and see how we can deal with extreme cases, without making things worse in the general case.

I just benchmarked this again, and it looks like there are currently a few clear improvements and no real regressions: http://llvm-compile-time-tracker.com/compare.php?from=7873376bb36b4f9646fbc26d6da88e2edbf796e4&to=d44cb1460dd6ccad74ea96ce295d804f9e291bf3&stat=instructions

I'll land the patch shortly.

This revision was not accepted when it landed; it landed in state Changes Planned. May 31 2020, 3:43 AM

Closed by commit rGec25a71eb7fc: [ScheduleDAG] Avoid unnecessary recomputation of topological order. (authored by fhahn).

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                  Size
llvm/include/llvm/CodeGen/ScheduleDAG.h               4 lines
llvm/lib/CodeGen/ScheduleDAG.cpp                      8 lines
llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp   4 lines

Diff 267494

llvm/include/llvm/CodeGen/ScheduleDAG.h

@@ class ScheduleDAGTopologicalSort {
   /// Fix the ordering, by either recomputing from scratch or by applying
   /// any outstanding updates. Uses a heuristic to estimate what will be
   /// cheaper.
   void FixOrder();

 public:
   ScheduleDAGTopologicalSort(std::vector<SUnit> &SUnits, SUnit *ExitSU);

+  /// Add a SUnit without predecessors to the end of the topological order. It
+  /// also must be the first new node added to the DAG.
+  void AddSUnitWithoutPredecessors(const SUnit *SU);
+
   /// Creates the initial topological ordering from the DAG to be scheduled.
   void InitDAGTopologicalSorting();

   /// Returns an array of SUs that are both in the successor
   /// subtree of StartSU and in the predecessor subtree of TargetSU.
   /// StartSU and TargetSU are not in the array.
   /// Success is false if TargetSU is not in the successor subtree of
   /// StartSU, else it is true.

llvm/lib/CodeGen/ScheduleDAG.cpp

@@ if (IsReachable(SU, TargetSU))
     return true;
   for (const SDep &PredDep : TargetSU->Preds)
     if (PredDep.isAssignedRegDep() &&
         IsReachable(SU, PredDep.getSUnit()))
       return true;
   return false;
 }

+void ScheduleDAGTopologicalSort::AddSUnitWithoutPredecessors(const SUnit *SU) {
+  assert(SU->NodeNum == Index2Node.size() && "Node cannot be added at the end");
+  assert(SU->NumPreds == 0 && "Can only add SU's with no predecessors");
+  Node2Index.push_back(Index2Node.size());
+  Index2Node.push_back(SU->NodeNum);
+  Visited.resize(Node2Index.size());
+}
+
 bool ScheduleDAGTopologicalSort::IsReachable(const SUnit *SU,
                                              const SUnit *TargetSU) {
   FixOrder();
   // If insertion of the edge SU->TargetSU would create a cycle
   // then there is a path from TargetSU to SU.
   int UpperBound, LowerBound;
   LowerBound = Node2Index[TargetSU->NodeNum];
   UpperBound = Node2Index[SU->NodeNum];

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp

@@ private:
   void ListScheduleBottomUp();

   /// CreateNewSUnit - Creates a new SUnit and returns a pointer to it.
   SUnit *CreateNewSUnit(SDNode *N) {
     unsigned NumSUnits = SUnits.size();
     SUnit *NewNode = newSUnit(N);
     // Update the topological ordering.
     if (NewNode->NodeNum >= NumSUnits)
-      Topo.MarkDirty();
+      Topo.AddSUnitWithoutPredecessors(NewNode);
     return NewNode;
   }

   /// CreateClone - Creates a new SUnit from an existing one.
   SUnit *CreateClone(SUnit *N) {
     unsigned NumSUnits = SUnits.size();
     SUnit *NewNode = Clone(N);
     // Update the topological ordering.
     if (NewNode->NodeNum >= NumSUnits)
-      Topo.MarkDirty();
+      Topo.AddSUnitWithoutPredecessors(NewNode);
     return NewNode;
   }

   /// forceUnitLatencies - Register-pressure-reducing scheduling doesn't
   /// need actual latency information but the hybrid scheduler does.
   bool forceUnitLatencies() const override {
     return !NeedLatency;
   }