This is an archive of the discontinued LLVM Phabricator instance.

Use a much more efficient (linear instead of quadratic?) algorithm for computing the height: standard DFS mincost algorithm for a DAG.
Accepted, Public

Authored by chandlerc on Jun 16 2016, 4:05 AM.


Details

Reviewers

Summary

Not only is the old algorithm essentially BFS instead of DFS, it weirdly
can mark heights as dirty which then walks preds. Very strange. I think
a direct DFS that just computes the height of everything in the DAG from
the bottom up and avoids re-computing already computed heights is much
easier to understand.
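
To illustrate the approach described above, here is a minimal, self-contained sketch of a memoized DFS height computation with an explicit stack. It is not the patch itself: `Node`, `Edge`, `HeightCurrent`, and `computeHeight` are hypothetical stand-ins rather than LLVM's `SUnit` API, and the graph is assumed to be acyclic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not LLVM types.
struct Edge {
  std::size_t Succ;  // index of the successor node
  unsigned Latency;  // latency carried by this edge
};

struct Node {
  std::vector<Edge> Succs;
  unsigned Height = 0;
  bool HeightCurrent = false;  // true once Height is final
};

// Computes the height of Root and of every node reachable from it.
// Nodes whose height is already current are never re-walked, so each edge
// is examined a bounded number of times. Assumes the graph is a DAG.
unsigned computeHeight(std::vector<Node> &Graph, std::size_t Root) {
  if (Graph[Root].HeightCurrent)
    return Graph[Root].Height;

  // Each frame holds (node index, index of the next successor edge to examine).
  std::vector<std::pair<std::size_t, std::size_t>> Stack;
  Graph[Root].Height = 0;
  Stack.push_back({Root, 0});

  while (!Stack.empty()) {
    std::size_t Cur = Stack.back().first;
    std::size_t EdgeIdx = Stack.back().second;
    Node &N = Graph[Cur];

    if (EdgeIdx == N.Succs.size()) {
      // Every successor has been folded in, so this height is final.
      N.HeightCurrent = true;
      Stack.pop_back();
      continue;
    }

    const Edge &E = N.Succs[EdgeIdx];
    Node &S = Graph[E.Succ];
    if (!S.HeightCurrent) {
      // Descend first; the same edge is revisited once S is finished.
      S.Height = 0;
      Stack.push_back({E.Succ, 0});
      continue;
    }

    // Max the current height against this successor's height.
    N.Height = std::max(N.Height, S.Height + E.Latency);
    ++Stack.back().second;
  }
  return Graph[Root].Height;
}
```

Calling `computeHeight` on one node also finalizes the heights of everything below it, so later queries on those nodes return immediately. Leaves (nodes with no successors) fall out naturally with a height of 0.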

For test/CodeGen/AMDGPU/spill-scavenge-offset.ll, prior to it having any
scheduler turned off, computing the height was over 25% of the runtime.
With this patch, it is completely gone. I'm measuring improvements from
32s to 24s (25%) in debug builds and 4.5s to 3.17s (30%) in an optimized
build for this test.

Glancing at it, the depth computation probably needs the same treatment,
but I've not yet found a test that exercises this (I've not looked too
hard yet though).

As with my prior patch, this impacts both SDAG scheduling and MI
scheduling, but for different reasons -- in this case, both schedulers
call the ScheduleDAG's getHeight routine heavily and were causing it
to show up in profiles for spill-scavenge-offset.ll.

Diff Detail

Event Timeline

chandlerc updated this revision to Diff 60961. Jun 16 2016, 4:05 AM

chandlerc retitled this revision to Use a much more efficient (linear instead of quadratic?) algorithm for computing the height: standard DFS mincost algorithm for a DAG.

chandlerc updated this object.

chandlerc added a subscriber: llvm-commits.

Herald added subscribers: mcrosier, MatzeB. Jun 16 2016, 4:05 AM

MatzeB added a subscriber: atrick. Jun 16 2016, 3:50 PM

Awesome. Proper DFS is how I've done it in the past.

ComputeDepth should be rewritten too.

This revision is now accepted and ready to land. Jun 16 2016, 9:45 PM

Looks like the patch was not committed.

Revision Contents

Path: lib/CodeGen/ScheduleDAG.cpp

Size: 82 lines

Diff 60961

lib/CodeGen/ScheduleDAG.cpp

if (Done) {
Cur->isDepthCurrent = true;		Cur->isDepthCurrent = true;
}		}
} while (!WorkList.empty());		} while (!WorkList.empty());
}		}

/// ComputeHeight - Calculate the maximal path from the node to the entry.		/// ComputeHeight - Calculate the maximal path from the node to the entry.
///		///
void SUnit::ComputeHeight() {		void SUnit::ComputeHeight() {
SmallVector<SUnit*, 8> WorkList;		if (Succs.empty()) {
WorkList.push_back(this);		Height = 0;
do {		isHeightCurrent = true;
SUnit *Cur = WorkList.back();		return;
		}

bool Done = true;		SmallVector<std::pair<SUnit *, SUnit::const_succ_iterator>, 8> Stack;
unsigned MaxSuccHeight = 0;		#ifndef NDEBUG
for (SUnit::const_succ_iterator I = Cur->Succs.begin(),		SmallPtrSet<SUnit *, 8> Visited;
E = Cur->Succs.end(); I != E; ++I) {		Visited.insert(this);
		#endif
		Height = 0;
		SUnit *Cur = this;
		SUnit::const_succ_iterator I = Succs.begin();
		for (;;) {
		assert(!Cur->isHeightCurrent &&
		"No need to compute the height once current!");

		SUnit::const_succ_iterator E = Cur->Succs.end();
		assert(I != E && "Got an end iterator into the stack!");

		do {
SUnit *SuccSU = I->getSUnit();		SUnit *SuccSU = I->getSUnit();
if (SuccSU->isHeightCurrent)		assert(!Visited.count(SuccSU) && "Found a cycle!");
MaxSuccHeight = std::max(MaxSuccHeight,
SuccSU->Height + I->getLatency());		if (!SuccSU->isHeightCurrent) {
else {		if (SuccSU->Succs.empty()) {
Done = false;		SuccSU->Height = 0;
WorkList.push_back(SuccSU);		SuccSU->isHeightCurrent = true;
		} else {
		// We have a successor without a current height.
		// Push the current position onto the stack and process that
		// successor.
		Stack.push_back({Cur, I});
		Cur = SuccSU;
		#ifndef NDEBUG
		Visited.insert(Cur);
		#endif
		I = Cur->Succs.begin();
		E = Cur->Succs.end();
		continue;
}		}
}		}

if (Done) {		// Max the current height against this successor's height.
WorkList.pop_back();		Cur->Height = std::max(Cur->Height, SuccSU->Height + I->getLatency());
if (MaxSuccHeight != Cur->Height) {		++I;
Cur->setHeightDirty();		} while (I != E);
Cur->Height = MaxSuccHeight;
}		// We've examined all successors of this SU, our height is now current.
Cur->isHeightCurrent = true;		Cur->isHeightCurrent = true;

		// If there are no further nodes to process, we're finished.
		if (Stack.empty())
		return;

		// Otherwise pop the stack and continue.
		#ifndef NDEBUG
		Visited.erase(Cur);
		#endif
		std::tie(Cur, I) = Stack.pop_back_val();
}		}
} while (!WorkList.empty());
}		}

void SUnit::biasCriticalPath() {		void SUnit::biasCriticalPath() {
if (NumPreds < 2)		if (NumPreds < 2)
return;		return;

SUnit::pred_iterator BestI = Preds.begin();		SUnit::pred_iterator BestI = Preds.begin();
unsigned MaxDepth = BestI->getSUnit()->getDepth();		unsigned MaxDepth = BestI->getSUnit()->getDepth();