This is an archive of the discontinued LLVM Phabricator instance.

MachineScheduler: Enable macro fusion in post-RA scheduler
Abandoned · Public

Authored by MatzeB on Sep 22 2016, 6:22 PM.


Details

Reviewers

tstellarAMD
jonpa
atrick
kparzysz

Summary

The post-RA scheduler should respect macro fusion opportunities.

This also changes the strategy used to enforce adjacent scheduling: while
we previously added a weak clustering edge between the fusing nodes, we now
add strong artificial ordering edges towards all other nodes to enforce
the order. This was necessary to avoid cases in which the cost heuristic
for the weak edges had no effect because some nodes were still in the
pending queue and were not even considered by tryCandidate().
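To make the "strong artificial ordering edges" idea concrete, here is a self-contained toy sketch in C++. It is not LLVM's ScheduleDAG API and not the code in this diff; `ToyNode`, `addToyEdge`, `scheduleTopDown` and `forceLastBeforeExit` are hypothetical names. It assumes a top-down list scheduler that always picks the lowest-numbered ready node, and it models the branch as the implicit region boundary (like ExitSU). Adding an edge from every other bottom root to the fusion candidate means the candidate only becomes ready after everything else, so it lands directly in front of the branch whatever the picking heuristic prefers.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy DAG node (hypothetical; not llvm::SUnit).
struct ToyNode {
  std::vector<int> Succs; // indices of successor nodes
  int NumPreds = 0;       // number of predecessor edges
};

void addToyEdge(std::vector<ToyNode> &G, int From, int To) {
  G[From].Succs.push_back(To);
  ++G[To].NumPreds;
}

// Top-down list scheduling. Stand-in heuristic: among ready nodes, always
// pick the one with the lowest index. The branch is not modeled as a node;
// it implicitly follows the last scheduled node.
std::vector<int> scheduleTopDown(std::vector<ToyNode> G) {
  std::set<int> Ready;
  for (int I = 0; I < static_cast<int>(G.size()); ++I)
    if (G[I].NumPreds == 0)
      Ready.insert(I);
  std::vector<int> Order;
  while (!Ready.empty()) {
    int N = *Ready.begin();
    Ready.erase(Ready.begin());
    Order.push_back(N);
    for (int S : G[N].Succs)
      if (--G[S].NumPreds == 0)
        Ready.insert(S);
  }
  return Order;
}

// Add artificial edges from every other bottom root (node without
// successors) to FuseIdx. Every node reaches some bottom root, so afterwards
// every other node is an ancestor of FuseIdx and FuseIdx is scheduled last,
// i.e. directly in front of the branch.
void forceLastBeforeExit(std::vector<ToyNode> &G, int FuseIdx) {
  // Like the patch, only a node without successors may sit right before the exit.
  assert(G[FuseIdx].Succs.empty() && "fusion candidate must be a bottom root");
  std::vector<int> BottomRoots;
  for (int I = 0; I < static_cast<int>(G.size()); ++I)
    if (I != FuseIdx && G[I].Succs.empty())
      BottomRoots.push_back(I);
  for (int R : BottomRoots)
    addToyEdge(G, R, FuseIdx);
}
```

With the three instructions of the MIR test in this revision (0: LDRXui, 1: SUBSXri, 2: ADDXri, plus an edge 0 → 2), the plain schedule is 0, 1, 2, which separates SUBSXri from Bcc; after `forceLastBeforeExit(G, 1)` the schedule becomes 0, 2, 1. Because the fusion candidate has no successors, the added edges can never create a cycle.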

Diff Detail

Repository: rL LLVM

Event Timeline

MatzeB updated this revision to Diff 72232. (Sep 22 2016, 6:22 PM)

MatzeB retitled this revision to MachineScheduler: Enable macro fusion in post-RA scheduler.

MatzeB updated this object.

MatzeB added reviewers: atrick, tstellarAMD, jonpa.

MatzeB set the repository for this revision to rL LLVM.

MatzeB added a subscriber: llvm-commits.

Herald added subscribers: wdng, mcrosier, MatzeB. (Sep 22 2016, 6:22 PM)

MatzeB added a reviewer: kparzysz. (Sep 23 2016, 1:24 PM)

This looks fairly straightforward... LGTM.

lib/CodeGen/MachineScheduler.cpp
Line 2833: delay -> Delay

This revision is now accepted and ready to land. (Sep 26 2016, 5:16 AM)

It turned out the approach taken previously was not enough: currently, nodes predicted to stall end up in the pending queue and are not even considered by the usual heuristics in tryCandidate(). However, possible stalls should not get in the way of the macro-fusion heuristic, so instead of adjusting the picking heuristic, this patch adds artificial scheduling edges to the roots to enforce adjacent scheduling.
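The pending-queue problem can be illustrated with a small, hypothetical C++ model (not LLVM's SchedBoundary or ReadyQueue; `ToyCand` and `pickTopDown` are made-up names). It assumes a node is "pending" when its ReadyCycle lies in the future, and such nodes are skipped before any heuristic runs. The heuristic itself prefers nodes that are not clustered with the bottom boundary, which is the intent of the weak `clusteredWithBottom` rule in the diff. If the only other node is still pending, the fusion candidate is the sole available choice and is picked early anyway.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a scheduling candidate.
struct ToyCand {
  int Id;                   // must be >= 0; -1 means "nothing picked"
  unsigned ReadyCycle;      // cycle at which the operands are available
  bool ClusteredWithBottom; // wants to be scheduled right before the branch
};

// Top-down pick: pending nodes (ReadyCycle > CurrCycle) never reach the
// heuristic. Among available nodes, prefer those NOT clustered with the
// bottom boundary, i.e. defer the fusion candidate when possible.
int pickTopDown(const std::vector<ToyCand> &Queue, unsigned CurrCycle) {
  int Best = -1;
  bool BestDeferred = false;
  for (const ToyCand &C : Queue) {
    assert(C.Id >= 0 && "-1 is reserved for 'nothing picked'");
    if (C.ReadyCycle > CurrCycle)
      continue; // pending: invisible to the heuristic below
    if (Best == -1 || (BestDeferred && !C.ClusteredWithBottom)) {
      Best = C.Id;
      BestDeferred = C.ClusteredWithBottom;
    }
  }
  return Best;
}
```

In this model the weak preference only works once the competing node is available, which is why the patch switches to edges that constrain the DAG itself rather than the picking order.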

PS: In some internal discussions we decided that a nice long-term solution would be to merge the fusing nodes in the preparation step instead (by creating an instruction bundle and merging the ScheduleDAG nodes). However, merging nodes after the scheduling DAG has been created turns out to be tricky: the existing code expects a fixed number of SUnits, we would need to update or recompute the topological ordering, etc. So, to get this specific problem under control, I found adding edges to root nodes a robust and simpler solution for now.

Marking a DAG node as "dead" for the purpose of scheduling should be an easy thing to do, relative to supporting instruction bundles. But adding the extra DAG edges is also a fine solution, just not quite as direct.

Updated the patch so that addFusionEdges() no longer walks over all edges in the graph (only over all nodes and the predecessor edges of ExitSU).

MatzeB added a child revision: D25140: ScheduleDAGInstrs: Add condjump deps in addSchedBarrierDeps(). (Oct 3 2016, 5:30 PM)

ping

Doesn't this force macro fusion for all targets/subtargets? If we wanted to do that, we wouldn't need the cluster edge and scheduler heuristic anymore. Shouldn't there be a TII->forceMacroFusion() option?

In D24855#572492, @atrick wrote:

Doesn't this force macro fusion for all targets/subtargets? If we wanted to do that, we wouldn't need the cluster edge and scheduler heuristic anymore. Shouldn't there be a TII->forceMacroFusion() option?

If the target doesn't implement TII::shouldScheduleAdjacent(), no fusion happens. Of course, this commit forces fusion to be respected even in the presence of possible stalls in the scheduling model. This is a switch of priorities, and it indeed no longer allows you to insert the macro-fusion check at an arbitrary place in the heuristic. The scheduling model/stalls aren't part of that heuristic either, so I couldn't just move the cluster-edge heuristic to an earlier place in tryCandidate(); that is why I went this route.

Do you think a TII->forceMacroFusion() hook is necessary? Currently the only targets implementing shouldScheduleAdjacent() are X86 and AArch64, and they should both prefer fusion over reported stalls, which would leave the cluster/weak solution untested/dead code.

On x86 it's meant to be a heuristic. If you think all subtargets should instead force macro-fusion before scheduling (and have benchmarks to prove it) then we should delete the code that implements the heuristic.

In D24855#572495, @atrick wrote:

On x86 it's meant to be a heuristic. If you think all subtargets should instead force macro-fusion before scheduling (and have benchmarks to prove it) then we should delete the code that implements the heuristic.

My experience has been that this mostly matters (i.e. changes the outcome) when scheduling top-down, which currently happens only in the PostRAScheduler. At the moment, the combination of PostRAScheduler and MacroOpFusion being enabled only seems to occur in the BtVer2/Jaguar scheduling model, for which I have no hardware to test. So I have no good way of benchmarking this (but also no indication of why this would ever be good as a heuristic).

Anyway, to keep the possibility of weak edges, I'd make shouldScheduleAdjacent() return an enum value indicating whether weak or hard edges should be used. I'll update the patch in the next few days.
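A minimal C++ sketch of what such an enum-valued result could look like. Everything here is hypothetical (`FusionKind`, `FusionEdges`, `edgesFor` are not LLVM APIs); it only assumes that "weak" maps to the existing cluster edge and that "hard" additionally adds the artificial ordering edges described in this revision.

```cpp
#include <cassert>

// Hypothetical result type for shouldScheduleAdjacent(); not an LLVM API.
enum class FusionKind { None, Weak, Hard };

// Which DAG edges a fusion mutation would add (hypothetical).
struct FusionEdges {
  bool ClusterEdge;     // weak, cluster-style edge: only a heuristic hint
  bool ArtificialEdges; // strong ordering edges: enforced even if stalls are predicted
};

FusionEdges edgesFor(FusionKind K) {
  switch (K) {
  case FusionKind::None:
    return {false, false};
  case FusionKind::Weak:
    return {true, false};
  case FusionKind::Hard:
    return {true, true};
  }
  assert(false && "unhandled FusionKind");
  return {false, false};
}
```

Under these assumptions, a target like x86 could keep returning the weak variant where fusion is meant to be a heuristic, while targets that should always prefer fusion over reported stalls would return the hard variant.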

This is out of date. Nowadays targets can decide for themselves whether they want post-RA fusion by overriding TargetPassConfig::createPostMachineScheduler() and adding a DAG mutation there.

Revision Contents

Path                                          Size
include/llvm/CodeGen/ScheduleDAG.h            22 lines
lib/CodeGen/MachineScheduler.cpp              93 lines
test/CodeGen/AArch64/postmisched-fusion.mir   23 lines

Diff 72232

include/llvm/CodeGen/ScheduleDAG.h

… public:
bool isPending : 1; // True once pending.		bool isPending : 1; // True once pending.
bool isAvailable : 1; // True once available.		bool isAvailable : 1; // True once available.
bool isScheduled : 1; // True once scheduled.		bool isScheduled : 1; // True once scheduled.
bool isScheduleHigh : 1; // True if preferable to schedule high.		bool isScheduleHigh : 1; // True if preferable to schedule high.
bool isScheduleLow : 1; // True if preferable to schedule low.		bool isScheduleLow : 1; // True if preferable to schedule low.
bool isCloned : 1; // True if this node has been cloned.		bool isCloned : 1; // True if this node has been cloned.
bool isUnbuffered : 1; // Uses an unbuffered resource.		bool isUnbuffered : 1; // Uses an unbuffered resource.
bool hasReservedResource : 1; // Uses a reserved resource.		bool hasReservedResource : 1; // Uses a reserved resource.
		bool clusteredWithBottom : 1; // Node clustered with bottom boundary.
Sched::Preference SchedulingPref; // Scheduling preference.		Sched::Preference SchedulingPref; // Scheduling preference.

private:		private:
bool isDepthCurrent : 1; // True if Depth is current.		bool isDepthCurrent : 1; // True if Depth is current.
bool isHeightCurrent : 1; // True if Height is current.		bool isHeightCurrent : 1; // True if Height is current.
unsigned Depth; // Node depth.		unsigned Depth; // Node depth.
unsigned Height; // Node height.		unsigned Height; // Node height.
public:		public:
… SUnit(SDNode *node, unsigned nodenum)
NodeNum(nodenum), NodeQueueId(0), NumPreds(0), NumSuccs(0),		NodeNum(nodenum), NodeQueueId(0), NumPreds(0), NumSuccs(0),
NumPredsLeft(0), NumSuccsLeft(0), WeakPredsLeft(0), WeakSuccsLeft(0),		NumPredsLeft(0), NumSuccsLeft(0), WeakPredsLeft(0), WeakSuccsLeft(0),
NumRegDefsLeft(0), Latency(0), isVRegCycle(false), isCall(false),		NumRegDefsLeft(0), Latency(0), isVRegCycle(false), isCall(false),
isCallOp(false), isTwoAddress(false), isCommutable(false),		isCallOp(false), isTwoAddress(false), isCommutable(false),
hasPhysRegUses(false), hasPhysRegDefs(false), hasPhysRegClobbers(false),		hasPhysRegUses(false), hasPhysRegDefs(false), hasPhysRegClobbers(false),
isPending(false), isAvailable(false), isScheduled(false),		isPending(false), isAvailable(false), isScheduled(false),
isScheduleHigh(false), isScheduleLow(false), isCloned(false),		isScheduleHigh(false), isScheduleLow(false), isCloned(false),
isUnbuffered(false), hasReservedResource(false),		isUnbuffered(false), hasReservedResource(false),
SchedulingPref(Sched::None), isDepthCurrent(false),		clusteredWithBottom(false), SchedulingPref(Sched::None),
isHeightCurrent(false), Depth(0), Height(0), TopReadyCycle(0),		isDepthCurrent(false), isHeightCurrent(false), Depth(0), Height(0),
BotReadyCycle(0), CopyDstRC(nullptr), CopySrcRC(nullptr) {}		TopReadyCycle(0), BotReadyCycle(0), CopyDstRC(nullptr),
		CopySrcRC(nullptr) {}

/// SUnit - Construct an SUnit for post-regalloc scheduling to represent		/// SUnit - Construct an SUnit for post-regalloc scheduling to represent
/// a MachineInstr.		/// a MachineInstr.
SUnit(MachineInstr *instr, unsigned nodenum)		SUnit(MachineInstr *instr, unsigned nodenum)
: Node(nullptr), Instr(instr), OrigNode(nullptr), SchedClass(nullptr),		: Node(nullptr), Instr(instr), OrigNode(nullptr), SchedClass(nullptr),
NodeNum(nodenum), NodeQueueId(0), NumPreds(0), NumSuccs(0),		NodeNum(nodenum), NodeQueueId(0), NumPreds(0), NumSuccs(0),
NumPredsLeft(0), NumSuccsLeft(0), WeakPredsLeft(0), WeakSuccsLeft(0),		NumPredsLeft(0), NumSuccsLeft(0), WeakPredsLeft(0), WeakSuccsLeft(0),
NumRegDefsLeft(0), Latency(0), isVRegCycle(false), isCall(false),		NumRegDefsLeft(0), Latency(0), isVRegCycle(false), isCall(false),
isCallOp(false), isTwoAddress(false), isCommutable(false),		isCallOp(false), isTwoAddress(false), isCommutable(false),
hasPhysRegUses(false), hasPhysRegDefs(false), hasPhysRegClobbers(false),		hasPhysRegUses(false), hasPhysRegDefs(false), hasPhysRegClobbers(false),
isPending(false), isAvailable(false), isScheduled(false),		isPending(false), isAvailable(false), isScheduled(false),
isScheduleHigh(false), isScheduleLow(false), isCloned(false),		isScheduleHigh(false), isScheduleLow(false), isCloned(false),
isUnbuffered(false), hasReservedResource(false),		isUnbuffered(false), hasReservedResource(false),
SchedulingPref(Sched::None), isDepthCurrent(false),		clusteredWithBottom(false), SchedulingPref(Sched::None),
isHeightCurrent(false), Depth(0), Height(0), TopReadyCycle(0),		isDepthCurrent(false), isHeightCurrent(false), Depth(0), Height(0),
BotReadyCycle(0), CopyDstRC(nullptr), CopySrcRC(nullptr) {}		TopReadyCycle(0), BotReadyCycle(0), CopyDstRC(nullptr),
		CopySrcRC(nullptr) {}

/// SUnit - Construct a placeholder SUnit.		/// SUnit - Construct a placeholder SUnit.
SUnit()		SUnit()
: Node(nullptr), Instr(nullptr), OrigNode(nullptr), SchedClass(nullptr),		: Node(nullptr), Instr(nullptr), OrigNode(nullptr), SchedClass(nullptr),
NodeNum(BoundaryID), NodeQueueId(0), NumPreds(0), NumSuccs(0),		NodeNum(BoundaryID), NodeQueueId(0), NumPreds(0), NumSuccs(0),
NumPredsLeft(0), NumSuccsLeft(0), WeakPredsLeft(0), WeakSuccsLeft(0),		NumPredsLeft(0), NumSuccsLeft(0), WeakPredsLeft(0), WeakSuccsLeft(0),
NumRegDefsLeft(0), Latency(0), isVRegCycle(false), isCall(false),		NumRegDefsLeft(0), Latency(0), isVRegCycle(false), isCall(false),
isCallOp(false), isTwoAddress(false), isCommutable(false),		isCallOp(false), isTwoAddress(false), isCommutable(false),
hasPhysRegUses(false), hasPhysRegDefs(false), hasPhysRegClobbers(false),		hasPhysRegUses(false), hasPhysRegDefs(false), hasPhysRegClobbers(false),
isPending(false), isAvailable(false), isScheduled(false),		isPending(false), isAvailable(false), isScheduled(false),
isScheduleHigh(false), isScheduleLow(false), isCloned(false),		isScheduleHigh(false), isScheduleLow(false), isCloned(false),
isUnbuffered(false), hasReservedResource(false),		isUnbuffered(false), hasReservedResource(false),
SchedulingPref(Sched::None), isDepthCurrent(false),		clusteredWithBottom(false), SchedulingPref(Sched::None),
isHeightCurrent(false), Depth(0), Height(0), TopReadyCycle(0),		isDepthCurrent(false), isHeightCurrent(false), Depth(0), Height(0),
BotReadyCycle(0), CopyDstRC(nullptr), CopySrcRC(nullptr) {}		TopReadyCycle(0), BotReadyCycle(0), CopyDstRC(nullptr),
		CopySrcRC(nullptr) {}

/// \brief Boundary nodes are placeholders for the boundary of the		/// \brief Boundary nodes are placeholders for the boundary of the
/// scheduling region.		/// scheduling region.
///		///
/// BoundaryNodes can have DAG edges, including Data edges, but they do not		/// BoundaryNodes can have DAG edges, including Data edges, but they do not
/// correspond to schedulable entities (e.g. instructions) and do not have a		/// correspond to schedulable entities (e.g. instructions) and do not have a
/// valid ID. Consequently, always check for boundary nodes before accessing		/// valid ID. Consequently, always check for boundary nodes before accessing
/// an associative data structure keyed on node ID.		/// an associative data structure keyed on node ID.
…

lib/CodeGen/MachineScheduler.cpp

… if (Other.modifiesRegister(Reg, &TRI))
return true;		return true;
}		}
return false;		return false;
}		}

/// \brief Callback from DAG postProcessing to create cluster edges to encourage		/// \brief Callback from DAG postProcessing to create cluster edges to encourage
/// fused operations.		/// fused operations.
void MacroFusion::apply(ScheduleDAGInstrs *DAGInstrs) {		void MacroFusion::apply(ScheduleDAGInstrs *DAGInstrs) {
ScheduleDAGMI *DAG = static_cast<ScheduleDAGMI*>(DAGInstrs);		ScheduleDAGMI &DAG = static_cast<ScheduleDAGMI&>(*DAGInstrs);

// For now, assume targets can only fuse with the branch.		// For now, assume targets can only fuse with the branch.
SUnit &ExitSU = DAG->ExitSU;		SUnit &ExitSU = DAG.ExitSU;
MachineInstr *Branch = ExitSU.getInstr();		MachineInstr *Branch = ExitSU.getInstr();
if (!Branch)		if (!Branch)
return;		return;

for (SUnit &SU : DAG->SUnits) {		for (SUnit &SU : DAG.SUnits) {
// SUnits with successors can't be scheduled in front of the ExitSU.		// SUnits with successors can't be scheduled in front of the ExitSU.
if (!SU.Succs.empty())		if (!SU.Succs.empty())
continue;		continue;
// We only care if the node writes to a register that the branch reads.		// We only care if the node writes to a register that the branch reads.
MachineInstr *Pred = SU.getInstr();		MachineInstr *Pred = SU.getInstr();
if (!HasDataDep(TRI, Branch, Pred))		if (!HasDataDep(TRI, Branch, Pred))
continue;		continue;

if (!TII.shouldScheduleAdjacent(Pred, Branch))		if (!TII.shouldScheduleAdjacent(Pred, Branch))
continue;		continue;

// Create a single weak edge from SU to ExitSU. The only effect is to cause		// Create a single weak edge from SU to ExitSU. The only effect is to cause
// bottom-up scheduling to heavily prioritize the clustered SU. There is no		// bottom-up scheduling to heavily prioritize the clustered SU.
// need to copy predecessor edges from ExitSU to SU, since top-down		bool Success = DAG.addEdge(&ExitSU, SDep(&SU, SDep::Cluster));
// scheduling cannot prioritize ExitSU anyway. To defer top-down scheduling
// of SU, we could create an artificial edge from the deepest root, but it
// hasn't been needed yet.
bool Success = DAG->addEdge(&ExitSU, SDep(&SU, SDep::Cluster));
(void)Success;		(void)Success;
assert(Success && "No DAG nodes should be reachable from ExitSU");		assert(Success && "No DAG nodes should be reachable from ExitSU");

		// Currently only works for clustering with the ExitSU. If this ever needs
		// to be extended to arbitrary nodes then we probably need to copy the preds
		// of the second to the first node and the succs of the second to the first
		// as weak edges.
		assert(ExitSU.isBoundaryNode() && "Only works for ExitSU for now.");
		// This will defer scheduling of \p SU in top-down scheduling.
		SU.clusteredWithBottom = true;

DEBUG(dbgs() << "Macro Fuse SU(" << SU.NodeNum << ")\n");		DEBUG(dbgs() << "Macro Fuse SU(" << SU.NodeNum << ")\n");
break;		break;
}		}
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// CopyConstrain - DAG post-processing to encourage copy elimination.		// CopyConstrain - DAG post-processing to encourage copy elimination.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
… if (DAG->isTrackingPressure()) {
}		}
}		}
DEBUG(if (Cand.RPDelta.Excess.isValid())		DEBUG(if (Cand.RPDelta.Excess.isValid())
dbgs() << " Try SU(" << Cand.SU->NodeNum << ") "		dbgs() << " Try SU(" << Cand.SU->NodeNum << ") "
<< TRI->getRegPressureSetName(Cand.RPDelta.Excess.getPSet())		<< TRI->getRegPressureSetName(Cand.RPDelta.Excess.getPSet())
<< ":" << Cand.RPDelta.Excess.getUnitInc() << "\n");		<< ":" << Cand.RPDelta.Excess.getUnitInc() << "\n");
}		}

		static bool tryCluster(GenericSchedulerBase::SchedCandidate &Cand,
		GenericSchedulerBase::SchedCandidate &TryCand,
		const ScheduleDAGMI &DAG) {
		// Keep clustered nodes together to encourage downstream peephole
		// optimizations which may reduce resource requirements.
		//
		// This is a best effort to set things up for a post-RA pass. Optimizations
		// like generating loads of multiple registers should ideally be done within
		// the scheduler pass by combining the loads during DAG postprocessing.
		const SUnit *CandNextClusterSU =
		Cand.AtTop ? DAG.getNextClusterSucc() : DAG.getNextClusterPred();
		const SUnit *TryCandNextClusterSU =
		TryCand.AtTop ? DAG.getNextClusterSucc() : DAG.getNextClusterPred();
		return tryGreater(TryCand.SU == TryCandNextClusterSU,
		Cand.SU == CandNextClusterSU, TryCand, Cand,
		GenericScheduler::Cluster);
		}

		static bool tryWeak(GenericSchedulerBase::SchedCandidate &Cand,
		GenericSchedulerBase::SchedCandidate &TryCand) {
		if (Cand.AtTop != TryCand.AtTop)
		return false;

		// Weak edges are for clustering and other constraints.
		if (tryLess(getWeakLeft(TryCand.SU, TryCand.AtTop),
		getWeakLeft(Cand.SU, Cand.AtTop),
		TryCand, Cand, GenericScheduler::Weak))
		return true;

		// delay top node when it should be clustered with the bottom boundary.
		kparzysz (inline comment): delay -> Delay
		return TryCand.AtTop &&
		tryLess(TryCand.SU->clusteredWithBottom, Cand.SU->clusteredWithBottom,
		TryCand, Cand, GenericScheduler::Weak);
		}

/// Apply a set of heuristics to a new candidate. Heuristics are currently		/// Apply a set of heuristics to a new candidate. Heuristics are currently
/// hierarchical. This may be more efficient than a graduated cost model because		/// hierarchical. This may be more efficient than a graduated cost model because
/// we don't need to evaluate all aspects of the model for each node in the		/// we don't need to evaluate all aspects of the model for each node in the
/// queue. But it's really done to make the heuristics easier to debug and		/// queue. But it's really done to make the heuristics easier to debug and
/// statistically analyze.		/// statistically analyze.
///		///
/// \param Cand provides the policy and current best candidate.		/// \param Cand provides the policy and current best candidate.
/// \param TryCand refers to the next SUnit candidate, otherwise uninitialized.		/// \param TryCand refers to the next SUnit candidate, otherwise uninitialized.
… if (Rem.IsAcyclicLatencyLimited && !Zone->getCurrMOps() &&
return;		return;

// Prioritize instructions that read unbuffered resources by stall cycles.		// Prioritize instructions that read unbuffered resources by stall cycles.
if (tryLess(Zone->getLatencyStallCycles(TryCand.SU),		if (tryLess(Zone->getLatencyStallCycles(TryCand.SU),
Zone->getLatencyStallCycles(Cand.SU), TryCand, Cand, Stall))		Zone->getLatencyStallCycles(Cand.SU), TryCand, Cand, Stall))
return;		return;
}		}

// Keep clustered nodes together to encourage downstream peephole		if (tryCluster(Cand, TryCand, *DAG))
// optimizations which may reduce resource requirements.
//
// This is a best effort to set things up for a post-RA pass. Optimizations
// like generating loads of multiple registers should ideally be done within
// the scheduler pass by combining the loads during DAG postprocessing.
const SUnit *CandNextClusterSU =
Cand.AtTop ? DAG->getNextClusterSucc() : DAG->getNextClusterPred();
const SUnit *TryCandNextClusterSU =
TryCand.AtTop ? DAG->getNextClusterSucc() : DAG->getNextClusterPred();
if (tryGreater(TryCand.SU == TryCandNextClusterSU,
Cand.SU == CandNextClusterSU,
TryCand, Cand, Cluster))
return;		return;
		if (tryWeak(Cand, TryCand))
if (SameBoundary) {
// Weak edges are for clustering and other constraints.
if (tryLess(getWeakLeft(TryCand.SU, TryCand.AtTop),
getWeakLeft(Cand.SU, Cand.AtTop),
TryCand, Cand, Weak))
return;		return;
}

// Avoid increasing the max pressure of the entire region.		// Avoid increasing the max pressure of the entire region.
if (DAG->isTrackingPressure() && tryPressure(TryCand.RPDelta.CurrentMax,		if (DAG->isTrackingPressure() && tryPressure(TryCand.RPDelta.CurrentMax,
Cand.RPDelta.CurrentMax,		Cand.RPDelta.CurrentMax,
TryCand, Cand, RegMax, TRI,		TryCand, Cand, RegMax, TRI,
DAG->MF))		DAG->MF))
return;		return;

… void PostGenericScheduler::tryCandidate(SchedCandidate &Cand,
SchedCandidate &TryCand) {		SchedCandidate &TryCand) {

// Initialize the candidate if needed.		// Initialize the candidate if needed.
if (!Cand.isValid()) {		if (!Cand.isValid()) {
TryCand.Reason = NodeOrder;		TryCand.Reason = NodeOrder;
return;		return;
}		}

		if (tryCluster(Cand, TryCand, *DAG))
		return;
		if (tryWeak(Cand, TryCand))
		return;

// Prioritize instructions that read unbuffered resources by stall cycles.		// Prioritize instructions that read unbuffered resources by stall cycles.
if (tryLess(Top.getLatencyStallCycles(TryCand.SU),		if (tryLess(Top.getLatencyStallCycles(TryCand.SU),
Top.getLatencyStallCycles(Cand.SU), TryCand, Cand, Stall))		Top.getLatencyStallCycles(Cand.SU), TryCand, Cand, Stall))
return;		return;

// Avoid critical resource consumption and balance the schedule.		// Avoid critical resource consumption and balance the schedule.
if (tryLess(TryCand.ResDelta.CritResources, Cand.ResDelta.CritResources,		if (tryLess(TryCand.ResDelta.CritResources, Cand.ResDelta.CritResources,
TryCand, Cand, ResourceReduce))		TryCand, Cand, ResourceReduce))
…
/// scheduled/remaining flags in the DAG nodes.		/// scheduled/remaining flags in the DAG nodes.
void PostGenericScheduler::schedNode(SUnit *SU, bool IsTopNode) {		void PostGenericScheduler::schedNode(SUnit *SU, bool IsTopNode) {
SU->TopReadyCycle = std::max(SU->TopReadyCycle, Top.getCurrCycle());		SU->TopReadyCycle = std::max(SU->TopReadyCycle, Top.getCurrCycle());
Top.bumpNode(SU);		Top.bumpNode(SU);
}		}

/// Create a generic scheduler with no vreg liveness or DAG mutation passes.		/// Create a generic scheduler with no vreg liveness or DAG mutation passes.
static ScheduleDAGInstrs *createGenericSchedPostRA(MachineSchedContext *C) {		static ScheduleDAGInstrs *createGenericSchedPostRA(MachineSchedContext *C) {
return new ScheduleDAGMI(C, make_unique<PostGenericScheduler>(C), /*IsPostRA=*/true);		ScheduleDAGMI *DAG =
		new ScheduleDAGMI(C, make_unique<PostGenericScheduler>(C),
		/*IsPostRA=*/true);
		if (EnableMacroFusion)
		DAG->addMutation(createMacroFusionDAGMutation(DAG->TII, DAG->TRI));
		return DAG;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// ILP Scheduler. Currently for experimental analysis of heuristics.		// ILP Scheduler. Currently for experimental analysis of heuristics.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

namespace {		namespace {
/// \brief Order nodes by the ILP metric.		/// \brief Order nodes by the ILP metric.
…
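The reordered PostGenericScheduler::tryCandidate() in the diff above consults tryCluster() and tryWeak() before the stall check, so clustering now outranks predicted stalls among available candidates. The following hypothetical C++ toy (made-up names `HeurCand`, `toyTryLess`, `toyTryGreater`, `toyTryCandidate`; not the LLVM types) mimics that hierarchical comparison: each step either decides the outcome or falls through to the next one.

```cpp
#include <cassert>

// Hypothetical, simplified reasons. NoCand means "TryCand did not win".
enum ToyReason { NoCand, Cluster, Weak, Stall, NodeOrder };

struct HeurCand {
  int NodeNum;
  bool IsNextCluster;   // stand-in for SU == DAG.getNextClusterSucc()
  unsigned WeakLeft;    // stand-in for getWeakLeft(SU, /*isTop=*/true)
  unsigned StallCycles; // stand-in for Top.getLatencyStallCycles(SU)
  ToyReason Reason;
};

// Same shape as tryLess()/tryGreater(): return true once the values differ
// (a decision was made), marking TryCand only if it wins.
bool toyTryLess(unsigned TryVal, unsigned CandVal, HeurCand &TryCand, ToyReason R) {
  if (TryVal < CandVal) { TryCand.Reason = R; return true; }
  return TryVal > CandVal;
}

bool toyTryGreater(unsigned TryVal, unsigned CandVal, HeurCand &TryCand, ToyReason R) {
  if (TryVal > CandVal) { TryCand.Reason = R; return true; }
  return TryVal < CandVal;
}

// Order after the patch: cluster, weak edges, stalls, then node order.
void toyTryCandidate(const HeurCand &Cand, HeurCand &TryCand) {
  assert(&Cand != &TryCand && "compare two distinct candidates");
  TryCand.Reason = NoCand;
  if (toyTryGreater(TryCand.IsNextCluster, Cand.IsNextCluster, TryCand, Cluster))
    return;
  if (toyTryLess(TryCand.WeakLeft, Cand.WeakLeft, TryCand, Weak))
    return;
  if (toyTryLess(TryCand.StallCycles, Cand.StallCycles, TryCand, Stall))
    return;
  if (TryCand.NodeNum < Cand.NodeNum)
    TryCand.Reason = NodeOrder;
}
```

A candidate that is the next cluster member wins even if it would stall for two cycles; without the cluster relation, the same stall makes it lose. The hierarchy only ranks nodes that are already available, which is why the pending-queue case still needs the artificial edges.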

test/CodeGen/AArch64/postmisched-fusion.mir

This file was added.

# RUN: llc -o - %s -mtriple=aarch64-- -mcpu=cyclone -enable-post-misched -run-pass=postmisched | FileCheck %s
# Test that the post machine scheduler respects macro op fusion.
--- |
  define void @func0() { ret void }
...
---
# CHECK-LABEL: name: func0
# CHECK: %xzr = SUBSXri{{.*}}implicit-def %nzcv
# CHECK-NEXT: Bcc {{.*}}implicit killed %nzcv
name: func0
body: |
  bb.0:
    successors: %bb.1, %bb.2
    %x8 = IMPLICIT_DEF
    %x9 = LDRXui %x8, 0 :: (load 8)
    dead %xzr = SUBSXri %x8, 0, 0, implicit-def %nzcv
    %x10 = ADDXri %x9, 13, 0
    Bcc 1, %bb.1, implicit killed %nzcv
    B %bb.2

  bb.1:
  bb.2:
...
