Details

Reviewers

rovka
rengolin
MatzeB
atrick

Commits

rG3d594370933b: Improve machine schedulers for in-order processors
rL298885: Improve machine schedulers for in-order processors

Summary

This patch enables schedulers for in-order processors to specify instructions that
cannot be issued with any other instructions.

Schedulers for such sub-targets can now attach 'let SingleIssue = 1; // single issue'
to any general sched class or individual instruction. The scheduler then does not
allow those instructions to be issued in the same cycle as any other instruction.
In this way, the scheduler models the real pipeline behavior more accurately.
Some partially out-of-order processors may also benefit from this.

Diff Detail

Event Timeline

javed.absar created this revision. Mar 8 2017, 9:21 AM

Can't you already get this effect with BufferSize = 0 in a ProcResource?

Hi Matthias:

The single-issue restriction is on the instruction and not a limitation of the ProcResource e.g. for some processor, a load using load/store unit may be allowed dual-issue, but VLDx using the same load/store unit may not be allowed to dual-issue.

I think what Matthias was getting at is that you can add a new ProcResource to model issue constraints. If you have a dual issue machine then any instruction that can't be grouped would simply consume both issue resources. Another option might be to use the begin/end group flags.

A SingleIssue flag seems like special casing a problem that needs a general solution.

That said, if there are multiple target maintainers who think adding a special SingleIssue case would be much better than the alternatives, then it may be worth adding.

The single-issue restriction is on the instruction and not a limitation of the ProcResource e.g. for some processor, a load using load/store unit may be allowed dual-issue, but VLDx using the same load/store unit may not be allowed to dual-issue.

I guess this could still be modeled by making the instruction occupy every processor resource. So I was wondering whether we need the extension of the scheduling model.

In D30744#696124, @atrick wrote:

That said, if there are multiple target maintainers who think adding a special SingleIssue case would be much better than the alternatives, then it may be worth adding.

We could also make the SingleIssue flag syntactic sugar so tablegen adds all available proc resources to the instruction.

In D30744#696106, @atrick wrote:

I think what Matthias was getting at is that you can add a new ProcResource to model issue constraints. If you have a dual issue machine then any instruction that can't be grouped would simply consume both issue resources. Another option might be to use the begin/end group flags.

A SingleIssue flag seems like special casing a problem that needs a general solution.

For those of us quite familiar with the scheduler, using your suggested approach would indeed be a workaround. However, for those less familiar with it who want to use the machine scheduler to model their sub-target, the natural way to see a ProcResource is as a resource for doing something (e.g. performing an add). Using a resource to impose a restriction on instructions is a bit counter-intuitive and would not occur to them naturally. The sad consequence is that this aspect of the pipeline would be modeled improperly or not at all.

In D30744#696136, @MatzeB wrote:

I guess this could still be modeled by making the instruction occupy every processor resource. So I was wondering whether we need the extension of the scheduling model.

Good idea. Unfortunately, syntactic sugar that adds 'every processor resource' would not work. For example, if a div is not pipelined and is in execution, then an instruction X that is single-issue but does not need UnitDiv could not be issued and would have to wait for the div to finish.
Besides, workarounds destroy simplicity and elegance. A straightforward 'isSingleIssue' bit that is simple for anyone to interpret and does exactly what it says would possibly be better.
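
The objection can be illustrated with a toy model. The sketch below is not LLVM code: ToyPipeline, earliestIfOccupyingAllUnits and earliestWithIssueFlag are hypothetical names, and each functional unit is reduced to a "free at cycle N" counter. It compares modeling a single-issue instruction as a consumer of every unit with modeling it as a pure issue restriction.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model, not LLVM code. FreeAt[U] is the first cycle in which the
// hypothetical functional unit U can accept a new instruction.
struct ToyPipeline {
  std::vector<unsigned> FreeAt;
};

// Workaround: the instruction is modeled as occupying every unit, so it
// cannot issue before the busiest unit (e.g. a non-pipelined divider) is free.
unsigned earliestIfOccupyingAllUnits(const ToyPipeline &P, unsigned Now) {
  unsigned Cycle = Now;
  for (unsigned F : P.FreeAt)
    Cycle = std::max(Cycle, F);
  return Cycle;
}

// Issue restriction: only the units the instruction really uses matter, and
// the instruction additionally refuses to share its issue cycle.
unsigned earliestWithIssueFlag(const ToyPipeline &P, unsigned Now,
                               const std::vector<unsigned> &UsedUnits,
                               unsigned IssuedThisCycle) {
  unsigned Cycle = Now;
  for (unsigned U : UsedUnits) {
    assert(U < P.FreeAt.size() && "unknown unit index");
    Cycle = std::max(Cycle, P.FreeAt[U]);
  }
  // If something has already issued in the current cycle, wait one cycle.
  if (Cycle == Now && IssuedThisCycle > 0)
    ++Cycle;
  return Cycle;
}
```

With unit 0 (an ALU) free and unit 1 (a divider) busy until cycle 8, an ALU-only single-issue instruction considered at cycle 1 is delayed to cycle 8 by the first function, but can issue at cycle 1 under the second one, provided nothing else has issued in that cycle.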

Separate processor resources should be defined to model issue resources rather than trying to work around it by consuming functional units. The processor resources were always intended to be used for both issue resources and functional units. It just isn't well documented or easily discoverable. It might make sense to have either a single issue flag or a separate list of issue resources in the tablegen machine model. My argument is that the scheduler itself already supports this functionality.

There are three ways this feature is already generally supported by the scheduler's machine model:

1. Begin/EndGroup

This feature was added to model issue constraints that were too awkward to model with resources. It's the most straightforward way to model single issue. Adding syntactic sugar to tablegen's machine model should be trivial.

2. Issue resources

Define a new type of processor resource with N units. A single-issue instruction takes N units to issue. This was an intended use of the machine model. The scheduler will model it with the issue counter.

3. Hardware ports

x86 models "issue" ports this way,
e.g. def IssueSlots : ProcResGroup<[Slot0, Slot1]>

It was designed to handle micro-ops, so it's slightly misleading. A single-issue instruction on a dual issue machine would simply use both issue slots. Other instructions would use the IssueSlots group. The scheduler models this with a bit for each port.
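
As a rough illustration of the "issue resources" idea, the sketch below treats issue slots as a resource with IssueWidth units and lets a single-issue instruction claim all of them. It is a toy in-order model, not the scheduler's implementation; assignIssueCycles and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Toy in-order model, not LLVM code. SlotsNeeded[i] is the number of issue
// slots instruction i consumes; a single-issue instruction on a machine of
// width N asks for all N slots. Returns the issue cycle of each instruction.
std::vector<unsigned> assignIssueCycles(const std::vector<unsigned> &SlotsNeeded,
                                        unsigned IssueWidth) {
  assert(IssueWidth > 0 && "machine must issue at least one slot per cycle");
  std::vector<unsigned> Cycles;
  unsigned Cycle = 0;
  unsigned Used = 0; // slots already consumed in the current cycle
  for (unsigned Need : SlotsNeeded) {
    assert(Need >= 1 && Need <= IssueWidth && "slot request out of range");
    if (Used + Need > IssueWidth) {
      // Not enough slots left: start a new issue cycle.
      ++Cycle;
      Used = 0;
    }
    Cycles.push_back(Cycle);
    Used += Need;
  }
  return Cycles;
}
```

On a dual-issue machine, the slot requests {1, 1, 2, 1, 1} yield issue cycles {0, 0, 1, 2, 2}: the instruction that takes both slots gets cycle 1 to itself.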

In D30744#696730, @atrick wrote:

This feature was added to model issue constraints that were too awkward to model with resources. It's the most straightforward way to model single issue. Adding syntactic sugar to tablegen's machine model should be trivial.

Could you please elaborate on this a bit so that I can try to develop a patch to add it? Thanks.

Hi Andrew/Matthias:
Would having a "SingleIssue" bit flag, as this patch proposes, be a good user option, even though I agree that other workarounds can achieve the same effect?
Thanks.

I think a SingleIssue flag would be a good user option. Representing such a flag in the tables generated by the machine model (MCSchedClassDesc) would be redundant. Either begin/endgroup already does the same thing, or it's a bug that should be fixed.

I just looked at the MachineScheduler implementation. I see this TODO:
/// TODO: Also check whether the SU must start a new group.

So, I think your patch is necessary except:

The SubtargetEmitter can simply set the begin/end group flags for SingleIssue

Instead of adding an isSingleIssue() API to TargetSchedule, you should add mustBeginGroup() and mustEndGroup() that checks the existing bits in MCSchedClassDesc.

In the scheduler, mustBeginGroup() is a hazard if CurrMOps > 0 when isTop() is true. Similarly, mustEndGroup() is the same hazard for !isTop().

Your patch needs to handle isTop() && mustEndGroup(), and likewise !isTop() && mustBeginGroup(), by calling bumpCycle(++NextCycle). Do this just after setting CurrMOps at CurrMOps += IncMOps, and just before checking the MOps hazard at while (CurrMOps >= SchedModel->getIssueWidth()). That obviously needs to be commented: "Bump the cycle count for issue group constraints. This must be done after NextCycle has been adjusted for all other stalls. Calling bumpCycle(X) will reduce CurrMOps by one issue group and set CurrCycle to X."

Your current implementation of isSingleIssue is actually only half the solution. Implementing the begin/end group bits will make it complete.
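
The two halves described above (the hazard check before issue and the cycle bump after issue) can be sketched for the top-down case as follows. This is a simplified toy, not MachineScheduler code: ToyInstr and ToyTopBoundary are hypothetical, and only micro-op counts and group flags are modeled.

```cpp
#include <cassert>

// Toy model, not LLVM code.
struct ToyInstr {
  unsigned MicroOps;
  bool BeginGroup; // must be the first instruction in its issue cycle
  bool EndGroup;   // must be the last instruction in its issue cycle
};

class ToyTopBoundary {
public:
  explicit ToyTopBoundary(unsigned Width) : IssueWidth(Width) {
    assert(Width > 0 && "issue width must be positive");
  }

  // First half: a BeginGroup instruction is a hazard once something has
  // already issued in the current cycle; so is exceeding the issue width.
  bool checkHazard(const ToyInstr &I) const {
    if (CurrMOps > 0 && I.BeginGroup)
      return true;
    return CurrMOps > 0 && CurrMOps + I.MicroOps > IssueWidth;
  }

  // Issues I at the earliest hazard-free cycle and returns that cycle.
  unsigned issue(const ToyInstr &I) {
    while (checkHazard(I))
      bumpCycle();
    unsigned IssuedAt = CurrCycle;
    CurrMOps += I.MicroOps;
    // Second half: an EndGroup instruction closes the cycle behind it.
    if (I.EndGroup)
      bumpCycle();
    while (CurrMOps >= IssueWidth)
      bumpCycle();
    return IssuedAt;
  }

private:
  // Advance one cycle and retire up to one issue group of micro-ops.
  void bumpCycle() {
    ++CurrCycle;
    CurrMOps = CurrMOps > IssueWidth ? CurrMOps - IssueWidth : 0;
  }

  unsigned IssueWidth;
  unsigned CurrCycle = 0;
  unsigned CurrMOps = 0;
};
```

On a dual-issue toy machine, issuing a plain instruction, then one with both flags set, then two plain instructions gives issue cycles 0, 1, 2 and 2. Dropping either half changes the result: without the hazard check the flagged instruction would share cycle 0, and without the bump the third instruction would share cycle 1.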

Hi Andrew:
Thanks for the help. I have made the changes as you suggested (unless I misunderstood some parts). Please have a look.
Thanks a lot.
Javed

Ping.

Thanks.

I don't think you need the SingleIssue flag in MCSchedule.h any more.

Also, I don't see the hazard checker logic yet. I think you still need to do this:

In the scheduler, mustBeginGroup() is a hazard if CurrMOps > 0 when isTop() is true. Similarly, mustEndGroup() is the same hazard for !isTop().

It looks like only half the logic is there now--the part where you bump the cycle when finishing a group. Maybe you need a better test case to exercise the hazard checking logic?

This revision now requires changes to proceed. Mar 22 2017, 10:52 AM

Hi Andrew:

You are right, I had missed out the hazard part. Have fixed things now. Please have a look if it is ok now.
Thanks
Javed

That looks good. But this version of the patch no longer has the SingleIssue flag support in the SubtargetEmitter. I thought you wanted that so the feature was more self-documenting. Your previous implementation looked ok to me. I just don't think you need any flag in the implementation of the model (MCSchedule.h).

Thanks Andrew. OK, I have added back 'SingleIssue' as syntactic sugar, but not in model, as recommended (if I got you right).
Best Regards, Javed.

LGTM. I think you should comment in TargetSchedule.td that SingleIssue is an alias for Begin/EndGroup.
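
One way to picture "SingleIssue is an alias for Begin/EndGroup" is the small sketch below. It is an assumption-laden illustration, not the SubtargetEmitter code: ToyWriteRes, ToyClassDesc and lowerGroupFlags are hypothetical, and mustBeginGroup/mustEndGroup here are free functions that only mirror the names suggested in the review.

```cpp
#include <vector>

// Toy model, not LLVM code. Flags as a model author might write them.
struct ToyWriteRes {
  bool SingleIssue;
  bool BeginGroup;
  bool EndGroup;
};

// Flags as the generated table would store them: no SingleIssue field.
struct ToyClassDesc {
  bool BeginGroup = false;
  bool EndGroup = false;
};

// Fold the writes of one scheduling class into a descriptor, treating
// SingleIssue as shorthand for BeginGroup and EndGroup together.
ToyClassDesc lowerGroupFlags(const std::vector<ToyWriteRes> &Writes) {
  ToyClassDesc Desc;
  for (const ToyWriteRes &W : Writes) {
    Desc.BeginGroup = Desc.BeginGroup || W.BeginGroup || W.SingleIssue;
    Desc.EndGroup = Desc.EndGroup || W.EndGroup || W.SingleIssue;
  }
  return Desc;
}

// Queries that only look at the existing group bits.
bool mustBeginGroup(const ToyClassDesc &D) { return D.BeginGroup; }
bool mustEndGroup(const ToyClassDesc &D) { return D.EndGroup; }
```

Under this reading the scheduler never needs to know about SingleIssue; it only consults the two group bits, which is why the flag can stay out of the model tables.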

This revision is now accepted and ready to land. Mar 26 2017, 1:32 PM

Closed by commit rL298885: Improve machine schedulers for in-order processors (authored by javed.absar). Mar 27 2017, 1:59 PM

This revision was automatically updated to reflect the committed changes.

Diff 91028

include/llvm/CodeGen/TargetSchedule.h

@@ ... 87 lines not shown ... @@ public:
   }

   /// \brief Identify the processor corresponding to the current subtarget.
   unsigned getProcessorID() const { return SchedModel.getProcessorID(); }

   /// \brief Maximum number of micro-ops that may be scheduled per cycle.
   unsigned getIssueWidth() const { return SchedModel.IssueWidth; }

+  /// \brief Return true if instruction cannot be dual-issued with another.
+  bool isSingleIssue(const MachineInstr *MI,
+                     const MCSchedClassDesc *SC = nullptr) const;
+
   /// \brief Return the number of issue slots required for this MI.
   unsigned getNumMicroOps(const MachineInstr *MI,
                           const MCSchedClassDesc *SC = nullptr) const;

   /// \brief Get the number of kinds of resources for this target.
   unsigned getNumProcResourceKinds() const {
     return SchedModel.getNumProcResourceKinds();
   }
@@ ... 69 lines not shown ... @@ #endif
   /// If UseDefaultDefLatency is false and no new machine sched model is
   /// present this method falls back to TII->getInstrLatency with an empty
   /// instruction itinerary (this is so we preserve the previous behavior of the
   /// if converter after moving it to TargetSchedModel).
   unsigned computeInstrLatency(const MachineInstr *MI,
                                bool UseDefaultDefLatency = true) const;
   unsigned computeInstrLatency(unsigned Opcode) const;


   /// \brief Output dependency latency of a pair of defs of the same register.
   ///
   /// This is typically one cycle.
   unsigned computeOutputLatency(const MachineInstr *DefMI, unsigned DefIdx,
                                 const MachineInstr *DepMI) const;
 };

 } // namespace llvm

 #endif

include/llvm/MC/MCSchedule.h

@@ ... 100 lines not shown ... @@
 struct MCSchedClassDesc {
   static const unsigned short InvalidNumMicroOps = UINT16_MAX;
   static const unsigned short VariantNumMicroOps = UINT16_MAX - 1;

 #ifndef NDEBUG
   const char* Name;
 #endif
   unsigned short NumMicroOps;
+  bool SingleIssue;
   bool BeginGroup;
   bool EndGroup;
   unsigned WriteProcResIdx; // First index into WriteProcResTable.
   unsigned NumWriteProcResEntries;
   unsigned WriteLatencyIdx; // First index into WriteLatencyTable.
   unsigned NumWriteLatencyEntries;
   unsigned ReadAdvanceIdx; // First index into ReadAdvanceTable.
   unsigned NumReadAdvanceEntries;
@@ ... 117 lines not shown ... @@

include/llvm/Target/TargetSchedule.td

@@ ... 249 lines not shown ... @@ class ProcWriteResources<list<ProcResourceKind> resources> {
   list<int> ResourceCycles = [];
   int Latency = 1;
   int NumMicroOps = 1;
   bit BeginGroup = 0;
   bit EndGroup = 0;
   // Allow a processor to mark some scheduling classes as unsupported
   // for stronger verification.
   bit Unsupported = 0;
+  // Allow a processor to mark some scheduling classes as single-issue
+  bit SingleIssue = 0;
   SchedMachineModel SchedModel = ?;
 }

 // Define the resources and latency of a SchedWrite. This will be used
 // directly by targets that have no itinerary classes. In this case,
 // SchedWrite is defined by the target, while WriteResources is
 // defined by the subtarget, and maps the SchedWrite to processor
 // resources.
@@ ... 171 lines not shown ... @@

lib/CodeGen/MachineScheduler.cpp

@@ ... 1,127 lines not shown ... @@ DEBUG(
     if (EntrySU.getInstr() != nullptr)
       EntrySU.dumpAll(this);
     for (const SUnit &SU : SUnits) {
       SU.dumpAll(this);
       if (ShouldTrackPressure) {
         dbgs() << " Pressure Diff : ";
         getPressureDiff(&SU).dump(*TRI);
       }
+      dbgs() << " Single Issue : ";
+      if (SchedModel.isSingleIssue(SU.getInstr()))
+        dbgs() << "true;";
+      else
+        dbgs() << "false;";
       dbgs() << '\n';
     }
     if (ExitSU.getInstr() != nullptr)
       ExitSU.dumpAll(this);
   );
   if (ViewMISchedDAGs) viewGraph();

   // Initialize ready queues now that the DAG and priority data are finalized.
@@ ... 715 lines not shown ... @@
 /// can dispatch per cycle.
 ///
 /// TODO: Also check whether the SU must start a new group.
 bool SchedBoundary::checkHazard(SUnit *SU) {
   if (HazardRec->isEnabled()
       && HazardRec->getHazardType(SU) != ScheduleHazardRecognizer::NoHazard) {
     return true;
   }

+  if ((CurrMOps > 0) && SchedModel->isSingleIssue(SU->getInstr())) {
+    DEBUG(dbgs() << " SU(" << SU->NodeNum
+                 << ") not issued (single issue instruction)\n");
+    return true;
+  }
+
   unsigned uops = SchedModel->getNumMicroOps(SU->getInstr());
   if ((CurrMOps > 0) && (CurrMOps + uops > SchedModel->getIssueWidth())) {
     DEBUG(dbgs() << " SU(" << SU->NodeNum << ") uops="
                  << SchedModel->getNumMicroOps(SU->getInstr()) << '\n');
     return true;
   }

   if (SchedModel->hasInstrSchedModel() && SU->hasReservedResource) {
     const MCSchedClassDesc *SC = DAG->getSchedClass(SU);
     for (TargetSchedModel::ProcResIter
            PI = SchedModel->getWriteProcResBegin(SC),
            PE = SchedModel->getWriteProcResEnd(SC); PI != PE; ++PI) {
       unsigned NRCycle = getNextResourceCycle(PI->ProcResourceIdx, PI->Cycles);
       if (NRCycle > CurrCycle) {
 #ifndef NDEBUG
@@ ... 1,647 lines not shown ... @@

lib/CodeGen/TargetSchedule.cpp

@@ ... 67 lines not shown ... @@ void TargetSchedModel::init(const MCSchedModel &sm,
   }
   MicroOpFactor = ResourceLCM / SchedModel.IssueWidth;
   for (unsigned Idx = 0; Idx < NumRes; ++Idx) {
     unsigned NumUnits = SchedModel.getProcResource(Idx)->NumUnits;
     ResourceFactors[Idx] = NumUnits ? (ResourceLCM / NumUnits) : 0;
   }
 }

+/// Returns true only if instruction is specified as single issue.
+bool TargetSchedModel::isSingleIssue(const MachineInstr *MI,
+                                     const MCSchedClassDesc *SC) const {
+  if (hasInstrSchedModel()) {
+    if (!SC)
+      SC = resolveSchedClass(MI);
+    if (SC->isValid())
+      return SC->SingleIssue;
+  }
+  return false;
+}
+
 unsigned TargetSchedModel::getNumMicroOps(const MachineInstr *MI,
                                           const MCSchedClassDesc *SC) const {
   if (hasInstrItineraries()) {
     int UOps = InstrItins.getNumMicroOps(MI->getDesc().getSchedClass());
     return (UOps >= 0) ? UOps : TII->getNumMicroOps(&InstrItins, *MI);
   }
   if (hasInstrSchedModel()) {
     if (!SC)
@@ ... 217 lines not shown ... @@

lib/Target/ARM/ARMScheduleR52.td

@@ ... 139 lines not shown ... @@


 // Cortex-R52 specific SchedWrites for use with InstRW
 def R52WriteMAC : SchedWriteRes<[R52UnitMAC]> { let Latency = 4; }
 def R52WriteMACHi : SchedWriteRes<[R52UnitMAC]> {
   let Latency = 4; let NumMicroOps = 0;
 }
 def R52WriteDIV : SchedWriteRes<[R52UnitDiv]> {
   let Latency = 8; let ResourceCycles = [8]; // not pipelined
 }
 def R52WriteLd : SchedWriteRes<[R52UnitLd]> { let Latency = 4; }
 def R52WriteST : SchedWriteRes<[R52UnitLd]> { let Latency = 4; }
 def R52WriteAdr : SchedWriteRes<[]> { let Latency = 0; }
 def R52WriteCC : SchedWriteRes<[]> { let Latency = 0; }
 def R52WriteALU_EX1 : SchedWriteRes<[R52UnitALU]> { let Latency = 2; }
 def R52WriteALU_EX2 : SchedWriteRes<[R52UnitALU]> { let Latency = 3; }
 def R52WriteALU_WRI : SchedWriteRes<[R52UnitALU]> { let Latency = 4; }
@@ ... 99 lines not shown ... @@ def : InstRW< [R52WriteALU_WRI, R52Read_ISS, R52Read_ISS, R52Read_ISS],
       (instregex "USAD8", "t2USAD8", "tUSAD8","USADA8", "t2USADA8", "tUSADA8") >;

 // Integer Multiply
 def : InstRW<[R52WriteMAC, R52Read_ISS, R52Read_ISS],
       (instregex "MULS", "MUL", "SMMUL", "SMMULR", "SMULBB", "SMULBT",
       "SMULTB", "SMULTT", "SMULWB", "SMULWT", "SMUSD", "SMUSDXi", "t2MUL",
       "t2SMMUL", "t2SMMULR", "t2SMULBB", "t2SMULBT", "t2SMULTB", "t2SMULTT",
       "t2SMULWB", "t2SMULWT", "t2SMUSD")>;

 // Multiply Accumulate
 // Even for 64-bit accumulation (or Long), the single MAC is used (not ALUs).
 // The store pipeline is used partly for 64-bit operations.
 def : InstRW<[R52WriteMAC, R52Read_ISS, R52Read_ISS, R52Read_ISS],
       (instregex "MLAS", "MLA", "MLS", "SMMLA", "SMMLAR", "SMMLS", "SMMLSR",
       "t2MLA", "t2MLS", "t2MLAS", "t2SMMLA", "t2SMMLAR", "t2SMMLS", "t2SMMLSR",
       "SMUAD", "SMUADX", "t2SMUAD", "t2SMUADX",
       "SMLABB", "SMLABT", "SMLATB", "SMLATT", "SMLSD", "SMLSDX",
@@ ... 449 lines not shown ... @@

 // Vector Load/Stores. Can issue only in slot-0. Can dual-issue with
 // another instruction in slot-1, but only in the last issue.
 def R52WriteVLD1Mem : SchedWriteRes<[R52UnitLd]> { let Latency = 5;}
 def R52WriteVLD2Mem : SchedWriteRes<[R52UnitLd]> {
   let Latency = 6;
   let NumMicroOps = 3;
   let ResourceCycles = [2];
+  let SingleIssue = 1;
 }
 def R52WriteVLD3Mem : SchedWriteRes<[R52UnitLd]> {
   let Latency = 7;
   let NumMicroOps = 5;
   let ResourceCycles = [3];
+  let SingleIssue = 1;
 }
 def R52WriteVLD4Mem : SchedWriteRes<[R52UnitLd]> {
   let Latency = 8;
   let NumMicroOps = 7;
   let ResourceCycles = [4];
+  let SingleIssue = 1;
 }
 def R52WriteVST1Mem : SchedWriteRes<[R52UnitLd]> {
   let Latency = 5;
   let NumMicroOps = 1;
   let ResourceCycles = [1];
 }
 def R52WriteVST2Mem : SchedWriteRes<[R52UnitLd]> {
   let Latency = 6;
@@ ... 275 lines not shown ... @@

test/CodeGen/ARM/single-issue-r52.mir

This file was added.

# RUN: llc -o /dev/null %s -mtriple=arm-eabi -mcpu=cortex-r52 -run-pass machine-scheduler -enable-misched -debug-only=misched 2>&1 | FileCheck %s --check-prefix=CHECK
# REQUIRES: asserts
--- |
  ; ModuleID = 'foo.ll'
  source_filename = "foo.ll"
  target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
  target triple = "arm---eabi"

  %struct.__neon_int8x8x4_t = type { <8 x i8>, <8 x i8>, <8 x i8>, <8 x i8> }
  ; Function Attrs: nounwind
  define <8 x i8> @foo(i8* %A) {
    %tmp1 = call %struct.__neon_int8x8x4_t @llvm.arm.neon.vld4.v8i8.p0i8(i8* %A, i32 8)
    %tmp2 = extractvalue %struct.__neon_int8x8x4_t %tmp1, 0
    %tmp3 = extractvalue %struct.__neon_int8x8x4_t %tmp1, 1
    %tmp4 = add <8 x i8> %tmp2, %tmp3
    ret <8 x i8> %tmp4
  }
  declare %struct.__neon_int8x8x4_t @llvm.arm.neon.vld4.v8i8.p0i8(i8*, i32)

# CHECK: ******** MI Scheduling ********
# CHECK: ScheduleDAGMILive::schedule starting
# CHECK: SU(1): %vreg1<def> = VLD4d8Pseudo %vreg0, 8, pred:14, pred:%noreg; mem:LD32[%A](align=8) QQPR:%vreg1 GPR:%vreg0
# CHECK: Latency : 8
# CHECK: Single Issue : true;
# CHECK: SU(2): %vreg4<def> = VADDv8i8 %vreg1:dsub_0, %vreg1:dsub_1, pred:14, pred:%noreg; DPR:%vreg4 QQPR:%vreg1
# CHECK: Latency : 5
# CHECK: Single Issue : false;
# CHECK: SU(3): %vreg5<def>, %vreg6<def> = VMOVRRD %vreg4, pred:14, pred:%noreg; GPR:%vreg5,%vreg6 DPR:%vreg4
# CHECK: Latency : 4
# CHECK: Single Issue : false;

...
---
name: foo
alignment: 2
exposesReturnsTwice: false
legalized: false
regBankSelected: false
selected: false
tracksRegLiveness: true
registers:
  - { id: 0, class: gpr }
  - { id: 1, class: qqpr }
  - { id: 2, class: dpr }
  - { id: 3, class: dpr }
  - { id: 4, class: dpr }
  - { id: 5, class: gpr }
  - { id: 6, class: gpr }
liveins:
  - { reg: '%r0', virtual-reg: '%0' }
frameInfo:
  isFrameAddressTaken: false
  isReturnAddressTaken: false
  hasStackMap: false
  hasPatchPoint: false
  stackSize: 0
  offsetAdjustment: 0
  maxAlignment: 0
  adjustsStack: false
  hasCalls: false
  maxCallFrameSize: 0
  hasOpaqueSPAdjustment: false
  hasVAStart: false
  hasMustTailInVarArgFunc: false
body: |
  bb.0 (%ir-block.0):
    liveins: %r0

    %0 = COPY %r0
    %1 = VLD4d8Pseudo %0, 8, 14, _ :: (load 32 from %ir.A, align 8)
    %4 = VADDv8i8 %1.dsub_0, %1.dsub_1, 14, _
    %5, %6 = VMOVRRD %4, 14, _
    %r0 = COPY %5
    %r1 = COPY %6
    BX_RET 14, _, implicit %r0, implicit killed %r1

...

utils/TableGen/SubtargetEmitter.cpp

@@ ... 806 lines not shown ... @@ void SubtargetEmitter::GenSchedClassTables(const CodeGenProcModel &ProcModel,
   std::vector<MCSchedClassDesc> &SCTab = SchedTables.ProcSchedClasses.back();
   for (const CodeGenSchedClass &SC : SchedModels.schedClasses()) {
     DEBUG(SC.dump(&SchedModels));

     SCTab.resize(SCTab.size() + 1);
     MCSchedClassDesc &SCDesc = SCTab.back();
     // SCDesc.Name is guarded by NDEBUG
     SCDesc.NumMicroOps = 0;
+    SCDesc.SingleIssue = false;
     SCDesc.BeginGroup = false;
     SCDesc.EndGroup = false;
     SCDesc.WriteProcResIdx = 0;
     SCDesc.WriteLatencyIdx = 0;
     SCDesc.ReadAdvanceIdx = 0;

     // A Variant SchedClass has no resources of its own.
     bool HasVariants = false;
@@ ... 87 lines not shown ... @@ for (unsigned W : Writes) {

         // Mark the parent class as invalid for unsupported write types.
         if (WriteRes->getValueAsBit("Unsupported")) {
           SCDesc.NumMicroOps = MCSchedClassDesc::InvalidNumMicroOps;
           break;
         }
         WLEntry.Cycles += WriteRes->getValueAsInt("Latency");
         SCDesc.NumMicroOps += WriteRes->getValueAsInt("NumMicroOps");
+        SCDesc.SingleIssue = SCDesc.SingleIssue || WriteRes->getValueAsBit("SingleIssue");
         SCDesc.BeginGroup |= WriteRes->getValueAsBit("BeginGroup");
         SCDesc.EndGroup |= WriteRes->getValueAsBit("EndGroup");

         // Create an entry for each ProcResource listed in WriteRes.
         RecVec PRVec = WriteRes->getValueAsListOfDefs("ProcResources");
         std::vector<int64_t> Cycles =
           WriteRes->getValueAsListOfInts("ResourceCycles");

@@ ... 174 lines not shown ... @@ void SubtargetEmitter::EmitSchedClassTables(SchedClassTables &SchedTables,
   for (CodeGenSchedModels::ProcIter PI = SchedModels.procModelBegin(),
          PE = SchedModels.procModelEnd(); PI != PE; ++PI) {
     if (!PI->hasInstrSchedModel())
       continue;

     std::vector<MCSchedClassDesc> &SCTab =
       SchedTables.ProcSchedClasses[1 + (PI - SchedModels.procModelBegin())];

-    OS << "\n// {Name, NumMicroOps, BeginGroup, EndGroup,"
+    OS << "\n// {Name, NumMicroOps, SingleIssue, BeginGroup, EndGroup,"
        << " WriteProcResIdx,#, WriteLatencyIdx,#, ReadAdvanceIdx,#}\n";
     OS << "static const llvm::MCSchedClassDesc "
        << PI->ModelName << "SchedClasses[] = {\n";

     // The first class is always invalid. We no way to distinguish it except by
     // name and position.
     assert(SchedModels.getSchedClass(0).Name == "NoInstrModel"
            && "invalid class not first");
     OS << " {DBGFIELD(\"InvalidSchedClass\") "
        << MCSchedClassDesc::InvalidNumMicroOps
-       << ", false, false, 0, 0, 0, 0, 0, 0},\n";
+       << ", false, false, false, 0, 0, 0, 0, 0, 0},\n";

     for (unsigned SCIdx = 1, SCEnd = SCTab.size(); SCIdx != SCEnd; ++SCIdx) {
       MCSchedClassDesc &MCDesc = SCTab[SCIdx];
       const CodeGenSchedClass &SchedClass = SchedModels.getSchedClass(SCIdx);
       OS << " {DBGFIELD(\"" << SchedClass.Name << "\") ";
       if (SchedClass.Name.size() < 18)
         OS.indent(18 - SchedClass.Name.size());
       OS << MCDesc.NumMicroOps
+         << ", " << (MCDesc.SingleIssue ? "true" : "false")
          << ", " << ( MCDesc.BeginGroup ? "true" : "false" )
          << ", " << ( MCDesc.EndGroup ? "true" : "false" )
          << ", " << format("%2d", MCDesc.WriteProcResIdx)
          << ", " << MCDesc.NumWriteProcResEntries
          << ", " << format("%2d", MCDesc.WriteLatencyIdx)
          << ", " << MCDesc.NumWriteLatencyEntries
          << ", " << format("%2d", MCDesc.ReadAdvanceIdx)
          << ", " << MCDesc.NumReadAdvanceEntries << "}";
@@ ... 396 lines not shown ... @@

This is an archive of the discontinued LLVM Phabricator instance.

Improve machine schedulers for in-order processors
Closed, Public
