This is an archive of the discontinued LLVM Phabricator instance.

[MachineScheduler] Don't enforce some hazard checks pre-RA.
Needs Review, Public

Authored by jonpa on May 15 2018, 4:28 AM.


Details

Reviewers

atrick
javed.absar

Summary

Currently, SchedBoundary::checkHazard() checks whether an instruction must begin or end an issue group, and whether its micro-ops fit in the current group. Based on this, it may put the SU into Pending instead of Available.
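
The issue-group part of that check can be pictured with the minimal sketch below. It is a simplified, hypothetical model, not the LLVM implementation: GroupState, InstrInfo and hasIssueGroupHazard() are illustrative names, and plain fields stand in for the SchedModel queries (getNumMicroOps(), getIssueWidth(), mustBeginGroup(), mustEndGroup()).

```cpp
#include <cassert>

// Hypothetical, simplified state of one scheduling boundary.
struct GroupState {
  unsigned CurrMOps;   // micro-ops already issued in the current group
  unsigned IssueWidth; // micro-ops the target can issue per cycle
  bool IsTop;          // true for the top-down boundary, false for bottom-up
};

// Hypothetical per-instruction scheduling info.
struct InstrInfo {
  unsigned NumMicroOps;
  bool MustBeginGroup;
  bool MustEndGroup;
};

// Returns true if the instruction should wait in Pending because of the
// current issue group, mirroring the two group checks in checkHazard().
bool hasIssueGroupHazard(const GroupState &S, const InstrInfo &I) {
  assert(S.IssueWidth > 0 && "issue width must be positive");
  // The instruction's micro-ops would overflow a partially filled group.
  if (S.CurrMOps > 0 && S.CurrMOps + I.NumMicroOps > S.IssueWidth)
    return true;
  // Top-down, an instruction that must begin a group cannot join a partially
  // filled one; bottom-up, the same holds for one that must end a group.
  if (S.CurrMOps > 0 &&
      ((S.IsTop && I.MustBeginGroup) || (!S.IsTop && I.MustEndGroup)))
    return true;
  return false;
}
```

An empty group (CurrMOps == 0) accepts any instruction, which is why neither check fires at the start of a cycle.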

These are exact checks, but since so much can happen to the code between pre-RA scheduling and final output, it seems far-fetched to give them priority pre-RA. It seems better to let such an instruction into Available, pick it if it helps (e.g. with register pressure), and trust post-RA scheduling to fix the grouping, which it most likely has to do anyway.
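
A rough sketch of the intended effect, assuming a top-down boundary and a hypothetical releaseReady() helper (not an LLVM function): with group checks enforced, an instruction that must begin a group, or that would overflow the current group, goes to Pending; with them disabled, as proposed pre-RA, it stays in Available, where the strategy can weigh it against other criteria such as register pressure.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical ready instruction, seen from a top-down boundary.
struct ReadyInstr {
  std::string Name;
  unsigned NumMicroOps;
  bool MustBeginGroup;
};

struct Queues {
  std::vector<std::string> Available;
  std::vector<std::string> Pending;
};

// Partition ready instructions into Available and Pending. EnforceGroups
// models whether the issue-group checks are applied at this point.
Queues releaseReady(const std::vector<ReadyInstr> &Ready, unsigned CurrMOps,
                    unsigned IssueWidth, bool EnforceGroups) {
  assert(IssueWidth > 0 && "issue width must be positive");
  Queues Q;
  for (const ReadyInstr &I : Ready) {
    bool Hazard = EnforceGroups && CurrMOps > 0 &&
                  (CurrMOps + I.NumMicroOps > IssueWidth || I.MustBeginGroup);
    (Hazard ? Q.Pending : Q.Available).push_back(I.Name);
  }
  return Q;
}
```

With one micro-op already in a group of width 3, an instruction that must begin a group and a 3-micro-op instruction are held back only when EnforceGroups is true.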

This patch introduces a new member, SchedBoundary::IsPostRA, and uses it to perform these checks only after register allocation.

I also removed a TODO comment that appears to have been addressed already.

Hopefully this will be committed soon, together with the SystemZ changes.

Diff Detail

Event Timeline

jonpa created this revision. May 15 2018, 4:28 AM

Herald added subscribers: JDevlieghere, javed.absar, MatzeB. May 15 2018, 4:28 AM

Forgot to mention: as of now, about 20 X86 tests fail, as well as a few others.

javed.absar added inline comments. May 15 2018, 5:08 AM

lib/CodeGen/MachineScheduler.cpp, line 2256:
Probably the comment here needs updating if we are going to change the behaviour. There is another problem: "Single Issue" relies on BeginGroup/EndGroup, and that is not only related to post-RA scheduling.

jonpa added inline comments. May 15 2018, 6:40 AM

lib/CodeGen/MachineScheduler.cpp, line 2256:
So, with our abstract machine as a model, what specifically does "Single Issue" imply? Is this for a particular target? To me this sounds like such an instruction should have a doubled value of NumMicroOps, which would then push CurrCycle further. Given the basic premise of this patch, I would have hoped this is good enough pre-RA. Why not? Would it help to also guard this change so that only targets that do post-RA scheduling are affected?
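
To illustrate the NumMicroOps suggestion, here is a simplified, hypothetical model of the cycle accounting at the end of SchedBoundary::bumpNode(). IssueCounter is an illustrative name; the real code calls bumpCycle() and also handles stalls, reserved resources and the group flags. The usage assumes an issue width of 2.

```cpp
#include <cassert>

// Hypothetical model: issuing an instruction adds its micro-ops to the
// current group, and every full group advances the cycle by one.
struct IssueCounter {
  unsigned IssueWidth;
  unsigned CurrCycle = 0;
  unsigned CurrMOps = 0;

  explicit IssueCounter(unsigned Width) : IssueWidth(Width) {
    assert(IssueWidth > 0 && "issue width must be positive");
  }

  void issue(unsigned NumMicroOps) {
    CurrMOps += NumMicroOps;
    // Like bumpCycle(++NextCycle): move to the next cycle and retire one
    // issue group's worth of micro-ops.
    while (CurrMOps >= IssueWidth) {
      ++CurrCycle;
      CurrMOps -= IssueWidth;
    }
  }
};
```

With one micro-op, a "single issue" instruction leaves room for another instruction in cycle 0; with its micro-op count doubled to fill the group, the next instruction starts in cycle 1.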

dmgreen added a subscriber: dmgreen. May 16 2018, 1:31 AM

ping.

Patch updated so that the checks are done post-RA, or pre-RA if the target does not do post-RA scheduling. Now only two tests fail.
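
The condition in the updated patch reduces to a small predicate. Below is a stand-alone sketch of it; the two parameters are assumed to stand in for DAG->MF.getRegInfo().getNumVirtRegs() and DAG->MF.getSubtarget().enablePostRAScheduler(). Like the patch, it treats "no virtual registers" as meaning register allocation has already run.

```cpp
#include <cassert>

// Hypothetical stand-alone version of checkIssueGroupConstraints().
bool checkIssueGroupConstraints(unsigned NumVirtRegs, bool EnablePostRASched) {
  assert((NumVirtRegs == 0 || NumVirtRegs > 0) && "any count is valid");
  // No virtual registers left is taken to mean post-RA.
  bool IsPostRA = NumVirtRegs == 0;
  // Enforce the group checks post-RA, or pre-RA when no post-RA scheduler
  // will get a chance to fix the grouping later.
  return IsPostRA || !EnablePostRASched;
}
```

Only the pre-RA case on a target that also schedules post-RA skips the checks.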

The ARM test seems to check that the cycle is bumped based on the BeginGroup/EndGroup flags, which no longer happens.
The SystemZ test now contains one more spill, but I hope this disappears once the other SystemZ scheduling patch is also applied.

On SystemZ, those instructions are quite rare, so it is typically possible to rearrange them constructively post-RA while ignoring them pre-RA. Is this not true on ARM (or other targets)?

Does this patch make sense now? If not, could we add a target flag to control this?

Herald added a reviewer: javed.absar. May 29 2018, 4:21 AM

ping!

It still does not make sense to me to have hard checks for decoding constraints pre-RA if the target also does post-RA scheduling. Is this generally true, or would this have to wait until SystemZ gets its own pre-RA scheduling strategy?

Revision Contents

Path                                       Size
include/llvm/CodeGen/MachineScheduler.h    9 lines
lib/CodeGen/MachineScheduler.cpp           45 lines
test/CodeGen/ARM/single-issue-r52.mir      4 lines
test/CodeGen/SystemZ/int-conv-11.ll        4 lines

Diff 148880

include/llvm/CodeGen/MachineScheduler.h

[78 unchanged lines not shown]
 #include "llvm/ADT/BitVector.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachinePassRegistry.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/RegisterPressure.h"
 #include "llvm/CodeGen/ScheduleDAG.h"
 #include "llvm/CodeGen/ScheduleDAGInstrs.h"
 #include "llvm/CodeGen/ScheduleDAGMutation.h"
 #include "llvm/CodeGen/TargetSchedule.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/ErrorHandling.h"
 #include <algorithm>
[581 unchanged lines not shown; hunk context: private:]
   SmallVector<unsigned, 16> ReservedCycles;
 
 #ifndef NDEBUG
   // Remember the greatest possible stall as an upper bound on the number of
   // times we should retry the pending queue because of a hazard.
   unsigned MaxObservedStall;
 #endif
 
+  // Return true if checks should be done for the current issue group,
+  // involving NumMicroOps, BeginGroup and EndGroup of the instruction.
+  // Only do this post-RA if target enables late scheduling.
+  bool checkIssueGroupConstraints() {
+    return ((DAG->MF.getRegInfo().getNumVirtRegs() == 0) ||
+            !DAG->MF.getSubtarget().enablePostRAScheduler());
+  }
+
 public:
   /// Pending queues extend the ready queues with the same ID and the
   /// PendingFlag set.
   SchedBoundary(unsigned ID, const Twine &Name):
     Available(ID, Name+".A"), Pending(ID << LogMaxQID, Name+".P") {
     reset();
   }

[386 unchanged lines not shown]

lib/CodeGen/MachineScheduler.cpp

[1,925 unchanged lines not shown]
 /// ScheduleHazardRecognizer API. It is a fully general hazard recognizer that
 /// supports highly complicated in-order reservation tables
 /// (ScoreboardHazardRecognizer) and arbitraty target-specific logic.
 ///
 /// The second is a streamlined mechanism that checks for hazards based on
 /// simple counters that the scheduler itself maintains. It explicitly checks
 /// for instruction dispatch limitations, including the number of micro-ops that
 /// can dispatch per cycle.
-///
-/// TODO: Also check whether the SU must start a new group.
 bool SchedBoundary::checkHazard(SUnit *SU) {
   if (HazardRec->isEnabled()
       && HazardRec->getHazardType(SU) != ScheduleHazardRecognizer::NoHazard) {
     return true;
   }
 
+  if (checkIssueGroupConstraints()) {
   unsigned uops = SchedModel->getNumMicroOps(SU->getInstr());
   if ((CurrMOps > 0) && (CurrMOps + uops > SchedModel->getIssueWidth())) {
     LLVM_DEBUG(dbgs() << " SU(" << SU->NodeNum << ") uops="
                       << SchedModel->getNumMicroOps(SU->getInstr()) << '\n');
     return true;
   }
 
   if (CurrMOps > 0 &&
       ((isTop() && SchedModel->mustBeginGroup(SU->getInstr())) ||
        (!isTop() && SchedModel->mustEndGroup(SU->getInstr())))) {
     LLVM_DEBUG(dbgs() << " hazard: SU(" << SU->NodeNum << ") must "
                       << (isTop() ? "begin" : "end") << " group\n");
     return true;
   }
+  }
 
   if (SchedModel->hasInstrSchedModel() && SU->hasReservedResource) {
     const MCSchedClassDesc *SC = DAG->getSchedClass(SU);
     for (const MCWriteProcResEntry &PE :
           make_range(SchedModel->getWriteProcResBegin(SC),
                      SchedModel->getWriteProcResEnd(SC))) {
       unsigned ResIdx = PE.ProcResourceIdx;
       unsigned Cycles = PE.Cycles;
[178 unchanged lines not shown; hunk context: if (!isTop() && SU->isCall) {]
       HazardRec->Reset();
     }
     HazardRec->EmitInstruction(SU);
   }
   // checkHazard should prevent scheduling multiple instructions per cycle that
   // exceed the issue width.
   const MCSchedClassDesc *SC = DAG->getSchedClass(SU);
   unsigned IncMOps = SchedModel->getNumMicroOps(SU->getInstr());
-  assert(
-      (CurrMOps == 0 || (CurrMOps + IncMOps) <= SchedModel->getIssueWidth()) &&
+  assert((!checkIssueGroupConstraints() || (CurrMOps == 0 ||
+          (CurrMOps + IncMOps) <= SchedModel->getIssueWidth())) &&
       "Cannot schedule this instruction's MicroOps in the current cycle.");
 
   unsigned ReadyCycle = (isTop() ? SU->TopReadyCycle : SU->BotReadyCycle);
   LLVM_DEBUG(dbgs() << " Ready @" << ReadyCycle << "c\n");
 
   unsigned NextCycle = CurrCycle;
   switch (SchedModel->getMicroOpBufferSize()) {
   case 0:
     assert(ReadyCycle <= CurrCycle && "Broken PendingQueue");
[87 unchanged lines not shown; hunk context: IsResourceLimited =]
                           getScheduledLatency());
 
   // Update CurrMOps after calling bumpCycle to handle stalls, since bumpCycle
   // resets CurrMOps. Loop to handle instructions with more MOps than issue in
   // one cycle. Since we commonly reach the max MOps here, opportunistically
   // bump the cycle to avoid uselessly checking everything in the readyQ.
   CurrMOps += IncMOps;
 
   // Bump the cycle count for issue group constraints.
   // This must be done after NextCycle has been adjust for all other stalls.
   // Calling bumpCycle(X) will reduce CurrMOps by one issue group and set
   // currCycle to X.
-  if ((isTop() && SchedModel->mustEndGroup(SU->getInstr())) ||
-      (!isTop() && SchedModel->mustBeginGroup(SU->getInstr()))) {
+  if (checkIssueGroupConstraints() &&
+      ((isTop() && SchedModel->mustEndGroup(SU->getInstr())) ||
+       (!isTop() && SchedModel->mustBeginGroup(SU->getInstr())))) {
     LLVM_DEBUG(dbgs() << " Bump cycle to " << (isTop() ? "end" : "begin")
                       << " group\n");
     bumpCycle(++NextCycle);
   }
 
   while (CurrMOps >= SchedModel->getIssueWidth()) {
     LLVM_DEBUG(dbgs() << " *** Max MOps " << CurrMOps << " at cycle "
                       << CurrCycle << '\n');
     bumpCycle(++NextCycle);
   }
   LLVM_DEBUG(dumpScheduledState());
[1,371 unchanged lines not shown]

test/CodeGen/ARM/single-issue-r52.mir

[25 unchanged lines not shown]
 # CHECK: SU(2): %4:dpr = VADDv8i8 %1.dsub_0:qqpr, %1.dsub_1:qqpr, 14, $noreg
 # CHECK: Latency : 5
 # CHECK: Single Issue : false;
 # CHECK: SU(3): %5:gpr, %6:gpr = VMOVRRD %4:dpr, 14, $noreg
 # CHECK: Latency : 4
 # CHECK: Single Issue : false;
 
 # TOPDOWN: Scheduling SU(1) %1:qqpr = VLD4d8Pseudo
-# TOPDOWN: Bump cycle to end group
+# TOPDOWN: *** Max MOps 7 at cycle 3
 # TOPDOWN: Scheduling SU(2) %4:dpr = VADDv8i8
 
 # BOTTOMUP: Scheduling SU(2) %4:dpr = VADDv8i8
 # BOTTOMUP: Scheduling SU(1) %1:qqpr = VLD4d8Pseudo
-# BOTTOMUP: Bump cycle to begin group
+# BOTTOMUP: *** Max MOps 7 at cycle 19
 
 ...
 ---
 name: foo
 alignment: 2
 exposesReturnsTwice: false
 legalized: false
 regBankSelected: false
[39 unchanged lines not shown]

test/CodeGen/SystemZ/int-conv-11.ll

 ; Test spills of zero extensions when high GR32s are available.
 ;
 ; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z196 | FileCheck %s
 
 ; Test a case where we spill the source of at least one LLCRMux. We want
 ; to use LLC(H) if possible.
 define void @f1(i32 *%ptr) {
 ; CHECK-LABEL: f1:
-; CHECK: llc{{h?}} {{%r[0-9]+}}, 16{{[37]}}(%r15)
+; CHECK: llc{{h?}} {{%r[0-9]+}}, 1{{[67]}}{{[379]}}(%r15)
 ; CHECK: br %r14
   %val0 = load volatile i32 , i32 *%ptr
   %val1 = load volatile i32 , i32 *%ptr
   %val2 = load volatile i32 , i32 *%ptr
   %val3 = load volatile i32 , i32 *%ptr
   %val4 = load volatile i32 , i32 *%ptr
   %val5 = load volatile i32 , i32 *%ptr
   %val6 = load volatile i32 , i32 *%ptr
[156 unchanged lines not shown; hunk context: ; CHECK: br %r14]
   store volatile i32 %ext31, i32 *%ptr
 
   ret void
 }
 
 ; Same again with i16, which should use LLH(H).
 define void @f2(i32 *%ptr) {
 ; CHECK-LABEL: f2:
-; CHECK: llh{{h?}} {{%r[0-9]+}}, 16{{[26]}}(%r15)
+; CHECK: llh{{h?}} {{%r[0-9]+}}, 1{{[67]}}{{[268]}}(%r15)
 ; CHECK: br %r14
   %val0 = load volatile i32 , i32 *%ptr
   %val1 = load volatile i32 , i32 *%ptr
   %val2 = load volatile i32 , i32 *%ptr
   %val3 = load volatile i32 , i32 *%ptr
   %val4 = load volatile i32 , i32 *%ptr
   %val5 = load volatile i32 , i32 *%ptr
   %val6 = load volatile i32 , i32 *%ptr
[160 unchanged lines not shown]
