This is an archive of the discontinued LLVM Phabricator instance.

Differential D54648

[TableGen] Emit more variant transitions
ClosedPublic

Authored by evandro on Nov 16 2018, 1:34 PM.

Details

Reviewers

andreadb

Commits

rG079bf4b7b4fc: [TableGen] Emit more variant transitions
rL347504: [TableGen] Emit more variant transitions

Summary

llvm-mca relies on the predicates to be based on MCSchedPredicate in order to resolve the scheduling for variant instructions. Otherwise, it aborts the building of the instruction model early.

However, the scheduling model emitter in TableGen gives up too soon, unless all processors use only such predicates.

In order to allow more processors to be used with llvm-mca, this patch emits scheduling transitions if any processor uses these predicates. The transition emitted for the processors using legacy predicates is the one specified with NoSchedPred, which is based on MCSchedPredicate.
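The core of the change is a switch from "every transition must qualify" to "at least one transition must qualify". A minimal sketch of that difference follows, assuming a hypothetical `Transition` struct that stands in for `CodeGenSchedTransition` and records only whether its predicates are based on MCSchedPredicate; the function names are illustrative and are not part of TableGen.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for CodeGenSchedTransition: it only records whether
// the transition's predicates are based on MCSchedPredicate.
struct Transition {
  bool UsesMCSchedPredicate;
};

// Rule before the patch: keep the variant class only if every transition
// is based on MCSchedPredicate.
bool keptByOldRule(const std::vector<Transition> &Ts) {
  return std::all_of(Ts.begin(), Ts.end(), [](const Transition &T) {
    return T.UsesMCSchedPredicate;
  });
}

// Rule after the patch: keep the variant class if at least one transition
// is based on MCSchedPredicate.
bool keptByNewRule(const std::vector<Transition> &Ts) {
  return std::any_of(Ts.begin(), Ts.end(), [](const Transition &T) {
    return T.UsesMCSchedPredicate;
  });
}
```

With a mixed list (one MCSchedPredicate-based transition and one legacy transition), the old rule drops the class while the new rule keeps it. A list made only of legacy transitions is dropped under both rules.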

Diff Detail

Repository: rL LLVM

Event Timeline

evandro created this revision. Nov 16 2018, 1:34 PM

Herald added a subscriber: llvm-commits. Nov 16 2018, 1:34 PM

nhaehnle added a subscriber: nhaehnle. Nov 19 2018, 3:47 AM

Exclude test that does not apply anymore.

Herald added subscribers: gbedwell, javed.absar. Nov 19 2018, 2:40 PM

evandro added a subscriber: mattd. Nov 19 2018, 2:41 PM

Hi Evandro,

Sorry for the very late reply.
I am back from holidays. I plan to review this patch and your other patch (D54777).

More generally: I am very happy to see that there is interest in llvm-mca from people working on AArch64. So, I will do my best to help you with reviews and by suggesting alternative approaches.

llvm-mca relies on the predicates to be based on MCSchedPredicate in order to resolve the scheduling for variant instructions. Otherwise, it aborts the building of the instruction model early.
However, the scheduling model emitter in TableGen gives up too soon, unless all processors use only such predicates.

This is done intentionally.

Transitions are expanded into an if-then-else sequence, and scheduling predicates are predicate expressions used to check if a branch (of the if-then-else construct) is taken or not.
Predicates have to be processed strictly in sequence, and the order matters.
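To illustrate why the order matters, here is a minimal sketch of an ordered predicate chain. The `Inst` and `Branch` types, the `resolve` function, and the class numbers are hypothetical simplifications for this example, not the code TableGen emits.

```cpp
#include <functional>
#include <vector>

// Hypothetical, simplified instruction description.
struct Inst {
  bool HasShift;
};

// One branch of the if-then-else chain: a predicate and the scheduling
// class selected when the predicate holds.
struct Branch {
  std::function<bool(const Inst &)> Pred;
  unsigned SchedClass;
};

// Walk the branches strictly in order; the first predicate that holds wins.
// Returns 0 when no predicate holds, meaning no class could be resolved.
unsigned resolve(const std::vector<Branch> &Chain, const Inst &I) {
  for (const Branch &B : Chain)
    if (B.Pred(I))
      return B.SchedClass;
  return 0;
}
```

If a catch-all predicate (one that is always true, as NoSchedPred is) comes last, a shifted instruction resolves to the class of the more specific branch. If the same catch-all is moved to the front, it shadows every branch after it and the shifted instruction resolves to the catch-all's class instead.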

In order to allow more processors to be used with llvm-mca, this patch emits scheduling transitions if any processor uses these predicates. The transition emitted for the processors using legacy predicates is the one specified with NoSchedPred, which is based on MCSchedPredicate.

I don't think this is the right approach. As I mentioned before, predicates must be processed in sequence.

We cannot just emit "some" transitions and pretend that everything works fine; the resolved scheduling class would be potentially incorrect. We cannot assume that NoSchedPred is a good guess (i.e. default) for most scheduling write variants... NoSchedPred is just the last predicate of the sequence; we cannot make assumptions on how often it is used to resolve write variants. Using it as a default might be okay for your target processor in some cases. But it would lead to inaccurate results on other target processors.

In all honesty, if you really care about supporting llvm-mca for AArch64, then the right approach is to port existing scheduling predicates to the new MCSchedPredicate framework.
This patch sounds more like a hack to get llvm-mca working for some AArch64 instructions. I personally don't like it.

On the other hand, your D54777 seems to go in the right direction. I plan to review it next.

-Andrea

The issue that I'm trying to avoid is that it's not enough for me to add predicates based on MCSchedPredicate for Exynos processors if other processors don't. Then, if an instruction that I model by using a variant schedule is also modeled by another processor, TableGen will emit no solution at all for the instruction. This patch, which I recognize is just an attempt, aims at allowing the proper solution for a processor using such predicates, while indeed resulting in a clumsy solution for the scheduling of the same instruction for other processors.

The issue is that it's virtually impossible at the moment to run llvm-mca on AArch64 without it giving up right away. I was thinking that, instead of giving up, llvm-mca should resort to a reasonable default and highlight it in its result. I proposed NoSchedPred as this default and, though we can discuss what the default should be, I think that having no default does not make sense as is.
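The per-processor effect described here can be sketched as a filter over the transitions of one shared variant class. This is a simplified illustration: `ProcTransition`, `emittedFor`, and the transition labels used with them are hypothetical names chosen for this example, not TableGen data structures.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a variant transition owned by one processor model.
struct ProcTransition {
  unsigned ProcIndex;        // processor model that defines the transition
  bool UsesMCSchedPredicate; // false for legacy SchedPredicate code
  std::string Name;          // label used only for this illustration
};

// Transitions that would be emitted for processor PI in the resolver used by
// llvm-mca: those owned by PI whose predicates are based on MCSchedPredicate.
std::vector<std::string> emittedFor(const std::vector<ProcTransition> &Ts,
                                    unsigned PI) {
  std::vector<std::string> Out;
  for (const ProcTransition &T : Ts) {
    if (T.ProcIndex != PI)
      continue; // owned by another processor model
    if (!T.UsesMCSchedPredicate)
      continue; // legacy predicate: cannot be expanded for llvm-mca
    Out.push_back(T.Name);
  }
  return Out;
}
```

For a processor whose transitions are all based on MCSchedPredicate, the full chain survives the filter. For a processor that mixes a legacy predicate with a NoSchedPred fallback, only the fallback survives, which is the "clumsy solution" mentioned above.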

ormris added a subscriber: ormris. Nov 21 2018, 11:38 AM

In D54648#1305567, @evandro wrote:

The issue that I'm trying to avoid is that it's not enough for me to add predicates based on MCSchedPredicate for Exynos processors if other processors don't. Then, if an instruction that I model by using a variant schedule is also modeled by another processor, TableGen will emit no solution at all for the instruction. This patch, which I recognize is just an attempt, aims at allowing the proper solution for a processor using such predicates, while indeed resulting in a clumsy solution for the scheduling of the same instruction for other processors.

The issue is that it's virtually impossible at the moment to run llvm-mca on AArch64 without it giving up right away. I was thinking that, instead of giving up, llvm-mca should resort to a reasonable default and highlight it in its result. I proposed NoSchedPred as this default and, though we can discuss what the default should be, I think that having no default does not make sense as is.

Right. I see what you mean.

I was under the impression that transitions would have been correctly expanded at least for your Exynos models.
However, it seems like the lack of MCSchedPredicates in other processor models is somehow causing problems. If that's the case, then I think your patch makes sense.
Although it is not ideal, at least it unblocks your work.

Could you add a brief comment (a FIXME) in the code, and in the XFAIL test, to explain what is happening (and why we are less strict with the check)?
We may want to revisit this in future. But for now, your patch is a good starting point.

Thanks!
-Andrea

This revision is now accepted and ready to land. Nov 21 2018, 12:28 PM

Closed by commit rL347504: [TableGen] Emit more variant transitions (authored by evandro). Nov 23 2018, 1:20 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: Size

llvm/trunk/include/llvm/Target/TargetSchedule.td: 4 lines
llvm/trunk/test/tools/llvm-mca/AArch64/CortexA57/shifted-register.s: 26 lines
llvm/trunk/test/tools/llvm-mca/AArch64/Cyclone/register-offset.s: 28 lines
llvm/trunk/test/tools/llvm-mca/ARM/unsupported-write-variant.s: 2 lines
llvm/trunk/utils/TableGen/SubtargetEmitter.cpp: 17 lines

Diff 175140

llvm/trunk/include/llvm/Target/TargetSchedule.td

 // if-statement's expression. Available variables are MI, SchedModel,
 // and anything defined in a PredicateProlog.
 //
 // SchedModel silences warnings but is ignored.
 class SchedPredicate<code pred> : SchedPredicateBase {
   SchedMachineModel SchedModel = ?;
   code Predicate = pred;
 }
 
+// Define a predicate to be typically used as the default case in a
+// SchedVariant. If the SchedVariant does not use any other predicate based on
+// MCSchedPredicate, this is the default scheduling case used by llvm-mca.
 def NoSchedPred : MCSchedPredicate<TruePred>;
 
 // Associate a predicate with a list of SchedReadWrites. By default,
 // the selected SchedReadWrites are still associated with a single
 // operand and assumed to execute sequentially with additive
 // latency. However, if the parent SchedWriteVariant or
 // SchedReadVariant is marked "Variadic", then each Selected
 // SchedReadWrite is mapped in place to the instruction's variadic

llvm/trunk/test/tools/llvm-mca/AArch64/CortexA57/shifted-register.s

-# RUN: not llvm-mca -march=aarch64 -mcpu=cortex-a57 -resource-pressure=false < %s 2> %t
-# RUN: FileCheck --input-file %t %s
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -march=aarch64 -mcpu=cortex-a57 -resource-pressure=false < %s | FileCheck %s
 
 add x0, x1, x2, lsl #3
 
-# CHECK: error
-# CHECK-SAME: unable to resolve scheduling class for write variant.
+# CHECK: Iterations: 100
+# CHECK-NEXT: Instructions: 100
+# CHECK-NEXT: Total Cycles: 53
+# CHECK-NEXT: Total uOps: 100
+
+# CHECK: Dispatch Width: 3
+# CHECK-NEXT: uOps Per Cycle: 1.89
+# CHECK-NEXT: IPC: 1.89
+# CHECK-NEXT: Block RThroughput: 0.5
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 1 1 0.50 add x0, x1, x2, lsl #3

llvm/trunk/test/tools/llvm-mca/AArch64/Cyclone/register-offset.s

-# RUN: not llvm-mca -march=aarch64 -mcpu=cyclone -resource-pressure=false < %s 2> %t
-# RUN: FileCheck --input-file %t %s
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -march=aarch64 -mcpu=cyclone -resource-pressure=false < %s | FileCheck %s
 
 ldr x7, [x1, #8]
 ldr x6, [x1, x2]
 ldr x4, [x1, x2, sxtx]
 
-# CHECK: error
-# CHECK-SAME: unable to resolve scheduling class for write variant.
+# CHECK: Iterations: 100
+# CHECK-NEXT: Instructions: 300
+# CHECK-NEXT: Total Cycles: 156
+# CHECK-NEXT: Total uOps: 300
+
+# CHECK: Dispatch Width: 6
+# CHECK-NEXT: uOps Per Cycle: 1.92
+# CHECK-NEXT: IPC: 1.92
+# CHECK-NEXT: Block RThroughput: 1.5
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 1 4 0.50 * ldr x7, [x1, #8]
+# CHECK-NEXT: 1 4 0.50 * ldr x6, [x1, x2]
+# CHECK-NEXT: 1 4 0.50 * ldr x4, [x1, x2, sxtx]

llvm/trunk/test/tools/llvm-mca/ARM/unsupported-write-variant.s

 # RUN: not llvm-mca -march=arm -mcpu=swift -all-views=false 2>&1 < %s | FileCheck %s
+# D54648 results in this test to become valid.
+# XFAIL: *
 
 add r3, r1, r12, lsl #2
 
 # CHECK: error: unable to resolve scheduling class for write variant.
 # CHECK-NEXT: note: instruction: add r3, r1, r12, lsl #2

llvm/trunk/utils/TableGen/SubtargetEmitter.cpp

@@ void collectVariantClasses(const CodeGenSchedModels &SchedModels,
                            IdxVec &VariantClasses,
                            bool OnlyExpandMCInstPredicates) {
   for (const CodeGenSchedClass &SC : SchedModels.schedClasses()) {
     // Ignore non-variant scheduling classes.
     if (SC.Transitions.empty())
       continue;
 
     if (OnlyExpandMCInstPredicates) {
-      // Ignore this variant scheduling class if transitions don't uses any
+      // Ignore this variant scheduling class if no transitions use any meaningful
       // MCSchedPredicate definitions.
-      if (!all_of(SC.Transitions, [](const CodeGenSchedTransition &T) {
+      if (!any_of(SC.Transitions, [](const CodeGenSchedTransition &T) {
             return hasMCSchedPredicates(T);
           }))
         continue;
     }
 
     VariantClasses.push_back(SC.Index);
   }
 }
@@ for (unsigned VC : VariantClasses) {
 
     OS << " case " << VC << ": // " << SC.Name << '\n';
 
     PredicateExpander PE(Target);
     PE.setByRef(false);
     PE.setExpandForMC(OnlyExpandMCInstPredicates);
     for (unsigned PI : ProcIndices) {
       OS << " ";
 
       // Emit a guard on the processor ID.
       if (PI != 0) {
         OS << (OnlyExpandMCInstPredicates
                    ? "if (CPUID == "
                    : "if (SchedModel->getProcessorID() == ");
         OS << PI << ") ";
         OS << "{ // " << (SchedModels.procModelBegin() + PI)->ModelName << '\n';
       }
 
       // Now emit transitions associated with processor PI.
       for (const CodeGenSchedTransition &T : SC.Transitions) {
         if (PI != 0 && !count(T.ProcIndices, PI))
           continue;
 
+        // Emit only transitions based on MCSchedPredicate, if it's the case.
+        // At least the transition specified by NoSchedPred is emitted,
+        // which becomes the default transition for those variants otherwise
+        // not based on MCSchedPredicate.
+        // FIXME: preferably, llvm-mca should instead assume a reasonable
+        // default when a variant transition is not based on MCSchedPredicate
+        // for a given processor.
+        if (OnlyExpandMCInstPredicates && !hasMCSchedPredicates(T))
+          continue;
+
         PE.setIndentLevel(3);
         emitPredicates(T, SchedModels.getSchedClass(T.ToClassIdx), PE, OS);
       }
 
       OS << " }\n";
 
       if (PI == 0)
         break;
     }
 
     if (SC.isInferred())
       OS << " return " << SC.Index << ";\n";
     OS << " break;\n";
   }