This is an archive of the discontinued LLVM Phabricator instance.

Differential D54957

[llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666).
Closed, Public

Authored by andreadb on Nov 27 2018, 9:51 AM.

Details

Reviewers

RKSimon
spatel
evandro
courbet
gchatelet
mattd
lebedev.ri

Commits

rG373a4ccf6cdd: [llvm-mca][MC] Add the ability to declare which processor resources model…
rL347857: [llvm-mca][MC] Add the ability to declare which processor resources model…

Summary

This patch adds the ability to specify via tablegen which processor resources are load/store queue resources.

A new tablegen class named MemoryQueue can be optionally used to mark resources that model load/store queues.
Information about the load/store queue is collected at CodeGenSchedule stage, and analyzed by the SubtargetEmitter to initialize two new fields in struct MCExtraProcessorInfo named LoadQueueID and StoreQueueID.
Those two fields are identifiers for buffered resources used to describe the load queue and the store queue.
Field BufferSize is interpreted as the number of entries in the queue, while the number of units is a throughput indicator (i.e. number of available pickers for loads/stores).

At construction time, LSUnit in llvm-mca checks for the presence of extra processor information (i.e. MCExtraProcessorInfo) in the scheduling model.
If that information is available, and fields LoadQueueID and StoreQueueID are set to a value different than zero (i.e. the invalid processor resource index), then LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value declared by the two processor resources.
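
The initialization described above can be sketched in a few lines of C++. This is a minimal, self-contained model and not the actual LSUnit code: `ProcResourceDesc`, `ExtraProcessorInfo`, `SchedModel`, `QueueSizes`, `queueSizeOr` and `computeQueueSizes` are hypothetical stand-ins for the MC structures, and falling back to caller-provided defaults when no queue is declared (or when the declared size is not positive) is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-ins for the MC scheduling-model structures.
struct ProcResourceDesc {
  const char *Name;
  unsigned NumUnits; // Throughput indicator (number of pickers).
  int BufferSize;    // Number of entries in the queue.
};

struct ExtraProcessorInfo {
  unsigned LoadQueueID;  // Zero is the invalid resource index: no load queue.
  unsigned StoreQueueID; // Zero is the invalid resource index: no store queue.
};

struct SchedModel {
  const ProcResourceDesc *ProcResources; // Entry 0 is the invalid resource.
  std::size_t NumProcResources;
  const ExtraProcessorInfo *ExtraInfo;   // Null when no extra info exists.
};

struct QueueSizes {
  unsigned LoadQueue;
  unsigned StoreQueue;
};

// Returns the BufferSize of the resource at Index, or Fallback when Index is
// zero (no queue declared) or the declared size is not positive (assumption).
static unsigned queueSizeOr(const SchedModel &SM, unsigned Index,
                            unsigned Fallback) {
  if (Index == 0)
    return Fallback;
  assert(Index < SM.NumProcResources && "resource index out of range");
  int Size = SM.ProcResources[Index].BufferSize;
  return Size > 0 ? static_cast<unsigned>(Size) : Fallback;
}

// Picks load/store queue sizes from the model when they are declared.
QueueSizes computeQueueSizes(const SchedModel &SM, QueueSizes Defaults) {
  if (!SM.ExtraInfo)
    return Defaults;
  return {queueSizeOr(SM, SM.ExtraInfo->LoadQueueID, Defaults.LoadQueue),
          queueSizeOr(SM, SM.ExtraInfo->StoreQueueID, Defaults.StoreQueue)};
}
```

With the BdVer2 resources from this patch (`PdLoad` with `BufferSize = 40`, `PdStore` with `BufferSize = 24`), the sketch yields a 40-entry load queue and a 24-entry store queue.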

With this patch, we more accurately track dynamic dispatch stalls caused by the lack of LS tokens (i.e. load/store queue full). This is also shown by the differences in two BdVer2 tests. Stalls that were previously classified as generic SCHEDULER FULL stalls are now correctly classified as either "load queue full" or "store queue full".
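
To make the classification concrete, here is a small hedged sketch of how a dispatch stall could be attributed to a specific queue. It is not the llvm-mca implementation: `DispatchStall`, `BufferState`, `MemOpKind` and `classifyStall` are hypothetical names, and checking the load/store queues before the scheduler buffer is an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical stall categories mirroring the names in the stall report.
enum class DispatchStall { None, LoadQueueFull, StoreQueueFull, SchedulerFull };

// Occupancy of one buffered resource.
struct BufferState {
  unsigned Used;
  unsigned Size;
  bool isFull() const { return Used >= Size; }
};

// Whether the instruction being dispatched reads and/or writes memory.
struct MemOpKind {
  bool MayLoad;
  bool MayStore;
};

// Attributes a stall to the load or store queue when the instruction needs a
// token from a full queue; otherwise falls back to the generic scheduler check.
// The order of the checks is an assumption of this sketch.
DispatchStall classifyStall(MemOpKind Op, BufferState LQ, BufferState SQ,
                            BufferState Sched) {
  assert(LQ.Used <= LQ.Size && SQ.Used <= SQ.Size && Sched.Used <= Sched.Size);
  if (Op.MayLoad && LQ.isFull())
    return DispatchStall::LoadQueueFull;
  if (Op.MayStore && SQ.isFull())
    return DispatchStall::StoreQueueFull;
  if (Sched.isFull())
    return DispatchStall::SchedulerFull;
  return DispatchStall::None;
}
```

In this model, a load that finds all 40 load queue entries in use is reported as "load queue full" even if the scheduler buffer still has free entries.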

About the minor differences in the -scheduler-stats view:
Those differences are expected, because entries in the load/store queue are not released at the instruction issue stage; they are only released at the instruction executed stage. This is the main reason why, in some tests, the load/store queues become full before PdEX is full.
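
The effect of the different release points can be illustrated with a toy occupancy model. This is only a sketch under stated assumptions: `LoadTimeline`, `schedulerEntriesInUse` and `loadQueueEntriesInUse` are hypothetical names, and treating each entry as held over a half-open cycle interval is a simplification chosen for the example.

```cpp
#include <cassert>
#include <vector>

// Cycle at which a load reaches each stage (hypothetical toy model).
struct LoadTimeline {
  unsigned Dispatched;
  unsigned Issued;
  unsigned Executed;
};

// A scheduler buffer entry is held from dispatch until issue:
// cycles in [Dispatched, Issued).
unsigned schedulerEntriesInUse(const std::vector<LoadTimeline> &Loads,
                               unsigned Cycle) {
  unsigned Count = 0;
  for (const LoadTimeline &L : Loads) {
    assert(L.Dispatched <= L.Issued && L.Issued <= L.Executed);
    if (L.Dispatched <= Cycle && Cycle < L.Issued)
      ++Count;
  }
  return Count;
}

// A load queue entry is held from dispatch until the load has executed:
// cycles in [Dispatched, Executed).
unsigned loadQueueEntriesInUse(const std::vector<LoadTimeline> &Loads,
                               unsigned Cycle) {
  unsigned Count = 0;
  for (const LoadTimeline &L : Loads) {
    assert(L.Dispatched <= L.Issued && L.Issued <= L.Executed);
    if (L.Dispatched <= Cycle && Cycle < L.Executed)
      ++Count;
  }
  return Count;
}
```

Because every load keeps its queue entry for the whole execution latency, the load queue occupancy in this model is never lower than the scheduler occupancy, which is consistent with the queue filling up first.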

Let me know if okay to commit.

-Andrea

Diff Detail

Repository: rL LLVM

Event Timeline

andreadb created this revision. Nov 27 2018, 9:51 AM

Herald added subscribers: gbedwell, tschuett, javed.absar. Nov 27 2018, 9:51 AM

andreadb added a reviewer: lebedev.ri. Nov 27 2018, 9:51 AM

LGTM, I'll let others weigh in. For the most part I think this patch is fine, as long as my questions do not point out any bugs.

include/llvm/Target/TargetSchedule.td
567 ↗	(On Diff #175509)	I'd suggest renaming the parameter to something different, perhaps `<ProcResource PR>`. Reading `<ProcResource ProcResource>` caused my brain to hiccup a bit.
tools/llvm-mca/Views/SchedulerStatistics.cpp
49 ↗	(On Diff #175509)	I'm curious. What if the target does not have a LoadQueueID or StoreQueueID defined? It seems like SchedulerStats would not account for the uses in Usage vector in that case. Should we emit a warning if that is the case, or is it reasonable for SchedulerStats to ignore targets that do not define the Load/Store IDs?
87 ↗	(On Diff #175509)	If a target does not define a LQ/SQ resource ID, then SchedulerStats defaults to an ID of '0'. Is it possible for there to be a valid buffer with a BufferID of 0, or is that some special case? Our resource Buffer IDs come from an index defined in the processor model: `/// An index to the MCProcResourceDesc entry in the processor model.`
tools/llvm-mca/include/HardwareUnits/LSUnit.h
108 ↗	(On Diff #175509)	nit: Remove the wild '.' character in the middle of the sentence.
120 ↗	(On Diff #175509)	It feels weird to mention 'official hardware documentation' here, when the LSUnit should be somewhat target agnostic. Perhaps clarify what hardware documentation you are referring to.
tools/llvm-mca/lib/HardwareUnits/LSUnit.cpp
29 ↗	(On Diff #175509)	nit: In SchedulerStatistics.cpp you named the instance of a MCExtraProcessorInfo as 'EPI' and here you are calling it 'MEP'.

andreadb marked 6 inline comments as done. Nov 27 2018, 10:44 AM

andreadb added inline comments.

include/llvm/Target/TargetSchedule.td
567 ↗	(On Diff #175509)	Will do.
tools/llvm-mca/Views/SchedulerStatistics.cpp
49 ↗	(On Diff #175509)	On targets that don't specify a load/store queue, resource usage is normally updated by `onReservedBuffers` and `onReleasedBuffers`. Basically, this patch is not a visible change for processors that don't declare a load/store queue.
87 ↗	(On Diff #175509)	ID zero is for the invalid resource. So, it cannot be a buffered resource.
tools/llvm-mca/include/HardwareUnits/LSUnit.h
108 ↗	(On Diff #175509)	Will do.
120 ↗	(On Diff #175509)	No documentation in particular... Processor vendors often describe it in their official documents. Maybe it is my English... if it is confusing, I can remove it.
tools/llvm-mca/lib/HardwareUnits/LSUnit.cpp
29 ↗	(On Diff #175509)	I will change it.

mattd added inline comments. Nov 27 2018, 10:48 AM

tools/llvm-mca/include/HardwareUnits/LSUnit.h
120 ↗	(On Diff #175509)	Ah, I see. I read that statement as being specific to one vendor/implementation, and not something more general.

mattd added inline comments. Nov 27 2018, 10:49 AM

tools/llvm-mca/Views/SchedulerStatistics.cpp
87 ↗	(On Diff #175509)	Ok, that's good to know. Thanks for the clarification.

Patch updated.

Addressed review comments.

Please let me know if okay to commit.
-Andrea

Thank you, looks good in general!
Some very contrived nits:

tools/llvm-mca/Views/SchedulerStatistics.cpp
35–45 ↗	(On Diff #175663)	Is there a tracking bug for this? Could you file one? And reference it here in the comment please.
tools/llvm-mca/include/HardwareUnits/LSUnit.h
130–139 ↗	(On Diff #175663)	Is there a bug tracking this?
196–198 ↗	(On Diff #175663)	Is there a bug tracking this?

In D54957#1311632, @lebedev.ri wrote:

Thank you, looks good in general!
Some very contrived nits:

Thanks for the review Roman!

tools/llvm-mca/Views/SchedulerStatistics.cpp
35–45 ↗	(On Diff #175663)	Sure. I am going to create a bug for it.
tools/llvm-mca/include/HardwareUnits/LSUnit.h
130–139 ↗	(On Diff #175663)	I definitely plan to raise a bug for the usage of field `LoadLatency`. I am not entirely sure about whether there is a good/reasonable way to address the last FIXME paragraph. If you want, I can raise a low-priority investigation bug.
196–198 ↗	(On Diff #175663)	There isn't. I am not sure if we want to revisit this in the future. I suspect it would require a bit more investigation. That being said, if you want, I can raise an investigation bugzilla for it.

Address review comments.

Raised PR39828, PR39829 and PR39830.
Added references to those upstream bugzillas in code comments as suggested by Roman.

Thanks for answering my questions.

This revision is now accepted and ready to land. Nov 28 2018, 11:17 AM

Closed by commit rL347857: [llvm-mca][MC] Add the ability to declare which processor resources model… (authored by adibiagio). Nov 29 2018, 4:18 AM

This revision was automatically updated to reflect the committed changes.

andreadb mentioned this in D66810: [Tblgen][MCA] Add the ability to mark groups as LoadQueue and StoreQueue. NFCI. Aug 27 2019, 10:36 AM

andreadb mentioned this in rG2f51a43f8c2b: [Tblgen][MCA] Add the ability to mark groups as LoadQueue and StoreQueue. NFCI. Aug 27 2019, 11:30 AM

Diffusion mentioned this in rL370091: [Tblgen][MCA] Add the ability to mark groups as LoadQueue and StoreQueue. NFCI. Aug 27 2019, 11:36 AM

Revision Contents

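Path	Size
llvm/trunk/include/llvm/MC/MCSchedule.h	2 lines
llvm/trunk/include/llvm/Target/TargetSchedule.td	10 lines
llvm/trunk/lib/Target/X86/X86ScheduleBdVer2.td	4 lines
llvm/trunk/test/tools/llvm-mca/X86/BdVer2/load-throughput.s	90 lines
llvm/trunk/test/tools/llvm-mca/X86/BdVer2/store-throughput.s	88 lines
llvm/trunk/tools/llvm-mca/Views/SchedulerStatistics.h	12 lines
llvm/trunk/tools/llvm-mca/Views/SchedulerStatistics.cpp	64 lines
llvm/trunk/tools/llvm-mca/include/HardwareUnits/LSUnit.h	52 lines
llvm/trunk/tools/llvm-mca/lib/Context.cpp	4 lines
llvm/trunk/tools/llvm-mca/lib/HardwareUnits/LSUnit.cpp	17 lines
llvm/trunk/tools/llvm-mca/llvm-mca.cpp	4 lines
llvm/trunk/utils/TableGen/CodeGenSchedule.h	11 lines
llvm/trunk/utils/TableGen/CodeGenSchedule.cpp	32 lines
llvm/trunk/utils/TableGen/SubtargetEmitter.cpp	29 lines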

Diff 175851

llvm/trunk/include/llvm/MC/MCSchedule.h

[177 lines not shown; enclosing context: struct MCExtraProcessorInfo {]
   // Actual size of the reorder buffer in hardware.
   unsigned ReorderBufferSize;
   // Number of instructions retired per cycle.
   unsigned MaxRetirePerCycle;
   const MCRegisterFileDesc *RegisterFiles;
   unsigned NumRegisterFiles;
   const MCRegisterCostEntry *RegisterCostTable;
   unsigned NumRegisterCostEntries;
+  unsigned LoadQueueID;
+  unsigned StoreQueueID;
 };

 /// Machine model for scheduling, bundling, and heuristics.
 ///
 /// The machine model directly provides basic information about the
 /// microarchitecture to the scheduler in the form of properties. It also
 /// optionally refers to scheduler resource tables and itinerary
 /// tables. Scheduler resource tables model the latency and cost for each
[185 lines not shown]

llvm/trunk/include/llvm/Target/TargetSchedule.td

[555 lines not shown]
 // restrictions on the number of instructions retired per cycle".
 // Models can optionally specify up to one instance of RetireControlUnit per
 // scheduling model.
 class RetireControlUnit<int bufferSize, int retirePerCycle> {
   int ReorderBufferSize = bufferSize;
   int MaxRetirePerCycle = retirePerCycle;
   SchedMachineModel SchedModel = ?;
 }
+
+// Base class for Load/StoreQueue. It is used to identify processor resources
+// which describe load/store queues in the LS unit.
+class MemoryQueue<ProcResource PR> {
+  ProcResource QueueDescriptor = PR;
+  SchedMachineModel SchedModel = ?;
+}
+
+class LoadQueue<ProcResource LDQueue> : MemoryQueue<LDQueue>;
+class StoreQueue<ProcResource STQueue> : MemoryQueue<STQueue>;

llvm/trunk/lib/Target/X86/X86ScheduleBdVer2.td

[130 lines not shown]
 //

 let Super = PdAGLU01 in
 def PdLoad : ProcResource<2> {
   // For Piledriver, the load queue is 40 entries deep.
   let BufferSize = 40;
 }

+def PdLoadQueue : LoadQueue<PdLoad>;
+
 let Super = PdAGLU01 in
 def PdStore : ProcResource<1> {
   // For Piledriver, the store queue is 24 entries deep.
   let BufferSize = 24;
 }

+def PdStoreQueue : StoreQueue<PdStore>;
+
 //===----------------------------------------------------------------------===//
 // Integer Execution Units
 //

 def PdDiv : ProcResource<1>; // PdEX0; unpipelined integer division
 def PdCount : ProcResource<1>; // PdEX0; POPCNT, LZCOUNT

 def PdMul : ProcResource<1>; // PdEX1; integer multiplication
[1,126 lines not shown]

llvm/trunk/test/tools/llvm-mca/X86/BdVer2/load-throughput.s

[73 lines not shown]
 # CHECK-NEXT: 1 5 0.50 * movb (%rax), %spl
 # CHECK-NEXT: 1 5 0.50 * movb (%rcx), %bpl
 # CHECK-NEXT: 1 5 0.50 * movb (%rdx), %sil
 # CHECK-NEXT: 1 5 0.50 * movb (%rbx), %dil

 # CHECK: Dynamic Dispatch Stall Cycles:
 # CHECK-NEXT: RAT - Register unavailable: 0
 # CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 161 (77.8%)
-# CHECK-NEXT: LQ - Load queue full: 0
+# CHECK-NEXT: SCHEDQ - Scheduler full: 0
+# CHECK-NEXT: LQ - Load queue full: 171 (82.6%)
 # CHECK-NEXT: SQ - Store queue full: 0
 # CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

 # CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
 # CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 26 (12.6%)
-# CHECK-NEXT: 2, 162 (78.3%)
-# CHECK-NEXT: 4, 19 (9.2%)
+# CHECK-NEXT: 0, 21 (10.1%)
+# CHECK-NEXT: 2, 172 (83.1%)
+# CHECK-NEXT: 4, 14 (6.8%)

 # CHECK: Schedulers - number of cycles where we saw N instructions issued:
 # CHECK-NEXT: [# issued], [# cycles]
 # CHECK-NEXT: 0, 7 (3.4%)
 # CHECK-NEXT: 2, 200 (96.6%)

 # CHECK: Scheduler's queue usage:
 # CHECK-NEXT: [1] Resource name.
 # CHECK-NEXT: [2] Average number of used buffer entries.
 # CHECK-NEXT: [3] Maximum number of used buffer entries.
 # CHECK-NEXT: [4] Total number of buffer entries.

 # CHECK: [1] [2] [3] [4]
-# CHECK-NEXT: PdEX 35 40 40
+# CHECK-NEXT: PdEX 27 30 40
 # CHECK-NEXT: PdFPU 0 0 64
-# CHECK-NEXT: PdLoad 35 40 40
+# CHECK-NEXT: PdLoad 36 40 40
 # CHECK-NEXT: PdStore 0 0 24

 # CHECK: Resources:
 # CHECK-NEXT: [0.0] - PdAGLU01
 # CHECK-NEXT: [0.1] - PdAGLU01
 # CHECK-NEXT: [1] - PdBranch
 # CHECK-NEXT: [2] - PdCount
 # CHECK-NEXT: [3] - PdDiv
[71 lines not shown]
 # CHECK-NEXT: 1 5 0.50 * movw (%rax), %sp
 # CHECK-NEXT: 1 5 0.50 * movw (%rcx), %bp
 # CHECK-NEXT: 1 5 0.50 * movw (%rdx), %si
 # CHECK-NEXT: 1 5 0.50 * movw (%rbx), %di

 # CHECK: Dynamic Dispatch Stall Cycles:
 # CHECK-NEXT: RAT - Register unavailable: 0
 # CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 161 (77.8%)
-# CHECK-NEXT: LQ - Load queue full: 0
+# CHECK-NEXT: SCHEDQ - Scheduler full: 0
+# CHECK-NEXT: LQ - Load queue full: 171 (82.6%)
 # CHECK-NEXT: SQ - Store queue full: 0
 # CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

 # CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
 # CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 26 (12.6%)
-# CHECK-NEXT: 2, 162 (78.3%)
-# CHECK-NEXT: 4, 19 (9.2%)
+# CHECK-NEXT: 0, 21 (10.1%)
+# CHECK-NEXT: 2, 172 (83.1%)
+# CHECK-NEXT: 4, 14 (6.8%)

 # CHECK: Schedulers - number of cycles where we saw N instructions issued:
 # CHECK-NEXT: [# issued], [# cycles]
 # CHECK-NEXT: 0, 7 (3.4%)
 # CHECK-NEXT: 2, 200 (96.6%)

 # CHECK: Scheduler's queue usage:
 # CHECK-NEXT: [1] Resource name.
 # CHECK-NEXT: [2] Average number of used buffer entries.
 # CHECK-NEXT: [3] Maximum number of used buffer entries.
 # CHECK-NEXT: [4] Total number of buffer entries.

 # CHECK: [1] [2] [3] [4]
-# CHECK-NEXT: PdEX 35 40 40
+# CHECK-NEXT: PdEX 27 30 40
 # CHECK-NEXT: PdFPU 0 0 64
-# CHECK-NEXT: PdLoad 35 40 40
+# CHECK-NEXT: PdLoad 36 40 40
 # CHECK-NEXT: PdStore 0 0 24

 # CHECK: Resources:
 # CHECK-NEXT: [0.0] - PdAGLU01
 # CHECK-NEXT: [0.1] - PdAGLU01
 # CHECK-NEXT: [1] - PdBranch
 # CHECK-NEXT: [2] - PdCount
 # CHECK-NEXT: [3] - PdDiv
[71 lines not shown]
 # CHECK-NEXT: 1 5 0.50 * movl (%rax), %esp
 # CHECK-NEXT: 1 5 0.50 * movl (%rcx), %ebp
 # CHECK-NEXT: 1 5 0.50 * movl (%rdx), %esi
 # CHECK-NEXT: 1 5 0.50 * movl (%rbx), %edi

 # CHECK: Dynamic Dispatch Stall Cycles:
 # CHECK-NEXT: RAT - Register unavailable: 0
 # CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 161 (77.8%)
-# CHECK-NEXT: LQ - Load queue full: 0
+# CHECK-NEXT: SCHEDQ - Scheduler full: 0
+# CHECK-NEXT: LQ - Load queue full: 171 (82.6%)
 # CHECK-NEXT: SQ - Store queue full: 0
 # CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

 # CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
 # CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 26 (12.6%)
-# CHECK-NEXT: 2, 162 (78.3%)
-# CHECK-NEXT: 4, 19 (9.2%)
+# CHECK-NEXT: 0, 21 (10.1%)
+# CHECK-NEXT: 2, 172 (83.1%)
+# CHECK-NEXT: 4, 14 (6.8%)

 # CHECK: Schedulers - number of cycles where we saw N instructions issued:
 # CHECK-NEXT: [# issued], [# cycles]
 # CHECK-NEXT: 0, 7 (3.4%)
 # CHECK-NEXT: 2, 200 (96.6%)

 # CHECK: Scheduler's queue usage:
 # CHECK-NEXT: [1] Resource name.
 # CHECK-NEXT: [2] Average number of used buffer entries.
 # CHECK-NEXT: [3] Maximum number of used buffer entries.
 # CHECK-NEXT: [4] Total number of buffer entries.

 # CHECK: [1] [2] [3] [4]
-# CHECK-NEXT: PdEX 35 40 40
+# CHECK-NEXT: PdEX 27 30 40
 # CHECK-NEXT: PdFPU 0 0 64
-# CHECK-NEXT: PdLoad 35 40 40
+# CHECK-NEXT: PdLoad 36 40 40
 # CHECK-NEXT: PdStore 0 0 24

 # CHECK: Resources:
 # CHECK-NEXT: [0.0] - PdAGLU01
 # CHECK-NEXT: [0.1] - PdAGLU01
 # CHECK-NEXT: [1] - PdBranch
 # CHECK-NEXT: [2] - PdCount
 # CHECK-NEXT: [3] - PdDiv
[71 lines not shown]
 # CHECK-NEXT: 1 5 0.50 * movq (%rax), %rsp
 # CHECK-NEXT: 1 5 0.50 * movq (%rcx), %rbp
 # CHECK-NEXT: 1 5 0.50 * movq (%rdx), %rsi
 # CHECK-NEXT: 1 5 0.50 * movq (%rbx), %rdi

 # CHECK: Dynamic Dispatch Stall Cycles:
 # CHECK-NEXT: RAT - Register unavailable: 0
 # CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 161 (77.8%)
-# CHECK-NEXT: LQ - Load queue full: 0
+# CHECK-NEXT: SCHEDQ - Scheduler full: 0
+# CHECK-NEXT: LQ - Load queue full: 171 (82.6%)
 # CHECK-NEXT: SQ - Store queue full: 0
 # CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

 # CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
 # CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 26 (12.6%)
-# CHECK-NEXT: 2, 162 (78.3%)
-# CHECK-NEXT: 4, 19 (9.2%)
+# CHECK-NEXT: 0, 21 (10.1%)
+# CHECK-NEXT: 2, 172 (83.1%)
+# CHECK-NEXT: 4, 14 (6.8%)

 # CHECK: Schedulers - number of cycles where we saw N instructions issued:
 # CHECK-NEXT: [# issued], [# cycles]
 # CHECK-NEXT: 0, 7 (3.4%)
 # CHECK-NEXT: 2, 200 (96.6%)

 # CHECK: Scheduler's queue usage:
 # CHECK-NEXT: [1] Resource name.
 # CHECK-NEXT: [2] Average number of used buffer entries.
 # CHECK-NEXT: [3] Maximum number of used buffer entries.
 # CHECK-NEXT: [4] Total number of buffer entries.

 # CHECK: [1] [2] [3] [4]
-# CHECK-NEXT: PdEX 35 40 40
+# CHECK-NEXT: PdEX 27 30 40
 # CHECK-NEXT: PdFPU 0 0 64
-# CHECK-NEXT: PdLoad 35 40 40
+# CHECK-NEXT: PdLoad 36 40 40
 # CHECK-NEXT: PdStore 0 0 24

 # CHECK: Resources:
 # CHECK-NEXT: [0.0] - PdAGLU01
 # CHECK-NEXT: [0.1] - PdAGLU01
 # CHECK-NEXT: [1] - PdBranch
 # CHECK-NEXT: [2] - PdCount
 # CHECK-NEXT: [3] - PdDiv
[71 lines not shown]
 # CHECK-NEXT: 1 5 0.50 * movd (%rax), %mm0
 # CHECK-NEXT: 1 5 0.50 * movd (%rcx), %mm1
 # CHECK-NEXT: 1 5 0.50 * movd (%rdx), %mm2
 # CHECK-NEXT: 1 5 0.50 * movd (%rbx), %mm3

 # CHECK: Dynamic Dispatch Stall Cycles:
 # CHECK-NEXT: RAT - Register unavailable: 0
 # CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 161 (77.8%)
-# CHECK-NEXT: LQ - Load queue full: 0
+# CHECK-NEXT: SCHEDQ - Scheduler full: 0
+# CHECK-NEXT: LQ - Load queue full: 171 (82.6%)
 # CHECK-NEXT: SQ - Store queue full: 0
 # CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

 # CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
 # CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 26 (12.6%)
-# CHECK-NEXT: 2, 162 (78.3%)
-# CHECK-NEXT: 4, 19 (9.2%)
+# CHECK-NEXT: 0, 21 (10.1%)
+# CHECK-NEXT: 2, 172 (83.1%)
+# CHECK-NEXT: 4, 14 (6.8%)

 # CHECK: Schedulers - number of cycles where we saw N instructions issued:
 # CHECK-NEXT: [# issued], [# cycles]
 # CHECK-NEXT: 0, 7 (3.4%)
 # CHECK-NEXT: 2, 200 (96.6%)

 # CHECK: Scheduler's queue usage:
 # CHECK-NEXT: [1] Resource name.
 # CHECK-NEXT: [2] Average number of used buffer entries.
 # CHECK-NEXT: [3] Maximum number of used buffer entries.
 # CHECK-NEXT: [4] Total number of buffer entries.

 # CHECK: [1] [2] [3] [4]
-# CHECK-NEXT: PdEX 35 40 40
-# CHECK-NEXT: PdFPU 35 40 64
-# CHECK-NEXT: PdLoad 35 40 40
+# CHECK-NEXT: PdEX 27 30 40
+# CHECK-NEXT: PdFPU 27 30 64
+# CHECK-NEXT: PdLoad 36 40 40
 # CHECK-NEXT: PdStore 0 0 24

 # CHECK: Resources:
 # CHECK-NEXT: [0.0] - PdAGLU01
 # CHECK-NEXT: [0.1] - PdAGLU01
 # CHECK-NEXT: [1] - PdBranch
 # CHECK-NEXT: [2] - PdCount
 # CHECK-NEXT: [3] - PdDiv
[71 lines not shown]
 # CHECK-NEXT: 1 5 0.50 * movaps (%rax), %xmm0
 # CHECK-NEXT: 1 5 0.50 * movaps (%rcx), %xmm1
 # CHECK-NEXT: 1 5 0.50 * movaps (%rdx), %xmm2
 # CHECK-NEXT: 1 5 0.50 * movaps (%rbx), %xmm3

 # CHECK: Dynamic Dispatch Stall Cycles:
 # CHECK-NEXT: RAT - Register unavailable: 0
 # CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 161 (77.8%)
-# CHECK-NEXT: LQ - Load queue full: 0
+# CHECK-NEXT: SCHEDQ - Scheduler full: 0
+# CHECK-NEXT: LQ - Load queue full: 171 (82.6%)
 # CHECK-NEXT: SQ - Store queue full: 0
 # CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

 # CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
 # CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 26 (12.6%)
-# CHECK-NEXT: 2, 162 (78.3%)
-# CHECK-NEXT: 4, 19 (9.2%)
+# CHECK-NEXT: 0, 21 (10.1%)
+# CHECK-NEXT: 2, 172 (83.1%)
+# CHECK-NEXT: 4, 14 (6.8%)

 # CHECK: Schedulers - number of cycles where we saw N instructions issued:
 # CHECK-NEXT: [# issued], [# cycles]
 # CHECK-NEXT: 0, 7 (3.4%)
 # CHECK-NEXT: 2, 200 (96.6%)

 # CHECK: Scheduler's queue usage:
 # CHECK-NEXT: [1] Resource name.
 # CHECK-NEXT: [2] Average number of used buffer entries.
 # CHECK-NEXT: [3] Maximum number of used buffer entries.
 # CHECK-NEXT: [4] Total number of buffer entries.

 # CHECK: [1] [2] [3] [4]
-# CHECK-NEXT: PdEX 35 40 40
-# CHECK-NEXT: PdFPU 35 40 64
-# CHECK-NEXT: PdLoad 35 40 40
+# CHECK-NEXT: PdEX 27 30 40
+# CHECK-NEXT: PdFPU 27 30 64
+# CHECK-NEXT: PdLoad 36 40 40
 # CHECK-NEXT: PdStore 0 0 24

 # CHECK: Resources:
 # CHECK-NEXT: [0.0] - PdAGLU01
 # CHECK-NEXT: [0.1] - PdAGLU01
 # CHECK-NEXT: [1] - PdBranch
 # CHECK-NEXT: [2] - PdCount
 # CHECK-NEXT: [3] - PdDiv
[95 lines not shown]
 # CHECK-NEXT: [1] Resource name.
 # CHECK-NEXT: [2] Average number of used buffer entries.
 # CHECK-NEXT: [3] Maximum number of used buffer entries.
 # CHECK-NEXT: [4] Total number of buffer entries.

 # CHECK: [1] [2] [3] [4]
 # CHECK-NEXT: PdEX 1 2 40
 # CHECK-NEXT: PdFPU 1 2 64
-# CHECK-NEXT: PdLoad 1 2 40
+# CHECK-NEXT: PdLoad 11 12 40
 # CHECK-NEXT: PdStore 0 0 24

 # CHECK: Resources:
 # CHECK-NEXT: [0.0] - PdAGLU01
 # CHECK-NEXT: [0.1] - PdAGLU01
 # CHECK-NEXT: [1] - PdBranch
 # CHECK-NEXT: [2] - PdCount
 # CHECK-NEXT: [3] - PdDiv
[49 lines not shown]

llvm/trunk/test/tools/llvm-mca/X86/BdVer2/store-throughput.s

	Show First 20 Lines • Show All 73 Lines • ▼ Show 20 Lines
	# CHECK-NEXT: 1 1 1.00 * movb %spl, (%rax)			# CHECK-NEXT: 1 1 1.00 * movb %spl, (%rax)
	# CHECK-NEXT: 1 1 1.00 * movb %bpl, (%rcx)			# CHECK-NEXT: 1 1 1.00 * movb %bpl, (%rcx)
	# CHECK-NEXT: 1 1 1.00 * movb %sil, (%rdx)			# CHECK-NEXT: 1 1 1.00 * movb %sil, (%rdx)
	# CHECK-NEXT: 1 1 1.00 * movb %dil, (%rbx)			# CHECK-NEXT: 1 1 1.00 * movb %dil, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 369 (91.6%)			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 0			# CHECK-NEXT: SQ - Store queue full: 370 (91.8%)
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 26 (6.5%)			# CHECK-NEXT: 0, 25 (6.2%)
	# CHECK-NEXT: 1, 369 (91.6%)			# CHECK-NEXT: 1, 370 (91.8%)
	# CHECK-NEXT: 3, 1 (0.2%)			# CHECK-NEXT: 2, 1 (0.2%)
	# CHECK-NEXT: 4, 7 (1.7%)			# CHECK-NEXT: 4, 7 (1.7%)

	# CHECK: Schedulers - number of cycles where we saw N instructions issued:			# CHECK: Schedulers - number of cycles where we saw N instructions issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 3 (0.7%)			# CHECK-NEXT: 0, 3 (0.7%)
	# CHECK-NEXT: 1, 400 (99.3%)			# CHECK-NEXT: 1, 400 (99.3%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 22 24 40			# CHECK-NEXT: PdEX 22 23 40
	# CHECK-NEXT: PdFPU 0 0 64			# CHECK-NEXT: PdFPU 0 0 64
	# CHECK-NEXT: PdLoad 0 0 40			# CHECK-NEXT: PdLoad 0 0 40
	# CHECK-NEXT: PdStore 22 24 24			# CHECK-NEXT: PdStore 23 24 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	# CHECK-NEXT: [3] - PdDiv			# CHECK-NEXT: [3] - PdDiv
	# CHECK-NEXT: [4] - PdEX0			# CHECK-NEXT: [4] - PdEX0
	▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines
	# CHECK-NEXT: 1 1 1.00 * movw %sp, (%rax)			# CHECK-NEXT: 1 1 1.00 * movw %sp, (%rax)
	# CHECK-NEXT: 1 1 1.00 * movw %bp, (%rcx)			# CHECK-NEXT: 1 1 1.00 * movw %bp, (%rcx)
	# CHECK-NEXT: 1 1 1.00 * movw %si, (%rdx)			# CHECK-NEXT: 1 1 1.00 * movw %si, (%rdx)
	# CHECK-NEXT: 1 1 1.00 * movw %di, (%rbx)			# CHECK-NEXT: 1 1 1.00 * movw %di, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 369 (91.6%)			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 0			# CHECK-NEXT: SQ - Store queue full: 370 (91.8%)
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 26 (6.5%)			# CHECK-NEXT: 0, 25 (6.2%)
	# CHECK-NEXT: 1, 369 (91.6%)			# CHECK-NEXT: 1, 370 (91.8%)
	# CHECK-NEXT: 3, 1 (0.2%)			# CHECK-NEXT: 2, 1 (0.2%)
	# CHECK-NEXT: 4, 7 (1.7%)			# CHECK-NEXT: 4, 7 (1.7%)

	# CHECK: Schedulers - number of cycles where we saw N instructions issued:			# CHECK: Schedulers - number of cycles where we saw N instructions issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 3 (0.7%)			# CHECK-NEXT: 0, 3 (0.7%)
	# CHECK-NEXT: 1, 400 (99.3%)			# CHECK-NEXT: 1, 400 (99.3%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 22 24 40			# CHECK-NEXT: PdEX 22 23 40
	# CHECK-NEXT: PdFPU 0 0 64			# CHECK-NEXT: PdFPU 0 0 64
	# CHECK-NEXT: PdLoad 0 0 40			# CHECK-NEXT: PdLoad 0 0 40
	# CHECK-NEXT: PdStore 22 24 24			# CHECK-NEXT: PdStore 23 24 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	# CHECK-NEXT: [3] - PdDiv			# CHECK-NEXT: [3] - PdDiv
	# CHECK-NEXT: [4] - PdEX0			# CHECK-NEXT: [4] - PdEX0
	▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines
	# CHECK-NEXT: 1 1 1.00 * movl %esp, (%rax)			# CHECK-NEXT: 1 1 1.00 * movl %esp, (%rax)
	# CHECK-NEXT: 1 1 1.00 * movl %ebp, (%rcx)			# CHECK-NEXT: 1 1 1.00 * movl %ebp, (%rcx)
	# CHECK-NEXT: 1 1 1.00 * movl %esi, (%rdx)			# CHECK-NEXT: 1 1 1.00 * movl %esi, (%rdx)
	# CHECK-NEXT: 1 1 1.00 * movl %edi, (%rbx)			# CHECK-NEXT: 1 1 1.00 * movl %edi, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 369 (91.6%)			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 0			# CHECK-NEXT: SQ - Store queue full: 370 (91.8%)
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 26 (6.5%)			# CHECK-NEXT: 0, 25 (6.2%)
	# CHECK-NEXT: 1, 369 (91.6%)			# CHECK-NEXT: 1, 370 (91.8%)
	# CHECK-NEXT: 3, 1 (0.2%)			# CHECK-NEXT: 2, 1 (0.2%)
	# CHECK-NEXT: 4, 7 (1.7%)			# CHECK-NEXT: 4, 7 (1.7%)

	# CHECK: Schedulers - number of cycles where we saw N instructions issued:			# CHECK: Schedulers - number of cycles where we saw N instructions issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 3 (0.7%)			# CHECK-NEXT: 0, 3 (0.7%)
	# CHECK-NEXT: 1, 400 (99.3%)			# CHECK-NEXT: 1, 400 (99.3%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 22 24 40			# CHECK-NEXT: PdEX 22 23 40
	# CHECK-NEXT: PdFPU 0 0 64			# CHECK-NEXT: PdFPU 0 0 64
	# CHECK-NEXT: PdLoad 0 0 40			# CHECK-NEXT: PdLoad 0 0 40
	# CHECK-NEXT: PdStore 22 24 24			# CHECK-NEXT: PdStore 23 24 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	# CHECK-NEXT: [3] - PdDiv			# CHECK-NEXT: [3] - PdDiv
	# CHECK-NEXT: [4] - PdEX0			# CHECK-NEXT: [4] - PdEX0
	▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines
	# CHECK-NEXT: 1 1 1.00 * movq %rsp, (%rax)			# CHECK-NEXT: 1 1 1.00 * movq %rsp, (%rax)
	# CHECK-NEXT: 1 1 1.00 * movq %rbp, (%rcx)			# CHECK-NEXT: 1 1 1.00 * movq %rbp, (%rcx)
	# CHECK-NEXT: 1 1 1.00 * movq %rsi, (%rdx)			# CHECK-NEXT: 1 1 1.00 * movq %rsi, (%rdx)
	# CHECK-NEXT: 1 1 1.00 * movq %rdi, (%rbx)			# CHECK-NEXT: 1 1 1.00 * movq %rdi, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 369 (91.6%)			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 0			# CHECK-NEXT: SQ - Store queue full: 370 (91.8%)
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 26 (6.5%)			# CHECK-NEXT: 0, 25 (6.2%)
	# CHECK-NEXT: 1, 369 (91.6%)			# CHECK-NEXT: 1, 370 (91.8%)
	# CHECK-NEXT: 3, 1 (0.2%)			# CHECK-NEXT: 2, 1 (0.2%)
	# CHECK-NEXT: 4, 7 (1.7%)			# CHECK-NEXT: 4, 7 (1.7%)

	# CHECK: Schedulers - number of cycles where we saw N instructions issued:			# CHECK: Schedulers - number of cycles where we saw N instructions issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 3 (0.7%)			# CHECK-NEXT: 0, 3 (0.7%)
	# CHECK-NEXT: 1, 400 (99.3%)			# CHECK-NEXT: 1, 400 (99.3%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 22 24 40			# CHECK-NEXT: PdEX 22 23 40
	# CHECK-NEXT: PdFPU 0 0 64			# CHECK-NEXT: PdFPU 0 0 64
	# CHECK-NEXT: PdLoad 0 0 40			# CHECK-NEXT: PdLoad 0 0 40
	# CHECK-NEXT: PdStore 22 24 24			# CHECK-NEXT: PdStore 23 24 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	# CHECK-NEXT: [3] - PdDiv			# CHECK-NEXT: [3] - PdDiv
	# CHECK-NEXT: [4] - PdEX0			# CHECK-NEXT: [4] - PdEX0
	▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines
	# CHECK-NEXT: 1 2 1.00 * U movd %mm0, (%rax)			# CHECK-NEXT: 1 2 1.00 * U movd %mm0, (%rax)
	# CHECK-NEXT: 1 2 1.00 * U movd %mm1, (%rcx)			# CHECK-NEXT: 1 2 1.00 * U movd %mm1, (%rcx)
	# CHECK-NEXT: 1 2 1.00 * U movd %mm2, (%rdx)			# CHECK-NEXT: 1 2 1.00 * U movd %mm2, (%rdx)
	# CHECK-NEXT: 1 2 1.00 * U movd %mm3, (%rbx)			# CHECK-NEXT: 1 2 1.00 * U movd %mm3, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 745 (92.8%)			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 0			# CHECK-NEXT: SQ - Store queue full: 747 (93.0%)
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 423 (52.7%)			# CHECK-NEXT: 0, 422 (52.6%)
	# CHECK-NEXT: 1, 373 (46.5%)			# CHECK-NEXT: 1, 374 (46.6%)
	# CHECK-NEXT: 3, 1 (0.1%)			# CHECK-NEXT: 2, 1 (0.1%)
	# CHECK-NEXT: 4, 6 (0.7%)			# CHECK-NEXT: 4, 6 (0.7%)

	# CHECK: Schedulers - number of cycles where we saw N instructions issued:			# CHECK: Schedulers - number of cycles where we saw N instructions issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 403 (50.2%)			# CHECK-NEXT: 0, 403 (50.2%)
	# CHECK-NEXT: 1, 400 (49.8%)			# CHECK-NEXT: 1, 400 (49.8%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 23 24 40			# CHECK-NEXT: PdEX 22 23 40
	# CHECK-NEXT: PdFPU 23 24 64			# CHECK-NEXT: PdFPU 22 23 64
	# CHECK-NEXT: PdLoad 0 0 40			# CHECK-NEXT: PdLoad 0 0 40
	# CHECK-NEXT: PdStore 23 24 24			# CHECK-NEXT: PdStore 23 24 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	▲ Show 20 Lines • Show All 73 Lines • ▼ Show 20 Lines
	# CHECK-NEXT: 1 1 1.00 * movaps %xmm0, (%rax)			# CHECK-NEXT: 1 1 1.00 * movaps %xmm0, (%rax)
	# CHECK-NEXT: 1 1 1.00 * movaps %xmm1, (%rcx)			# CHECK-NEXT: 1 1 1.00 * movaps %xmm1, (%rcx)
	# CHECK-NEXT: 1 1 1.00 * movaps %xmm2, (%rdx)			# CHECK-NEXT: 1 1 1.00 * movaps %xmm2, (%rdx)
	# CHECK-NEXT: 1 1 1.00 * movaps %xmm3, (%rbx)			# CHECK-NEXT: 1 1 1.00 * movaps %xmm3, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 369 (91.6%)			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 0			# CHECK-NEXT: SQ - Store queue full: 370 (91.8%)
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 26 (6.5%)			# CHECK-NEXT: 0, 25 (6.2%)
	# CHECK-NEXT: 1, 369 (91.6%)			# CHECK-NEXT: 1, 370 (91.8%)
	# CHECK-NEXT: 3, 1 (0.2%)			# CHECK-NEXT: 2, 1 (0.2%)
	# CHECK-NEXT: 4, 7 (1.7%)			# CHECK-NEXT: 4, 7 (1.7%)

	# CHECK: Schedulers - number of cycles where we saw N instructions issued:			# CHECK: Schedulers - number of cycles where we saw N instructions issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 3 (0.7%)			# CHECK-NEXT: 0, 3 (0.7%)
	# CHECK-NEXT: 1, 400 (99.3%)			# CHECK-NEXT: 1, 400 (99.3%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 22 24 40			# CHECK-NEXT: PdEX 22 23 40
	# CHECK-NEXT: PdFPU 22 24 64			# CHECK-NEXT: PdFPU 22 23 64
	# CHECK-NEXT: PdLoad 0 0 40			# CHECK-NEXT: PdLoad 0 0 40
	# CHECK-NEXT: PdStore 22 24 24			# CHECK-NEXT: PdStore 23 24 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	# CHECK-NEXT: [3] - PdDiv			# CHECK-NEXT: [3] - PdDiv
	# CHECK-NEXT: [4] - PdEX0			# CHECK-NEXT: [4] - PdEX0
	▲ Show 20 Lines • Show All 95 Lines • ▼ Show 20 Lines
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 1 1 40			# CHECK-NEXT: PdEX 1 1 40
	# CHECK-NEXT: PdFPU 1 1 64			# CHECK-NEXT: PdFPU 1 1 64
	# CHECK-NEXT: PdLoad 0 0 40			# CHECK-NEXT: PdLoad 0 0 40
	# CHECK-NEXT: PdStore 1 1 24			# CHECK-NEXT: PdStore 2 2 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	# CHECK-NEXT: [3] - PdDiv			# CHECK-NEXT: [3] - PdDiv
	# CHECK-NEXT: [4] - PdEX0			# CHECK-NEXT: [4] - PdEX0
	▲ Show 20 Lines • Show All 48 Lines • Show Last 20 Lines
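
Note on the updated expectations above: the percentages appear consistent with a total of 403 simulated cycles for the integer and movaps store kernels (3 + 400 in the issue histogram) and 803 cycles for the movd kernel (403 + 400). The following is only a minimal sketch of that arithmetic; the helper name `tenthsOfPercent` is hypothetical and is not taken from llvm-mca, whose own formatting code is not shown in this diff.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of llvm-mca): returns the share of cycles as
// an integer number of tenths of a percent, e.g. 918 stands for 91.8%.
long tenthsOfPercent(unsigned Count, unsigned TotalCycles) {
  assert(TotalCycles != 0 && "need at least one simulated cycle");
  // Scale to tenths of a percent, then round to the nearest integer.
  return std::lround(Count * 1000.0 / TotalCycles);
}
```

Under these assumptions, `tenthsOfPercent(370, 403)` corresponds to the new "SQ - Store queue full: 370 (91.8%)" line, and `tenthsOfPercent(369, 403)` to the old "SCHEDQ - Scheduler full: 369 (91.6%)" line.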

llvm/trunk/tools/llvm-mca/Views/SchedulerStatistics.h

	Show First 20 Lines • Show All 41 Lines • ▼ Show 20 Lines
	#include "llvm/MC/MCSubtargetInfo.h"			#include "llvm/MC/MCSubtargetInfo.h"
	#include <map>			#include <map>

	namespace llvm {			namespace llvm {
	namespace mca {			namespace mca {

	class SchedulerStatistics final : public View {			class SchedulerStatistics final : public View {
	const llvm::MCSchedModel &SM;			const llvm::MCSchedModel &SM;
				unsigned LQResourceID;
				unsigned SQResourceID;

	unsigned NumIssued;			unsigned NumIssued;
	unsigned NumCycles;			unsigned NumCycles;

				unsigned MostRecentLoadDispatched;
				unsigned MostRecentStoreDispatched;

	// Tracks the usage of a scheduler's queue.			// Tracks the usage of a scheduler's queue.
	struct BufferUsage {			struct BufferUsage {
	unsigned SlotsInUse;			unsigned SlotsInUse;
	unsigned MaxUsedSlots;			unsigned MaxUsedSlots;
	uint64_t CumulativeNumUsedSlots;			uint64_t CumulativeNumUsedSlots;
	};			};

	std::vector<unsigned> IssuedPerCycle;			std::vector<unsigned> IssuedPerCycle;
	std::vector<BufferUsage> Usage;			std::vector<BufferUsage> Usage;

	void updateHistograms();			void updateHistograms();
	void printSchedulerStats(llvm::raw_ostream &OS) const;			void printSchedulerStats(llvm::raw_ostream &OS) const;
	void printSchedulerUsage(llvm::raw_ostream &OS) const;			void printSchedulerUsage(llvm::raw_ostream &OS) const;

	public:			public:
	SchedulerStatistics(const llvm::MCSubtargetInfo &STI)			SchedulerStatistics(const llvm::MCSubtargetInfo &STI);
	: SM(STI.getSchedModel()), NumIssued(0), NumCycles(0),
	IssuedPerCycle(STI.getSchedModel().NumProcResourceKinds, 0),
	Usage(STI.getSchedModel().NumProcResourceKinds, {0, 0, 0}) {}

	void onEvent(const HWInstructionEvent &Event) override;			void onEvent(const HWInstructionEvent &Event) override;
	void onCycleBegin() override { NumCycles++; }			void onCycleBegin() override { NumCycles++; }
	void onCycleEnd() override { updateHistograms(); }			void onCycleEnd() override { updateHistograms(); }

	// Increases the number of used scheduler queue slots of every buffered			// Increases the number of used scheduler queue slots of every buffered
	// resource in the Buffers set.			// resource in the Buffers set.
	void onReservedBuffers(const InstRef &IR,			void onReservedBuffers(const InstRef &IR,
	llvm::ArrayRef<unsigned> Buffers) override;			llvm::ArrayRef<unsigned> Buffers) override;
	Show All 12 Lines

llvm/trunk/tools/llvm-mca/Views/SchedulerStatistics.cpp

	Show All 13 Lines

	#include "Views/SchedulerStatistics.h"			#include "Views/SchedulerStatistics.h"
	#include "llvm/Support/Format.h"			#include "llvm/Support/Format.h"
	#include "llvm/Support/FormattedStream.h"			#include "llvm/Support/FormattedStream.h"

	namespace llvm {			namespace llvm {
	namespace mca {			namespace mca {

				SchedulerStatistics::SchedulerStatistics(const llvm::MCSubtargetInfo &STI)
				: SM(STI.getSchedModel()), LQResourceID(0), SQResourceID(0), NumIssued(0),
				NumCycles(0), MostRecentLoadDispatched(~0U),
				MostRecentStoreDispatched(~0U),
				IssuedPerCycle(STI.getSchedModel().NumProcResourceKinds, 0),
				Usage(STI.getSchedModel().NumProcResourceKinds, {0, 0, 0}) {
				if (SM.hasExtraProcessorInfo()) {
				const MCExtraProcessorInfo &EPI = SM.getExtraProcessorInfo();
				LQResourceID = EPI.LoadQueueID;
				SQResourceID = EPI.StoreQueueID;
				}
				}

				// FIXME: This implementation works under the assumption that load/store queue
				// entries are reserved at 'instruction dispatched' stage, and released at
				// 'instruction executed' stage. This currently matches the behavior of LSUnit.
				//
				// The current design minimizes the number of events generated by the
				// Dispatch/Execute stages, at the cost of doing extra bookkeeping in method
				// `onEvent`. However, it introduces a subtle dependency between this view and
				// how the LSUnit works.
				//
				// In future we should add a new "memory queue" event type, so that we stop
				// making assumptions on how LSUnit internally works (See PR39828).
	void SchedulerStatistics::onEvent(const HWInstructionEvent &Event) {			void SchedulerStatistics::onEvent(const HWInstructionEvent &Event) {
	if (Event.Type == HWInstructionEvent::Issued)			if (Event.Type == HWInstructionEvent::Issued)
	++NumIssued;			++NumIssued;
				else if (Event.Type == HWInstructionEvent::Dispatched) {
				const Instruction &Inst = *Event.IR.getInstruction();
				const unsigned Index = Event.IR.getSourceIndex();
				if (LQResourceID && Inst.getDesc().MayLoad &&
				MostRecentLoadDispatched != Index) {
				Usage[LQResourceID].SlotsInUse++;
				MostRecentLoadDispatched = Index;
				}
				if (SQResourceID && Inst.getDesc().MayStore &&
				MostRecentStoreDispatched != Index) {
				Usage[SQResourceID].SlotsInUse++;
				MostRecentStoreDispatched = Index;
				}
				} else if (Event.Type == HWInstructionEvent::Executed) {
				const Instruction &Inst = *Event.IR.getInstruction();
				if (LQResourceID && Inst.getDesc().MayLoad) {
				assert(Usage[LQResourceID].SlotsInUse);
				Usage[LQResourceID].SlotsInUse--;
				}
				if (SQResourceID && Inst.getDesc().MayStore) {
				assert(Usage[SQResourceID].SlotsInUse);
				Usage[SQResourceID].SlotsInUse--;
				}
				}
	}			}

	void SchedulerStatistics::onReservedBuffers(const InstRef & /* unused */,			void SchedulerStatistics::onReservedBuffers(const InstRef & /* unused */,
	ArrayRef<unsigned> Buffers) {			ArrayRef<unsigned> Buffers) {
	for (const unsigned Buffer : Buffers) {			for (const unsigned Buffer : Buffers) {
	BufferUsage &BU = Usage[Buffer];			if (Buffer == LQResourceID || Buffer == SQResourceID)
	BU.SlotsInUse++;			continue;
	BU.MaxUsedSlots = std::max(BU.MaxUsedSlots, BU.SlotsInUse);			Usage[Buffer].SlotsInUse++;
	}			}
	}			}

	void SchedulerStatistics::onReleasedBuffers(const InstRef & /* unused */,			void SchedulerStatistics::onReleasedBuffers(const InstRef & /* unused */,
	ArrayRef<unsigned> Buffers) {			ArrayRef<unsigned> Buffers) {
	for (const unsigned Buffer : Buffers)			for (const unsigned Buffer : Buffers) {
				if (Buffer == LQResourceID || Buffer == SQResourceID)
				continue;
	Usage[Buffer].SlotsInUse--;			Usage[Buffer].SlotsInUse--;
	}			}
				}

	void SchedulerStatistics::updateHistograms() {			void SchedulerStatistics::updateHistograms() {
	for (BufferUsage &BU : Usage)			for (BufferUsage &BU : Usage) {
	BU.CumulativeNumUsedSlots += BU.SlotsInUse;			BU.CumulativeNumUsedSlots += BU.SlotsInUse;
				BU.MaxUsedSlots = std::max(BU.MaxUsedSlots, BU.SlotsInUse);
				}

	IssuedPerCycle[NumIssued]++;			IssuedPerCycle[NumIssued]++;
	NumIssued = 0;			NumIssued = 0;
	}			}

	void SchedulerStatistics::printSchedulerStats(raw_ostream &OS) const {			void SchedulerStatistics::printSchedulerStats(raw_ostream &OS) const {
	OS << "\n\nSchedulers - "			OS << "\n\nSchedulers - "
	<< "number of cycles where we saw N instructions issued:\n";			<< "number of cycles where we saw N instructions issued:\n";
	OS << "[# issued], [# cycles]\n";			OS << "[# issued], [# cycles]\n";
	▲ Show 20 Lines • Show All 77 Lines • Show Last 20 Lines
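
The bookkeeping that the new `onEvent` and `updateHistograms` code performs for the load/store queues can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the patch itself: the class `QueueUsageTracker` and its method names are hypothetical, it tracks a single queue, and it takes a plain source index instead of an `HWInstructionEvent`. The skip on a repeated index mirrors the `MostRecentLoadDispatched` / `MostRecentStoreDispatched` checks in the diff, presumably there because one instruction may be reported as dispatched more than once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Same fields as the BufferUsage struct in SchedulerStatistics.h.
struct BufferUsage {
  unsigned SlotsInUse = 0;
  unsigned MaxUsedSlots = 0;
  uint64_t CumulativeNumUsedSlots = 0;
};

// Hypothetical tracker for one memory queue (load or store).
class QueueUsageTracker {
  BufferUsage BU;
  unsigned MostRecentDispatched = ~0U; // Sentinel: nothing dispatched yet.

public:
  // A queue entry is reserved when the instruction is dispatched. Repeated
  // notifications for the same source index are counted only once.
  void onDispatched(unsigned SourceIndex) {
    if (MostRecentDispatched == SourceIndex)
      return;
    ++BU.SlotsInUse;
    MostRecentDispatched = SourceIndex;
  }

  // The entry is released when the instruction is executed.
  void onExecuted() {
    assert(BU.SlotsInUse && "released more entries than were reserved");
    --BU.SlotsInUse;
  }

  // Sample the occupancy once per cycle, as updateHistograms() does.
  void onCycleEnd() {
    BU.CumulativeNumUsedSlots += BU.SlotsInUse;
    BU.MaxUsedSlots = std::max(BU.MaxUsedSlots, BU.SlotsInUse);
  }

  const BufferUsage &usage() const { return BU; }
};
```

Sampling the maximum at the end of the cycle, rather than at reservation time as the old `onReservedBuffers` did, means an entry that is reserved and released within the same cycle does not contribute to `MaxUsedSlots`.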

llvm/trunk/tools/llvm-mca/include/HardwareUnits/LSUnit.h

Show All 12 Lines
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_TOOLS_LLVM_MCA_LSUNIT_H		#ifndef LLVM_TOOLS_LLVM_MCA_LSUNIT_H
#define LLVM_TOOLS_LLVM_MCA_LSUNIT_H		#define LLVM_TOOLS_LLVM_MCA_LSUNIT_H

#include "HardwareUnits/HardwareUnit.h"		#include "HardwareUnits/HardwareUnit.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
		#include "llvm/MC/MCSchedule.h"

namespace llvm {		namespace llvm {
namespace mca {		namespace mca {

class InstRef;		class InstRef;

/// A Load/Store Unit implementing load and store queues.		/// A Load/Store Unit implementing load and store queues.
///		///
▲ Show 20 Lines • Show All 65 Lines • ▼ Show 20 Lines	class LSUnit : public HardwareUnit {

// Store queue size.		// Store queue size.
// SQ_Size == 0 means that there are infinite slots in the store queue.		// SQ_Size == 0 means that there are infinite slots in the store queue.
unsigned SQ_Size;		unsigned SQ_Size;

// If true, loads will never alias with stores. This is the default.		// If true, loads will never alias with stores. This is the default.
bool NoAlias;		bool NoAlias;

		// When a `MayLoad` instruction is dispatched to the schedulers for execution,
		// the LSUnit reserves an entry in the `LoadQueue` for it.
		//
		// LoadQueue keeps track of all the loads that are in-flight. A load
		// instruction is eventually removed from the LoadQueue when it reaches
		// completion stage. That means, a load leaves the queue when it is 'executed',
		// and its value can be forwarded on the data path to outside units.
		//
		// This class doesn't know about the latency of a load instruction. So, it
		// conservatively/pessimistically assumes that the latency of a load opcode
		// matches the instruction latency.
		//
		// FIXME: In the absence of cache misses (i.e. L1I/L1D/iTLB/dTLB hits/misses),
		// and load/store conflicts, the latency of a load is determined by the depth
		// of the load pipeline. So, we could use field `LoadLatency` in the
		// MCSchedModel to model that latency.
		// Field `LoadLatency` often matches the so-called 'load-to-use' latency from
		// L1D, and it usually already accounts for any extra latency due to data
		// forwarding.
		// When doing throughput analysis, `LoadLatency` is likely to
		// be a better predictor of load latency than instruction latency. This is
		// particularly true when simulating code with temporal/spatial locality of
		// memory accesses.
		// Using `LoadLatency` (instead of the instruction latency) is also expected
		// to improve the load queue allocation for long latency instructions with
		// folded memory operands (See PR39829).
		//
		// FIXME: On some processors, load/store operations are split into multiple
		// uOps. For example, X86 AMD Jaguar natively supports 128-bit data types, but
		// not 256-bit data types. So, a 256-bit load is effectively split into two
		// 128-bit loads, and each split load consumes one 'LoadQueue' entry. For
		// simplicity, this class optimistically assumes that a load instruction only
		// consumes one entry in the LoadQueue. Similarly, store instructions only
		// consume a single entry in the StoreQueue.
		// In future, we should reassess the quality of this design, and consider
		// alternative approaches that let instructions specify the number of
		// load/store queue entries which they consume at dispatch stage (See
		// PR39830).
SmallSet<unsigned, 16> LoadQueue;		SmallSet<unsigned, 16> LoadQueue;
SmallSet<unsigned, 16> StoreQueue;		SmallSet<unsigned, 16> StoreQueue;

void assignLQSlot(unsigned Index);		void assignLQSlot(unsigned Index);
void assignSQSlot(unsigned Index);		void assignSQSlot(unsigned Index);
bool isReadyNoAlias(unsigned Index) const;		bool isReadyNoAlias(unsigned Index) const;

// An instruction that both 'mayStore' and 'HasUnmodeledSideEffects' is		// An instruction that both 'mayStore' and 'HasUnmodeledSideEffects' is
// conservatively treated as a store barrier. It forces older stores to be		// conservatively treated as a store barrier. It forces older stores to be
// executed before newer stores are issued.		// executed before newer stores are issued.
SmallSet<unsigned, 8> StoreBarriers;		SmallSet<unsigned, 8> StoreBarriers;

// An instruction that both 'MayLoad' and 'HasUnmodeledSideEffects' is		// An instruction that both 'MayLoad' and 'HasUnmodeledSideEffects' is
// conservatively treated as a load barrier. It forces older loads to execute		// conservatively treated as a load barrier. It forces older loads to execute
// before newer loads are issued.		// before newer loads are issued.
SmallSet<unsigned, 8> LoadBarriers;		SmallSet<unsigned, 8> LoadBarriers;

bool isSQEmpty() const { return StoreQueue.empty(); }		bool isSQEmpty() const { return StoreQueue.empty(); }
bool isLQEmpty() const { return LoadQueue.empty(); }		bool isLQEmpty() const { return LoadQueue.empty(); }
bool isSQFull() const { return SQ_Size != 0 && StoreQueue.size() == SQ_Size; }		bool isSQFull() const { return SQ_Size != 0 && StoreQueue.size() == SQ_Size; }
bool isLQFull() const { return LQ_Size != 0 && LoadQueue.size() == LQ_Size; }		bool isLQFull() const { return LQ_Size != 0 && LoadQueue.size() == LQ_Size; }

public:		public:
LSUnit(unsigned LQ = 0, unsigned SQ = 0, bool AssumeNoAlias = false)		LSUnit(const MCSchedModel &SM, unsigned LQ = 0, unsigned SQ = 0,
: LQ_Size(LQ), SQ_Size(SQ), NoAlias(AssumeNoAlias) {}		bool AssumeNoAlias = false);

#ifndef NDEBUG		#ifndef NDEBUG
void dump() const;		void dump() const;
#endif		#endif

enum Status { LSU_AVAILABLE = 0, LSU_LQUEUE_FULL, LSU_SQUEUE_FULL };		enum Status { LSU_AVAILABLE = 0, LSU_LQUEUE_FULL, LSU_SQUEUE_FULL };

// Returns LSU_AVAILABLE if there are enough load/store queue entries to serve		// Returns LSU_AVAILABLE if there are enough load/store queue entries to serve
[... 9 lines not shown ...]
// By default, rules are:		// By default, rules are:
// 1. A store may not pass a previous store.		// 1. A store may not pass a previous store.
// 2. A load may not pass a previous store unless flag 'NoAlias' is set.		// 2. A load may not pass a previous store unless flag 'NoAlias' is set.
// 3. A load may pass a previous load.		// 3. A load may pass a previous load.
// 4. A store may not pass a previous load (regardless of flag 'NoAlias').		// 4. A store may not pass a previous load (regardless of flag 'NoAlias').
// 5. A load has to wait until an older load barrier is fully executed.		// 5. A load has to wait until an older load barrier is fully executed.
// 6. A store has to wait until an older store barrier is fully executed.		// 6. A store has to wait until an older store barrier is fully executed.
virtual bool isReady(const InstRef &IR) const;		virtual bool isReady(const InstRef &IR) const;

		// Load and store instructions are tracked by their corresponding queues from
		// dispatch until the "instruction executed" event.
		// Only when a load instruction reaches the 'Executed' stage, its value
		// becomes available to the users. At that point, the load no longer needs to
		// be tracked by the load queue.
		// FIXME: For simplicity, we optimistically assume a similar behavior for
		// store instructions. In practice, store operations don't tend to leave the
		// store queue until they reach the 'Retired' stage (See PR39830).
void onInstructionExecuted(const InstRef &IR);		void onInstructionExecuted(const InstRef &IR);
};		};

} // namespace mca		} // namespace mca
} // namespace llvm		} // namespace llvm

#endif		#endif
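
The six ordering rules listed in the comment above `isReady` can be illustrated with a small standalone model. This is a simplified sketch, not the LSUnit implementation: the `MemOp` struct and the `mayIssue` function are hypothetical names, each instruction is reduced to a program-order index plus load/store/barrier flags, and `Older` is assumed to hold only memory operations that have not finished executing.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified description of an in-flight memory operation.
struct MemOp {
  unsigned Index; // Program-order position.
  bool IsLoad;
  bool IsStore;
  bool IsBarrier; // Stands in for 'HasUnmodeledSideEffects'.
};

// Returns true if Op may issue, given the older memory operations in Older
// that have not been fully executed yet. NoAlias mirrors the 'NoAlias' flag.
bool mayIssue(const MemOp &Op, const std::vector<MemOp> &Older, bool NoAlias) {
  for (const MemOp &Prev : Older) {
    assert(Prev.Index < Op.Index && "Older must only hold older operations");
    if (Op.IsStore) {
      // Rules 1, 4 and 6: a store waits for every older store (which
      // includes store barriers) and for every older load.
      if (Prev.IsStore || Prev.IsLoad)
        return false;
    }
    if (Op.IsLoad) {
      // Rule 5: a load waits for an older load barrier.
      if (Prev.IsLoad && Prev.IsBarrier)
        return false;
      // Rule 2: a load waits for an older store unless NoAlias is set.
      if (Prev.IsStore && !NoAlias)
        return false;
      // Rule 3: otherwise a load may pass an older load.
    }
  }
  return true;
}
```

In this model, rule 6 is subsumed by rule 1, because a store already waits for every older store whether or not it is a barrier.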

llvm/trunk/tools/llvm-mca/lib/Context.cpp

	[... 29 lines not shown ...]
	std::unique_ptr<Pipeline>			std::unique_ptr<Pipeline>
	Context::createDefaultPipeline(const PipelineOptions &Opts, InstrBuilder &IB,			Context::createDefaultPipeline(const PipelineOptions &Opts, InstrBuilder &IB,
	SourceMgr &SrcMgr) {			SourceMgr &SrcMgr) {
	const MCSchedModel &SM = STI.getSchedModel();			const MCSchedModel &SM = STI.getSchedModel();

	// Create the hardware units defining the backend.			// Create the hardware units defining the backend.
	auto RCU = llvm::make_unique<RetireControlUnit>(SM);			auto RCU = llvm::make_unique<RetireControlUnit>(SM);
	auto PRF = llvm::make_unique<RegisterFile>(SM, MRI, Opts.RegisterFileSize);			auto PRF = llvm::make_unique<RegisterFile>(SM, MRI, Opts.RegisterFileSize);
	auto LSU = llvm::make_unique<LSUnit>(Opts.LoadQueueSize, Opts.StoreQueueSize,			auto LSU = llvm::make_unique<LSUnit>(SM, Opts.LoadQueueSize,
	Opts.AssumeNoAlias);			Opts.StoreQueueSize, Opts.AssumeNoAlias);
	auto HWS = llvm::make_unique<Scheduler>(SM, LSU.get());			auto HWS = llvm::make_unique<Scheduler>(SM, LSU.get());

	// Create the pipeline stages.			// Create the pipeline stages.
	auto Fetch = llvm::make_unique<EntryStage>(SrcMgr);			auto Fetch = llvm::make_unique<EntryStage>(SrcMgr);
	auto Dispatch = llvm::make_unique<DispatchStage>(STI, MRI, Opts.DispatchWidth,			auto Dispatch = llvm::make_unique<DispatchStage>(STI, MRI, Opts.DispatchWidth,
	RCU, PRF);			RCU, PRF);
	auto Execute = llvm::make_unique<ExecuteStage>(*HWS);			auto Execute = llvm::make_unique<ExecuteStage>(*HWS);
	auto Retire = llvm::make_unique<RetireStage>(RCU, PRF);			auto Retire = llvm::make_unique<RetireStage>(RCU, PRF);
	[... 18 lines not shown ...]

llvm/trunk/tools/llvm-mca/lib/HardwareUnits/LSUnit.cpp

	[... 16 lines not shown ...]
	#include "llvm/Support/Debug.h"			#include "llvm/Support/Debug.h"
	#include "llvm/Support/raw_ostream.h"			#include "llvm/Support/raw_ostream.h"

	#define DEBUG_TYPE "llvm-mca"			#define DEBUG_TYPE "llvm-mca"

	namespace llvm {			namespace llvm {
	namespace mca {			namespace mca {

				LSUnit::LSUnit(const MCSchedModel &SM, unsigned LQ, unsigned SQ,
				bool AssumeNoAlias)
				: LQ_Size(LQ), SQ_Size(SQ), NoAlias(AssumeNoAlias) {
				if (SM.hasExtraProcessorInfo()) {
				const MCExtraProcessorInfo &EPI = SM.getExtraProcessorInfo();
				if (!LQ_Size && EPI.LoadQueueID) {
				const MCProcResourceDesc &LdQDesc = *SM.getProcResource(EPI.LoadQueueID);
				LQ_Size = LdQDesc.BufferSize;
				}

				if (!SQ_Size && EPI.StoreQueueID) {
				const MCProcResourceDesc &StQDesc = *SM.getProcResource(EPI.StoreQueueID);
				SQ_Size = StQDesc.BufferSize;
				}
				}
				}

	#ifndef NDEBUG			#ifndef NDEBUG
	void LSUnit::dump() const {			void LSUnit::dump() const {
	dbgs() << "[LSUnit] LQ_Size = " << LQ_Size << '\n';			dbgs() << "[LSUnit] LQ_Size = " << LQ_Size << '\n';
	dbgs() << "[LSUnit] SQ_Size = " << SQ_Size << '\n';			dbgs() << "[LSUnit] SQ_Size = " << SQ_Size << '\n';
	dbgs() << "[LSUnit] NextLQSlotIdx = " << LoadQueue.size() << '\n';			dbgs() << "[LSUnit] NextLQSlotIdx = " << LoadQueue.size() << '\n';
	dbgs() << "[LSUnit] NextSQSlotIdx = " << StoreQueue.size() << '\n';			dbgs() << "[LSUnit] NextSQSlotIdx = " << StoreQueue.size() << '\n';
	}			}
	#endif			#endif
	[... 141 lines not shown ...]
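
The new constructor gives a non-zero `-lqueue`/`-squeue` value priority over the scheduling model and otherwise falls back to the `BufferSize` of the declared queue resource. A final size of 0 leaves the queue unbounded as far as `isLQFull`/`isSQFull` are concerned. The sketch below restates that precedence in isolation; `resolveQueueSize` and `isQueueFull` are hypothetical helpers, and the model data is passed as plain integers rather than through `MCSchedModel`.

```cpp
#include <cassert>
#include <cstddef>

// Picks the effective queue size. A non-zero command-line value wins; else
// the model's buffer size is used if a queue resource was declared
// (QueueID != 0); else 0, meaning "unbounded".
unsigned resolveQueueSize(unsigned CmdLineSize, bool HasExtraInfo,
                          unsigned QueueID, unsigned ModelBufferSize) {
  if (CmdLineSize)
    return CmdLineSize;
  if (HasExtraInfo && QueueID)
    return ModelBufferSize;
  return 0;
}

// Mirrors the shape of isLQFull()/isSQFull(): a size of 0 never reports full.
bool isQueueFull(unsigned QueueSize, std::size_t Occupancy) {
  assert((QueueSize == 0 || Occupancy <= QueueSize) && "queue overflow");
  return QueueSize != 0 && Occupancy == QueueSize;
}
```

With a hypothetical model that declares a 40-entry load queue, `resolveQueueSize(0, true, 3, 40)` yields 40, while passing an explicit size such as `resolveQueueSize(8, true, 3, 40)` yields 8.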

llvm/trunk/tools/llvm-mca/llvm-mca.cpp

	[... 145 lines not shown ...]

	static cl::opt<bool>			static cl::opt<bool>
	AssumeNoAlias("noalias",			AssumeNoAlias("noalias",
	cl::desc("If set, assume that loads and stores do not alias"),			cl::desc("If set, assume that loads and stores do not alias"),
	cl::cat(ToolOptions), cl::init(true));			cl::cat(ToolOptions), cl::init(true));

	static cl::opt<unsigned>			static cl::opt<unsigned>
	LoadQueueSize("lqueue",			LoadQueueSize("lqueue",
	cl::desc("Size of the load queue (unbound by default)"),			cl::desc("Size of the load queue"),
	cl::cat(ToolOptions), cl::init(0));			cl::cat(ToolOptions), cl::init(0));

	static cl::opt<unsigned>			static cl::opt<unsigned>
	StoreQueueSize("squeue",			StoreQueueSize("squeue",
	cl::desc("Size of the store queue (unbound by default)"),			cl::desc("Size of the store queue"),
	cl::cat(ToolOptions), cl::init(0));			cl::cat(ToolOptions), cl::init(0));

	static cl::opt<bool>			static cl::opt<bool>
	PrintInstructionTables("instruction-tables",			PrintInstructionTables("instruction-tables",
	cl::desc("Print instruction tables"),			cl::desc("Print instruction tables"),
	cl::cat(ToolOptions), cl::init(false));			cl::cat(ToolOptions), cl::init(false));

	static cl::opt<bool> PrintInstructionInfoView(			static cl::opt<bool> PrintInstructionInfoView(
	[... 337 lines not shown ...]

llvm/trunk/utils/TableGen/CodeGenSchedule.h

[... 240 lines not shown ...]	struct CodeGenProcModel {
RecVec ProcResourceDefs;		RecVec ProcResourceDefs;

// List of Register Files.		// List of Register Files.
std::vector<CodeGenRegisterFile> RegisterFiles;		std::vector<CodeGenRegisterFile> RegisterFiles;

// Optional Retire Control Unit definition.		// Optional Retire Control Unit definition.
Record *RetireControlUnit;		Record *RetireControlUnit;

		// Load/Store queue descriptors.
		Record *LoadQueue;
		Record *StoreQueue;

CodeGenProcModel(unsigned Idx, std::string Name, Record *MDef,		CodeGenProcModel(unsigned Idx, std::string Name, Record *MDef,
Record *IDef) :		Record *IDef) :
Index(Idx), ModelName(std::move(Name)), ModelDef(MDef), ItinsDef(IDef),		Index(Idx), ModelName(std::move(Name)), ModelDef(MDef), ItinsDef(IDef),
RetireControlUnit(nullptr) {}		RetireControlUnit(nullptr), LoadQueue(nullptr), StoreQueue(nullptr) {}

bool hasItineraries() const {		bool hasItineraries() const {
return !ItinsDef->getValueAsListOfDefs("IID").empty();		return !ItinsDef->getValueAsListOfDefs("IID").empty();
}		}

bool hasInstrSchedModel() const {		bool hasInstrSchedModel() const {
return !WriteResDefs.empty() || !ItinRWDefs.empty();		return !WriteResDefs.empty() || !ItinRWDefs.empty();
}		}

bool hasExtraProcessorInfo() const {		bool hasExtraProcessorInfo() const {
return RetireControlUnit || !RegisterFiles.empty();		return RetireControlUnit || LoadQueue || StoreQueue ||
		!RegisterFiles.empty();
}		}

unsigned getProcResourceIdx(Record *PRDef) const;		unsigned getProcResourceIdx(Record *PRDef) const;

bool isUnsupported(const CodeGenInstruction &Inst) const;		bool isUnsupported(const CodeGenInstruction &Inst) const;

#ifndef NDEBUG		#ifndef NDEBUG
void dump() const;		void dump() const;
[... 330 lines not shown ...]	private:
void inferSchedClasses();		void inferSchedClasses();

void checkMCInstPredicates() const;		void checkMCInstPredicates() const;

void checkSTIPredicates() const;		void checkSTIPredicates() const;

void collectSTIPredicates();		void collectSTIPredicates();

		void collectLoadStoreQueueInfo();

void checkCompleteness();		void checkCompleteness();

void inferFromRW(ArrayRef<unsigned> OperWrites, ArrayRef<unsigned> OperReads,		void inferFromRW(ArrayRef<unsigned> OperWrites, ArrayRef<unsigned> OperReads,
unsigned FromClassIdx, ArrayRef<unsigned> ProcIndices);		unsigned FromClassIdx, ArrayRef<unsigned> ProcIndices);
void inferFromItinClass(Record *ItinClassDef, unsigned FromClassIdx);		void inferFromItinClass(Record *ItinClassDef, unsigned FromClassIdx);
void inferFromInstRWs(unsigned SCIdx);		void inferFromInstRWs(unsigned SCIdx);

bool hasSuperGroup(RecVec &SubUnits, CodeGenProcModel &PM);		bool hasSuperGroup(RecVec &SubUnits, CodeGenProcModel &PM);
[... 23 lines not shown ...]

llvm/trunk/utils/TableGen/CodeGenSchedule.cpp

[... 473 lines not shown ...]	if (PM.RetireControlUnit) {
"Expected a single RetireControlUnit definition");		"Expected a single RetireControlUnit definition");
PrintNote(PM.RetireControlUnit->getLoc(),		PrintNote(PM.RetireControlUnit->getLoc(),
"Previous definition of RetireControlUnit was here");		"Previous definition of RetireControlUnit was here");
}		}
PM.RetireControlUnit = RCU;		PM.RetireControlUnit = RCU;
}		}
}		}

		void CodeGenSchedModels::collectLoadStoreQueueInfo() {
		RecVec Queues = Records.getAllDerivedDefinitions("MemoryQueue");

		for (Record *Queue : Queues) {
		CodeGenProcModel &PM = getProcModel(Queue->getValueAsDef("SchedModel"));
		if (Queue->isSubClassOf("LoadQueue")) {
		if (PM.LoadQueue) {
		PrintError(Queue->getLoc(),
		"Expected a single LoadQueue definition");
		PrintNote(PM.LoadQueue->getLoc(),
		"Previous definition of LoadQueue was here");
		}

		PM.LoadQueue = Queue;
		}

		if (Queue->isSubClassOf("StoreQueue")) {
		if (PM.StoreQueue) {
		PrintError(Queue->getLoc(),
		"Expected a single StoreQueue definition");
		PrintNote(PM.StoreQueue->getLoc(),
		"Previous definition of StoreQueue was here");
		}

		PM.StoreQueue = Queue;
		}
		}
		}
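
The loop above enforces "at most one load queue and one store queue per processor model": a second definition is reported, and the most recent definition is the one that remains recorded. A minimal standalone sketch of that bookkeeping follows. `QueueDef`, `ProcModel`, and `collectQueues` are hypothetical stand-ins for the TableGen records and `CodeGenProcModel`; diagnostics are reduced to a duplicate count, and each definition is assumed to be either a load queue or a store queue, never both.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a MemoryQueue record.
struct QueueDef {
  std::string Name;
  unsigned ModelIdx; // Index of the processor model this queue belongs to.
  bool IsLoadQueue;  // false means store queue.
};

// Hypothetical stand-in for the per-model queue descriptors.
struct ProcModel {
  const QueueDef *LoadQueue = nullptr;
  const QueueDef *StoreQueue = nullptr;
};

// Records each definition in its model and returns how many definitions
// collided with an earlier one. The last definition seen is kept.
unsigned collectQueues(const std::vector<QueueDef> &Defs,
                       std::vector<ProcModel> &Models) {
  unsigned NumDuplicates = 0;
  for (const QueueDef &Def : Defs) {
    assert(Def.ModelIdx < Models.size() && "unknown processor model");
    ProcModel &PM = Models[Def.ModelIdx];
    const QueueDef *&Slot = Def.IsLoadQueue ? PM.LoadQueue : PM.StoreQueue;
    if (Slot)
      ++NumDuplicates; // The real code emits an error and a note here.
    Slot = &Def;
  }
  return NumDuplicates;
}
```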

/// Collect optional processor information.		/// Collect optional processor information.
void CodeGenSchedModels::collectOptionalProcessorInfo() {		void CodeGenSchedModels::collectOptionalProcessorInfo() {
// Find register file definitions for each processor.		// Find register file definitions for each processor.
collectRegisterFiles();		collectRegisterFiles();

// Collect processor RetireControlUnit descriptors if available.		// Collect processor RetireControlUnit descriptors if available.
collectRetireControlUnits();		collectRetireControlUnits();

		// Collect information about load/store queues.
		collectLoadStoreQueueInfo();

checkCompleteness();		checkCompleteness();
}		}

/// Gather all processor models.		/// Gather all processor models.
void CodeGenSchedModels::collectProcModels() {		void CodeGenSchedModels::collectProcModels() {
RecVec ProcRecords = Records.getAllDerivedDefinitions("Processor");		RecVec ProcRecords = Records.getAllDerivedDefinitions("Processor");
llvm::sort(ProcRecords, LessRecordFieldName());		llvm::sort(ProcRecords, LessRecordFieldName());

[... 1,697 lines not shown ...]

llvm/trunk/utils/TableGen/SubtargetEmitter.cpp

[... 87 lines not shown ...]	class SubtargetEmitter {
void EmitStageAndOperandCycleData(raw_ostream &OS,		void EmitStageAndOperandCycleData(raw_ostream &OS,
std::vector<std::vector<InstrItinerary>>		std::vector<std::vector<InstrItinerary>>
&ProcItinLists);		&ProcItinLists);
void EmitItineraries(raw_ostream &OS,		void EmitItineraries(raw_ostream &OS,
std::vector<std::vector<InstrItinerary>>		std::vector<std::vector<InstrItinerary>>
&ProcItinLists);		&ProcItinLists);
unsigned EmitRegisterFileTables(const CodeGenProcModel &ProcModel,		unsigned EmitRegisterFileTables(const CodeGenProcModel &ProcModel,
raw_ostream &OS);		raw_ostream &OS);
		void EmitLoadStoreQueueInfo(const CodeGenProcModel &ProcModel,
		raw_ostream &OS);
void EmitExtraProcessorInfo(const CodeGenProcModel &ProcModel,		void EmitExtraProcessorInfo(const CodeGenProcModel &ProcModel,
raw_ostream &OS);		raw_ostream &OS);
void EmitProcessorProp(raw_ostream &OS, const Record *R, StringRef Name,		void EmitProcessorProp(raw_ostream &OS, const Record *R, StringRef Name,
char Separator);		char Separator);
void EmitProcessorResourceSubUnits(const CodeGenProcModel &ProcModel,		void EmitProcessorResourceSubUnits(const CodeGenProcModel &ProcModel,
raw_ostream &OS);		raw_ostream &OS);
void EmitProcessorResources(const CodeGenProcModel &ProcModel,		void EmitProcessorResources(const CodeGenProcModel &ProcModel,
raw_ostream &OS);		raw_ostream &OS);
[... 588 lines not shown ...]	OS << NumCostEntries << ", " << CostTblIndex << ", "
<< RD.AllowZeroMoveEliminationOnly << "},\n";		<< RD.AllowZeroMoveEliminationOnly << "},\n";
CostTblIndex += NumCostEntries;		CostTblIndex += NumCostEntries;
}		}
OS << "};\n";		OS << "};\n";

return CostTblIndex;		return CostTblIndex;
}		}

		void SubtargetEmitter::EmitLoadStoreQueueInfo(const CodeGenProcModel &ProcModel,
		raw_ostream &OS) {
		unsigned QueueID = 0;
		if (ProcModel.LoadQueue) {
		const Record *Queue = ProcModel.LoadQueue->getValueAsDef("QueueDescriptor");
		QueueID =
		1 + std::distance(ProcModel.ProcResourceDefs.begin(),
		std::find(ProcModel.ProcResourceDefs.begin(),
		ProcModel.ProcResourceDefs.end(), Queue));
		}
		OS << " " << QueueID << ", // Resource Descriptor for the Load Queue\n";

		QueueID = 0;
		if (ProcModel.StoreQueue) {
		const Record *Queue =
		ProcModel.StoreQueue->getValueAsDef("QueueDescriptor");
		QueueID =
		1 + std::distance(ProcModel.ProcResourceDefs.begin(),
		std::find(ProcModel.ProcResourceDefs.begin(),
		ProcModel.ProcResourceDefs.end(), Queue));
		}
		OS << " " << QueueID << ", // Resource Descriptor for the Store Queue\n";
		}
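
`EmitLoadStoreQueueInfo` encodes each queue as a 1-based position in the model's resource list, with 0 meaning that no queue was declared; this matches how `LSUnit` treats a zero `LoadQueueID`/`StoreQueueID` as absent. The lookup can be sketched in isolation as below. `getQueueID` is a hypothetical helper, and resources are represented by name strings instead of `Record` pointers. Unlike the code above, the sketch returns 0 explicitly when the queue is not in the list; whether that case can occur in practice is not established here.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Returns the 1-based position of Queue in Resources, or 0 if it is absent.
unsigned getQueueID(const std::vector<std::string> &Resources,
                    const std::string &Queue) {
  assert(!Queue.empty() && "expected a named resource");
  auto It = std::find(Resources.begin(), Resources.end(), Queue);
  if (It == Resources.end())
    return 0;
  return 1 + static_cast<unsigned>(std::distance(Resources.begin(), It));
}
```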

void SubtargetEmitter::EmitExtraProcessorInfo(const CodeGenProcModel &ProcModel,		void SubtargetEmitter::EmitExtraProcessorInfo(const CodeGenProcModel &ProcModel,
raw_ostream &OS) {		raw_ostream &OS) {
// Generate a table of register file descriptors (one entry per user-defined		// Generate a table of register file descriptors (one entry per user-defined
// register file), and a table of register costs.		// register file), and a table of register costs.
unsigned NumCostEntries = EmitRegisterFileTables(ProcModel, OS);		unsigned NumCostEntries = EmitRegisterFileTables(ProcModel, OS);

// Now generate a table for the extra processor info.		// Now generate a table for the extra processor info.
OS << "\nstatic const llvm::MCExtraProcessorInfo " << ProcModel.ModelName		OS << "\nstatic const llvm::MCExtraProcessorInfo " << ProcModel.ModelName
<< "ExtraInfo = {\n ";		<< "ExtraInfo = {\n ";

// Add information related to the retire control unit.		// Add information related to the retire control unit.
EmitRetireControlUnitInfo(ProcModel, OS);		EmitRetireControlUnitInfo(ProcModel, OS);

// Add information related to the register files (i.e. where to find register		// Add information related to the register files (i.e. where to find register
// file descriptors and register costs).		// file descriptors and register costs).
EmitRegisterFileInfo(ProcModel, ProcModel.RegisterFiles.size(),		EmitRegisterFileInfo(ProcModel, ProcModel.RegisterFiles.size(),
NumCostEntries, OS);		NumCostEntries, OS);

		// Add information about load/store queues.
		EmitLoadStoreQueueInfo(ProcModel, OS);

OS << "};\n";		OS << "};\n";
}		}

void SubtargetEmitter::EmitProcessorResources(const CodeGenProcModel &ProcModel,		void SubtargetEmitter::EmitProcessorResources(const CodeGenProcModel &ProcModel,
raw_ostream &OS) {		raw_ostream &OS) {
EmitProcessorResourceSubUnits(ProcModel, OS);		EmitProcessorResourceSubUnits(ProcModel, OS);

OS << "\n// {Name, NumUnits, SuperIdx, BufferSize, SubUnitsIdxBegin}\n";		OS << "\n// {Name, NumUnits, SuperIdx, BufferSize, SubUnitsIdxBegin}\n";
[... 1,201 lines not shown ...]

Diff 175851