This is an archive of the discontinued LLVM Phabricator instance.

[MCA][LSUnit] Track loads and stores until retirement.
Closed · Public

Authored by andreadb on Oct 1 2019, 5:07 AM.

Details

Reviewers

RKSimon
lebedev.ri
courbet

Commits

rG8d6651f7b11e: [MCA][LSUnit] Track loads and stores until retirement.
rL374034: [MCA][LSUnit] Track loads and stores until retirement.

Summary

Before this patch, loads and stores were tracked by their corresponding queues in the LSUnit only from dispatch until the execute stage. In practice we should be more conservative and assume that memory opcodes leave their queues at the retirement stage.

Basically, loads should leave the load queue only when they have completed and delivered their data. We conservatively assume that a load is completed when it is retired. Stores should be tracked by the store queue from dispatch until retirement. In practice, stores can only leave the store queue if their data can be written to the data cache.

This is mostly a mechanical change. With this patch, the retire stage notifies the LSUnit when a memory instruction has retired.
That triggers the release of LDQ/STQ entries.
The only visible change is in memory tests for the bdver2 model. That is because bdver2 is the only model that defines the load/store queue size.
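
As a rough illustration of the behaviour change described in the summary, here is a minimal sketch of a bounded queue whose entries are released either at execute or at retire. `ToyQueue`, `ToyLSUnit` and `ReleasePoint` are hypothetical names made up for this example. This is not the llvm-mca `LSUnit` API, only a toy model of the idea.

```cpp
#include <cassert>

// Hypothetical toy model; none of these names exist in llvm-mca.
// A fixed-capacity queue that only counts how many entries are in use.
class ToyQueue {
  unsigned Capacity;
  unsigned Used = 0;

public:
  explicit ToyQueue(unsigned Cap) : Capacity(Cap) {}
  bool isFull() const { return Used == Capacity; }
  void acquire() {
    assert(!isFull() && "No free entry");
    ++Used;
  }
  void release() {
    assert(Used > 0 && "Releasing an entry that was never acquired");
    --Used;
  }
};

// The point at which a memory instruction gives its queue entries back.
enum class ReleasePoint { Execute, Retire };

class ToyLSUnit {
  ToyQueue LQ, SQ;
  ReleasePoint Policy;

  void releaseSlots(bool IsLoad, bool IsStore) {
    if (IsLoad)
      LQ.release();
    if (IsStore)
      SQ.release();
  }

public:
  ToyLSUnit(unsigned LQSize, unsigned SQSize, ReleasePoint P)
      : LQ(LQSize), SQ(SQSize), Policy(P) {}

  // Returns false (a dispatch stall) if a required queue is full.
  bool dispatch(bool IsLoad, bool IsStore) {
    if ((IsLoad && LQ.isFull()) || (IsStore && SQ.isFull()))
      return false;
    if (IsLoad)
      LQ.acquire();
    if (IsStore)
      SQ.acquire();
    return true;
  }

  // Entries are freed here only under the pre-patch style policy.
  void onExecuted(bool IsLoad, bool IsStore) {
    if (Policy == ReleasePoint::Execute)
      releaseSlots(IsLoad, IsStore);
  }

  // Entries are freed here only under the post-patch style policy.
  void onRetired(bool IsLoad, bool IsStore) {
    if (Policy == ReleasePoint::Retire)
      releaseSlots(IsLoad, IsStore);
  }
};
```

With a one-entry store queue and `ReleasePoint::Retire`, a second store cannot be dispatched once the first has merely executed; it has to wait until `onRetired` is called. With `ReleasePoint::Execute` the entry becomes free as soon as `onExecuted` runs.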

This patch partially addresses PR39830.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

andreadb created this revision. Oct 1 2019, 5:07 AM

Herald added a subscriber: gbedwell. Oct 1 2019, 5:07 AM

Thank you for working on this!
Seems ok to me.

include/llvm/MCA/HardwareUnits/LSUnit.h
298–299 (On Diff #222592) // By default we conservatively assume that the LDQ receives a load at dispatch.
I think this may explain some of the weird throughput numbers I was seeing for load-folded instructions (as compared with llvm-exegesis measurements). Is there a bug that tracks this? I wonder if the correct choice would be to make it wait for L1 latency here.

Thanks Roman,

include/llvm/MCA/HardwareUnits/LSUnit.h
298–299 (On Diff #222592) It would be interesting to see what code is compiled and run by exegesis to obtain the latency/throughput of those load-folded instructions. Not knowing what kernel is run by exegesis makes it hard for me to understand your last comment. Could you please post an example in PR39830 (or raise a separate bug)? That would be very useful. Thanks.
298–299 (On Diff #222592) To further clarify this: the LDQ does receive load opcodes at dispatch. The 'conservative assumption' here is that loads leave at retire rather than at the end of execution. Stores are always tracked by the STQ from dispatch until retire.
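
To make the "leave at retire rather than at the end of execution" point concrete, here is a small sketch that computes how long a queue entry stays allocated under each choice. It assumes in-order retirement with unlimited retire bandwidth, and that an instruction may retire in the same cycle in which it finishes executing. `inOrderRetireCycles` and `queueOccupancy` are hypothetical helpers written for this example and are not part of llvm-mca.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of llvm-mca.
// ExecEnd[I] is the cycle in which instruction I (in program order) finishes
// executing. Returns the earliest cycle in which each instruction can retire,
// assuming in-order retirement and unlimited retire bandwidth.
std::vector<unsigned> inOrderRetireCycles(const std::vector<unsigned> &ExecEnd) {
  std::vector<unsigned> Retire(ExecEnd.size());
  unsigned Prev = 0;
  for (std::size_t I = 0; I < ExecEnd.size(); ++I) {
    // An instruction cannot retire before any older instruction has retired.
    Prev = std::max(Prev, ExecEnd[I]);
    Retire[I] = Prev;
  }
  return Retire;
}

// Number of cycles an entry stays allocated when it is acquired in cycle
// Dispatch and released in cycle Release.
unsigned queueOccupancy(unsigned Dispatch, unsigned Release) {
  assert(Release >= Dispatch && "Entry released before it was allocated");
  return Release - Dispatch;
}
```

For example, if three instructions in program order finish executing in cycles 10, 6 and 7, all three retire in cycle 10. A load dispatched in cycle 1 that finishes executing in cycle 6 would hold its LDQ entry for 5 cycles if the entry were released at the end of execution, and for 9 cycles if it is released at retirement.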

I think we should proceed with this.
The question i raised can be addressed later.
It basically is: "if the L1 latency is <n> cycles, and we start executing these
load-folded instructions these <n> cycles earlier, then where are those cycles
*actually* spent, if we're in-and-out of load queue without spending any cycles there?"

This revision is now accepted and ready to land. Oct 6 2019, 10:58 AM

Closed by commit rG8d6651f7b11e: [MCA][LSUnit] Track loads and stores until retirement. (authored by andreadb). Oct 8 2019, 3:46 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Oct 8 2019, 3:46 AM

Herald added a subscriber: hiraditya.

In D68266#1696692, @lebedev.ri wrote:

I think we should proceed with this.
The question i raised can be addressed later.
It basically is: "if the L1 latency is <n> cycles, and we start executing these
load-folded instructions these <n> cycles earlier, then where are those cycles
*actually* spent, if we're in-and-out of load queue without spending any cycles there?"

Finally moved to https://bugs.llvm.org/show_bug.cgi?id=39830#c4

Revision Contents

Path	Size
llvm/include/llvm/MCA/HardwareUnits/LSUnit.h	10 lines
llvm/include/llvm/MCA/Stages/RetireStage.h	6 lines
llvm/lib/MCA/Context.cpp	2 lines
llvm/lib/MCA/HardwareUnits/LSUnit.cpp	16 lines
llvm/lib/MCA/Stages/RetireStage.cpp	4 lines
llvm/test/tools/llvm-mca/X86/BdVer2/load-store-throughput.s	30 lines
llvm/test/tools/llvm-mca/X86/BdVer2/load-throughput.s	44 lines
llvm/test/tools/llvm-mca/X86/BdVer2/store-throughput.s	73 lines

Diff 223815

llvm/include/llvm/MCA/HardwareUnits/LSUnit.h

[...] public:
   }

   unsigned createMemoryGroup() {
     Groups.insert(
         std::make_pair(NextGroupID, std::make_unique<MemoryGroup>()));
     return NextGroupID++;
   }

-  // Instruction executed event handlers.
   virtual void onInstructionExecuted(const InstRef &IR);

+  // Loads are tracked by the LDQ (load queue) from dispatch until completion.
+  // Stores are tracked by the STQ (store queue) from dispatch until commitment.
+  // By default we conservatively assume that the LDQ receives a load at
+  // dispatch. Loads leave the LDQ at retirement stage.
+  virtual void onInstructionRetired(const InstRef &IR);
+
   virtual void onInstructionIssued(const InstRef &IR) {
     unsigned GroupID = IR.getInstruction()->getLSUTokenID();
     Groups[GroupID]->onInstructionIssued(IR);
   }

   virtual void cycleEvent();

 #ifndef NDEBUG
[...] public:
   /// 1. A store may not pass a previous store.
   /// 2. A load may not pass a previous store unless flag 'NoAlias' is set.
   /// 3. A load may pass a previous load.
   /// 4. A store may not pass a previous load (regardless of flag 'NoAlias').
   /// 5. A load has to wait until an older load barrier is fully executed.
   /// 6. A store has to wait until an older store barrier is fully executed.
   unsigned dispatch(const InstRef &IR) override;

-  // FIXME: For simplicity, we optimistically assume a similar behavior for
-  // store instructions. In practice, store operations don't tend to leave the
-  // store queue until they reach the 'Retired' stage (See PR39830).
   void onInstructionExecuted(const InstRef &IR) override;
 };

 } // namespace mca
 } // namespace llvm

 #endif // LLVM_MCA_LSUNIT_H

llvm/include/llvm/MCA/Stages/RetireStage.h

[...]
 /// The RetireStage represents the process logic that interacts with the
 /// simulated RetireControlUnit hardware.
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_MCA_RETIRE_STAGE_H
 #define LLVM_MCA_RETIRE_STAGE_H

+#include "llvm/MCA/HardwareUnits/LSUnit.h"
 #include "llvm/MCA/HardwareUnits/RegisterFile.h"
 #include "llvm/MCA/HardwareUnits/RetireControlUnit.h"
 #include "llvm/MCA/Stages/Stage.h"

 namespace llvm {
 namespace mca {

 class RetireStage final : public Stage {
   // Owner will go away when we move listeners/eventing to the stages.
   RetireControlUnit &RCU;
   RegisterFile &PRF;
+  LSUnitBase &LSU;

   RetireStage(const RetireStage &Other) = delete;
   RetireStage &operator=(const RetireStage &Other) = delete;

 public:
-  RetireStage(RetireControlUnit &R, RegisterFile &F)
-      : Stage(), RCU(R), PRF(F) {}
+  RetireStage(RetireControlUnit &R, RegisterFile &F, LSUnitBase &LS)
+      : Stage(), RCU(R), PRF(F), LSU(LS) {}

   bool hasWorkToComplete() const override { return !RCU.isEmpty(); }
   Error cycleStart() override;
   Error execute(InstRef &IR) override;
   void notifyInstructionRetired(const InstRef &IR) const;
 };

 } // namespace mca
 } // namespace llvm

 #endif // LLVM_MCA_RETIRE_STAGE_H

llvm/lib/MCA/Context.cpp

[...] Context::createDefaultPipeline(const PipelineOptions &Opts, SourceMgr &SrcMgr) {
   auto HWS = std::make_unique<Scheduler>(SM, *LSU);

   // Create the pipeline stages.
   auto Fetch = std::make_unique<EntryStage>(SrcMgr);
   auto Dispatch = std::make_unique<DispatchStage>(STI, MRI, Opts.DispatchWidth,
                                                   RCU, PRF);
   auto Execute =
       std::make_unique<ExecuteStage>(*HWS, Opts.EnableBottleneckAnalysis);
-  auto Retire = std::make_unique<RetireStage>(RCU, PRF);
+  auto Retire = std::make_unique<RetireStage>(RCU, PRF, *LSU);

   // Pass the ownership of all the hardware units to this Context.
   addHardwareUnit(std::move(RCU));
   addHardwareUnit(std::move(PRF));
   addHardwareUnit(std::move(LSU));
   addHardwareUnit(std::move(HWS));

   // Build the pipeline.
[...]

llvm/lib/MCA/HardwareUnits/LSUnit.cpp

[...] LSUnit::Status LSUnit::isAvailable(const InstRef &IR) const {
   if (Desc.MayLoad && isLQFull())
     return LSUnit::LSU_LQUEUE_FULL;
   if (Desc.MayStore && isSQFull())
     return LSUnit::LSU_SQUEUE_FULL;
   return LSUnit::LSU_AVAILABLE;
 }

 void LSUnitBase::onInstructionExecuted(const InstRef &IR) {
-  const InstrDesc &Desc = IR.getInstruction()->getDesc();
-  bool IsALoad = Desc.MayLoad;
-  bool IsAStore = Desc.MayStore;
-  assert((IsALoad || IsAStore) && "Expected a memory operation!");
-
   unsigned GroupID = IR.getInstruction()->getLSUTokenID();
   auto It = Groups.find(GroupID);
+  assert(It != Groups.end() && "Instruction not dispatched to the LS unit");
   It->second->onInstructionExecuted();
-  if (It->second->isExecuted()) {
+  if (It->second->isExecuted())
     Groups.erase(It);
-  }
+}
+
+void LSUnitBase::onInstructionRetired(const InstRef &IR) {
+  const InstrDesc &Desc = IR.getInstruction()->getDesc();
+  bool IsALoad = Desc.MayLoad;
+  bool IsAStore = Desc.MayStore;
+  assert((IsALoad || IsAStore) && "Expected a memory operation!");

   if (IsALoad) {
     releaseLQSlot();
     LLVM_DEBUG(dbgs() << "[LSUnit]: Instruction idx=" << IR.getSourceIndex()
                       << " has been removed from the load queue.\n");
   }

   if (IsAStore) {
     releaseSQSlot();
[...]

llvm/lib/MCA/Stages/RetireStage.cpp

[...] llvm::Error RetireStage::execute(InstRef &IR) {
   return llvm::ErrorSuccess();
 }

 void RetireStage::notifyInstructionRetired(const InstRef &IR) const {
   LLVM_DEBUG(llvm::dbgs() << "[E] Instruction Retired: #" << IR << '\n');
   llvm::SmallVector<unsigned, 4> FreedRegs(PRF.getNumRegisterFiles());
   const Instruction &Inst = *IR.getInstruction();

+  // Release the load/store queue entries.
+  if (Inst.isMemOp())
+    LSU.onInstructionRetired(IR);
+
   for (const WriteState &WS : Inst.getDefs())
     PRF.removeRegisterWrite(WS, FreedRegs);
   notifyEvent<HWInstructionEvent>(HWInstructionRetiredEvent(IR, FreedRegs));
 }

 } // namespace mca
 } // namespace llvm

llvm/test/tools/llvm-mca/X86/BdVer2/load-store-throughput.s

[...]
 # CHECK-NEXT: 1. 1 1.0 1.0 0.0 movq (%rcx), %rbp
 # CHECK-NEXT: 2. 1 2.0 2.0 0.0 movq (%rdx), %rsi
 # CHECK-NEXT: 3. 1 7.0 0.0 0.0 movq %rdi, (%rbx)

 # CHECK: [4] Code Region

 # CHECK: Iterations: 100
 # CHECK-NEXT: Instructions: 400
-# CHECK-NEXT: Total Cycles: 593
+# CHECK-NEXT: Total Cycles: 554
 # CHECK-NEXT: Total uOps: 400

 # CHECK: Dispatch Width: 4
-# CHECK-NEXT: uOps Per Cycle: 0.67
-# CHECK-NEXT: IPC: 0.67
+# CHECK-NEXT: uOps Per Cycle: 0.72
+# CHECK-NEXT: IPC: 0.72
 # CHECK-NEXT: Block RThroughput: 4.0

 # CHECK: Instruction Info:
 # CHECK-NEXT: [1]: #uOps
 # CHECK-NEXT: [2]: Latency
 # CHECK-NEXT: [3]: RThroughput
 # CHECK-NEXT: [4]: MayLoad
 # CHECK-NEXT: [5]: MayStore
 # CHECK-NEXT: [6]: HasSideEffects (U)

 # CHECK: [1] [2] [3] [4] [5] [6] Instructions:
 # CHECK-NEXT: 1 2 1.50 * U movd %mm0, (%rax)
 # CHECK-NEXT: 1 5 1.50 * movd (%rcx), %mm1
 # CHECK-NEXT: 1 5 1.50 * movd (%rdx), %mm2
 # CHECK-NEXT: 1 2 1.50 * U movd %mm3, (%rbx)

 # CHECK: Dynamic Dispatch Stall Cycles:
 # CHECK-NEXT: RAT - Register unavailable: 0
 # CHECK-NEXT: RCU - Retire tokens unavailable: 0
-# CHECK-NEXT: SCHEDQ - Scheduler full: 187 (31.5%)
+# CHECK-NEXT: SCHEDQ - Scheduler full: 55 (9.9%)
 # CHECK-NEXT: LQ - Load queue full: 0
-# CHECK-NEXT: SQ - Store queue full: 342 (57.7%)
+# CHECK-NEXT: SQ - Store queue full: 437 (78.9%)
 # CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

 # CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
 # CHECK-NEXT: [# dispatched], [# cycles]
-# CHECK-NEXT: 0, 403 (68.0%)
-# CHECK-NEXT: 1, 90 (15.2%)
-# CHECK-NEXT: 2, 2 (0.3%)
-# CHECK-NEXT: 3, 86 (14.5%)
-# CHECK-NEXT: 4, 12 (2.0%)
+# CHECK-NEXT: 0, 365 (65.9%)
+# CHECK-NEXT: 1, 88 (15.9%)
+# CHECK-NEXT: 2, 3 (0.5%)
+# CHECK-NEXT: 3, 86 (15.5%)
+# CHECK-NEXT: 4, 12 (2.2%)

 # CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
 # CHECK-NEXT: [# issued], [# cycles]
-# CHECK-NEXT: 0, 292 (49.2%)
-# CHECK-NEXT: 1, 202 (34.1%)
-# CHECK-NEXT: 2, 99 (16.7%)
+# CHECK-NEXT: 0, 253 (45.7%)
+# CHECK-NEXT: 1, 202 (36.5%)
+# CHECK-NEXT: 2, 99 (17.9%)

 # CHECK: Scheduler's queue usage:
 # CHECK-NEXT: [1] Resource name.
 # CHECK-NEXT: [2] Average number of used buffer entries.
 # CHECK-NEXT: [3] Maximum number of used buffer entries.
 # CHECK-NEXT: [4] Total number of buffer entries.

 # CHECK: [1] [2] [3] [4]
[...]

 # CHECK: Resource pressure per iteration:
 # CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18]
 # CHECK-NEXT: 4.00 4.00 - - - - - - - - 3.00 3.00 - 2.00 1.00 1.00 3.00 3.00 - 3.00 3.00 - 2.00

 # CHECK: Resource pressure by instruction:
 # CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18] Instructions:
 # CHECK-NEXT: - 1.00 - - - - - - - - - - - 1.00 - - - 3.00 - - - - 1.00 movd %mm0, (%rax)
-# CHECK-NEXT: 0.36 2.64 - - - - - - - - - 3.00 - - - 1.00 - - - - 3.00 - - movd (%rcx), %mm1
-# CHECK-NEXT: 2.64 0.36 - - - - - - - - 3.00 - - - 1.00 - - - - 3.00 - - - movd (%rdx), %mm2
+# CHECK-NEXT: 1.53 1.47 - - - - - - - - - 3.00 - - - 1.00 - - - - 3.00 - - movd (%rcx), %mm1
+# CHECK-NEXT: 1.47 1.53 - - - - - - - - 3.00 - - - 1.00 - - - - 3.00 - - - movd (%rdx), %mm2
 # CHECK-NEXT: 1.00 - - - - - - - - - - - - 1.00 - - 3.00 - - - - - 1.00 movd %mm3, (%rbx)

 # CHECK: Timeline view:
 # CHECK-NEXT: 0
 # CHECK-NEXT: Index 0123456789

 # CHECK: [0,0] DeeER. . movd %mm0, (%rax)
 # CHECK-NEXT: [0,1] DeeeeeER . movd (%rcx), %mm1
[...]

llvm/test/tools/llvm-mca/X86/BdVer2/load-throughput.s

[...]
 # CHECK-NEXT: 1 5 1.00 * movb (%rcx), %bpl
 # CHECK-NEXT: 1 5 1.00 * movb (%rdx), %sil
 # CHECK-NEXT: 1 5 1.00 * movb (%rbx), %dil

 # CHECK: Dynamic Dispatch Stall Cycles:
 # CHECK-NEXT: RAT - Register unavailable: 0
 # CHECK-NEXT: RCU - Retire tokens unavailable: 0
 # CHECK-NEXT: SCHEDQ - Scheduler full: 0
-# CHECK-NEXT: LQ - Load queue full: 353 (86.9%)
+# CHECK-NEXT: LQ - Load queue full: 354 (87.2%)
 # CHECK-NEXT: SQ - Store queue full: 0
 # CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

 # CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
 # CHECK-NEXT: [# dispatched], [# cycles]
 # CHECK-NEXT: 0, 217 (53.4%)
 # CHECK-NEXT: 2, 178 (43.8%)
 # CHECK-NEXT: 4, 11 (2.7%)

 # CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
 # CHECK-NEXT: [# issued], [# cycles]
 # CHECK-NEXT: 0, 206 (50.7%)
 # CHECK-NEXT: 2, 200 (49.3%)

 # CHECK: Scheduler's queue usage:
 # CHECK-NEXT: [1] Resource name.
 # CHECK-NEXT: [2] Average number of used buffer entries.
 # CHECK-NEXT: [3] Maximum number of used buffer entries.
 # CHECK-NEXT: [4] Total number of buffer entries.

 # CHECK: [1] [2] [3] [4]
-# CHECK-NEXT: PdEX 32 36 40
+# CHECK-NEXT: PdEX 31 34 40
 # CHECK-NEXT: PdFPU 0 0 64
-# CHECK-NEXT: PdLoad 37 40 40
+# CHECK-NEXT: PdLoad 36 40 40
 # CHECK-NEXT: PdStore 0 0 24

 # CHECK: Resources:
 # CHECK-NEXT: [0.0] - PdAGLU01
 # CHECK-NEXT: [0.1] - PdAGLU01
 # CHECK-NEXT: [1] - PdBranch
 # CHECK-NEXT: [2] - PdCount
 # CHECK-NEXT: [3] - PdDiv
[...]
 # CHECK-NEXT: 1 5 1.00 * movw (%rcx), %bp
 # CHECK-NEXT: 1 5 1.00 * movw (%rdx), %si
 # CHECK-NEXT: 1 5 1.00 * movw (%rbx), %di

 # CHECK: Dynamic Dispatch Stall Cycles:
 # CHECK-NEXT: RAT - Register unavailable: 0
 # CHECK-NEXT: RCU - Retire tokens unavailable: 0
 # CHECK-NEXT: SCHEDQ - Scheduler full: 0
-# CHECK-NEXT: LQ - Load queue full: 353 (86.9%)
+# CHECK-NEXT: LQ - Load queue full: 354 (87.2%)
 # CHECK-NEXT: SQ - Store queue full: 0
 # CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

 # CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
 # CHECK-NEXT: [# dispatched], [# cycles]
 # CHECK-NEXT: 0, 217 (53.4%)
 # CHECK-NEXT: 2, 178 (43.8%)
 # CHECK-NEXT: 4, 11 (2.7%)

 # CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
 # CHECK-NEXT: [# issued], [# cycles]
 # CHECK-NEXT: 0, 206 (50.7%)
 # CHECK-NEXT: 2, 200 (49.3%)

 # CHECK: Scheduler's queue usage:
 # CHECK-NEXT: [1] Resource name.
 # CHECK-NEXT: [2] Average number of used buffer entries.
 # CHECK-NEXT: [3] Maximum number of used buffer entries.
 # CHECK-NEXT: [4] Total number of buffer entries.

 # CHECK: [1] [2] [3] [4]
-# CHECK-NEXT: PdEX 32 36 40
+# CHECK-NEXT: PdEX 31 34 40
 # CHECK-NEXT: PdFPU 0 0 64
-# CHECK-NEXT: PdLoad 37 40 40
+# CHECK-NEXT: PdLoad 36 40 40
 # CHECK-NEXT: PdStore 0 0 24

 # CHECK: Resources:
 # CHECK-NEXT: [0.0] - PdAGLU01
 # CHECK-NEXT: [0.1] - PdAGLU01
 # CHECK-NEXT: [1] - PdBranch
 # CHECK-NEXT: [2] - PdCount
 # CHECK-NEXT: [3] - PdDiv
[...]
 # CHECK-NEXT: 1 5 1.00 * movl (%rcx), %ebp
 # CHECK-NEXT: 1 5 1.00 * movl (%rdx), %esi
 # CHECK-NEXT: 1 5 1.00 * movl (%rbx), %edi

 # CHECK: Dynamic Dispatch Stall Cycles:
 # CHECK-NEXT: RAT - Register unavailable: 0
 # CHECK-NEXT: RCU - Retire tokens unavailable: 0
 # CHECK-NEXT: SCHEDQ - Scheduler full: 0
-# CHECK-NEXT: LQ - Load queue full: 353 (86.9%)
+# CHECK-NEXT: LQ - Load queue full: 354 (87.2%)
 # CHECK-NEXT: SQ - Store queue full: 0
 # CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

 # CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
 # CHECK-NEXT: [# dispatched], [# cycles]
 # CHECK-NEXT: 0, 217 (53.4%)
 # CHECK-NEXT: 2, 178 (43.8%)
 # CHECK-NEXT: 4, 11 (2.7%)

 # CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
 # CHECK-NEXT: [# issued], [# cycles]
 # CHECK-NEXT: 0, 206 (50.7%)
 # CHECK-NEXT: 2, 200 (49.3%)

 # CHECK: Scheduler's queue usage:
 # CHECK-NEXT: [1] Resource name.
 # CHECK-NEXT: [2] Average number of used buffer entries.
 # CHECK-NEXT: [3] Maximum number of used buffer entries.
 # CHECK-NEXT: [4] Total number of buffer entries.

 # CHECK: [1] [2] [3] [4]
-# CHECK-NEXT: PdEX 32 36 40
+# CHECK-NEXT: PdEX 31 34 40
 # CHECK-NEXT: PdFPU 0 0 64
-# CHECK-NEXT: PdLoad 37 40 40
+# CHECK-NEXT: PdLoad 36 40 40
 # CHECK-NEXT: PdStore 0 0 24

 # CHECK: Resources:
 # CHECK-NEXT: [0.0] - PdAGLU01
 # CHECK-NEXT: [0.1] - PdAGLU01
 # CHECK-NEXT: [1] - PdBranch
 # CHECK-NEXT: [2] - PdCount
 # CHECK-NEXT: [3] - PdDiv
[...]
 # CHECK-NEXT: 1 5 1.00 * movq (%rcx), %rbp
 # CHECK-NEXT: 1 5 1.00 * movq (%rdx), %rsi
 # CHECK-NEXT: 1 5 1.00 * movq (%rbx), %rdi

 # CHECK: Dynamic Dispatch Stall Cycles:
 # CHECK-NEXT: RAT - Register unavailable: 0
 # CHECK-NEXT: RCU - Retire tokens unavailable: 0
 # CHECK-NEXT: SCHEDQ - Scheduler full: 0
-# CHECK-NEXT: LQ - Load queue full: 353 (86.9%)
+# CHECK-NEXT: LQ - Load queue full: 354 (87.2%)
 # CHECK-NEXT: SQ - Store queue full: 0
 # CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

 # CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
 # CHECK-NEXT: [# dispatched], [# cycles]
 # CHECK-NEXT: 0, 217 (53.4%)
 # CHECK-NEXT: 2, 178 (43.8%)
 # CHECK-NEXT: 4, 11 (2.7%)

 # CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
 # CHECK-NEXT: [# issued], [# cycles]
 # CHECK-NEXT: 0, 206 (50.7%)
 # CHECK-NEXT: 2, 200 (49.3%)

 # CHECK: Scheduler's queue usage:
 # CHECK-NEXT: [1] Resource name.
 # CHECK-NEXT: [2] Average number of used buffer entries.
 # CHECK-NEXT: [3] Maximum number of used buffer entries.
 # CHECK-NEXT: [4] Total number of buffer entries.

 # CHECK: [1] [2] [3] [4]
-# CHECK-NEXT: PdEX 32 36 40
+# CHECK-NEXT: PdEX 31 34 40
 # CHECK-NEXT: PdFPU 0 0 64
-# CHECK-NEXT: PdLoad 37 40 40
+# CHECK-NEXT: PdLoad 36 40 40
 # CHECK-NEXT: PdStore 0 0 24

 # CHECK: Resources:
 # CHECK-NEXT: [0.0] - PdAGLU01
 # CHECK-NEXT: [0.1] - PdAGLU01
 # CHECK-NEXT: [1] - PdBranch
 # CHECK-NEXT: [2] - PdCount
 # CHECK-NEXT: [3] - PdDiv
[...]
 # CHECK-NEXT: 1 5 1.50 * movd (%rcx), %mm1
 # CHECK-NEXT: 1 5 1.50 * movd (%rdx), %mm2
 # CHECK-NEXT: 1 5 1.50 * movd (%rbx), %mm3

 # CHECK: Dynamic Dispatch Stall Cycles:
 # CHECK-NEXT: RAT - Register unavailable: 0
 # CHECK-NEXT: RCU - Retire tokens unavailable: 0
 # CHECK-NEXT: SCHEDQ - Scheduler full: 0
-# CHECK-NEXT: LQ - Load queue full: 532 (87.9%)
+# CHECK-NEXT: LQ - Load queue full: 533 (88.1%)
 # CHECK-NEXT: SQ - Store queue full: 0
 # CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

 # CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
 # CHECK-NEXT: [# dispatched], [# cycles]
 # CHECK-NEXT: 0, 416 (68.8%)
 # CHECK-NEXT: 2, 178 (29.4%)
 # CHECK-NEXT: 4, 11 (1.8%)

 # CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
 # CHECK-NEXT: [# issued], [# cycles]
 # CHECK-NEXT: 0, 405 (66.9%)
 # CHECK-NEXT: 2, 200 (33.1%)

 # CHECK: Scheduler's queue usage:
 # CHECK-NEXT: [1] Resource name.
 # CHECK-NEXT: [2] Average number of used buffer entries.
 # CHECK-NEXT: [3] Maximum number of used buffer entries.
 # CHECK-NEXT: [4] Total number of buffer entries.

 # CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 34 38 40			# CHECK-NEXT: PdEX 33 36 40
	# CHECK-NEXT: PdFPU 34 38 64			# CHECK-NEXT: PdFPU 33 36 64
	# CHECK-NEXT: PdLoad 37 40 40			# CHECK-NEXT: PdLoad 37 40 40
	# CHECK-NEXT: PdStore 0 0 24			# CHECK-NEXT: PdStore 0 0 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	[... 74 lines not shown ...]
	# CHECK-NEXT: 1 5 1.50 * movaps (%rcx), %xmm1			# CHECK-NEXT: 1 5 1.50 * movaps (%rcx), %xmm1
	# CHECK-NEXT: 1 5 1.50 * movaps (%rdx), %xmm2			# CHECK-NEXT: 1 5 1.50 * movaps (%rdx), %xmm2
	# CHECK-NEXT: 1 5 1.50 * movaps (%rbx), %xmm3			# CHECK-NEXT: 1 5 1.50 * movaps (%rbx), %xmm3

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 0			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 532 (87.9%)			# CHECK-NEXT: LQ - Load queue full: 533 (88.1%)
	# CHECK-NEXT: SQ - Store queue full: 0			# CHECK-NEXT: SQ - Store queue full: 0
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 416 (68.8%)			# CHECK-NEXT: 0, 416 (68.8%)
	# CHECK-NEXT: 2, 178 (29.4%)			# CHECK-NEXT: 2, 178 (29.4%)
	# CHECK-NEXT: 4, 11 (1.8%)			# CHECK-NEXT: 4, 11 (1.8%)

	# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:			# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 405 (66.9%)			# CHECK-NEXT: 0, 405 (66.9%)
	# CHECK-NEXT: 2, 200 (33.1%)			# CHECK-NEXT: 2, 200 (33.1%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 34 38 40			# CHECK-NEXT: PdEX 33 36 40
	# CHECK-NEXT: PdFPU 34 38 64			# CHECK-NEXT: PdFPU 33 36 64
	# CHECK-NEXT: PdLoad 37 40 40			# CHECK-NEXT: PdLoad 37 40 40
	# CHECK-NEXT: PdStore 0 0 24			# CHECK-NEXT: PdStore 0 0 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	[... 74 lines not shown ...]
	# CHECK-NEXT: 2 5 1.50 * vmovaps (%rcx), %ymm1			# CHECK-NEXT: 2 5 1.50 * vmovaps (%rcx), %ymm1
	# CHECK-NEXT: 2 5 1.50 * vmovaps (%rdx), %ymm2			# CHECK-NEXT: 2 5 1.50 * vmovaps (%rdx), %ymm2
	# CHECK-NEXT: 2 5 1.50 * vmovaps (%rbx), %ymm3			# CHECK-NEXT: 2 5 1.50 * vmovaps (%rbx), %ymm3

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 0			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 344 (56.9%)			# CHECK-NEXT: LQ - Load queue full: 345 (57.0%)
	# CHECK-NEXT: SQ - Store queue full: 0			# CHECK-NEXT: SQ - Store queue full: 0
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 405 (66.9%)			# CHECK-NEXT: 0, 405 (66.9%)
	# CHECK-NEXT: 4, 200 (33.1%)			# CHECK-NEXT: 4, 200 (33.1%)

	# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:			# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 405 (66.9%)			# CHECK-NEXT: 0, 405 (66.9%)
	# CHECK-NEXT: 4, 200 (33.1%)			# CHECK-NEXT: 4, 200 (33.1%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 33 38 40			# CHECK-NEXT: PdEX 33 36 40
	# CHECK-NEXT: PdFPU 33 38 64			# CHECK-NEXT: PdFPU 33 36 64
	# CHECK-NEXT: PdLoad 37 40 40			# CHECK-NEXT: PdLoad 36 40 40
	# CHECK-NEXT: PdStore 0 0 24			# CHECK-NEXT: PdStore 0 0 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	# CHECK-NEXT: [3] - PdDiv			# CHECK-NEXT: [3] - PdDiv
	[... 50 lines not shown ...]

llvm/test/tools/llvm-mca/X86/BdVer2/store-throughput.s

	[... 75 lines not shown ...]
	# CHECK-NEXT: 1 1 1.00 * movb %sil, (%rdx)			# CHECK-NEXT: 1 1 1.00 * movb %sil, (%rdx)
	# CHECK-NEXT: 1 1 1.00 * movb %dil, (%rbx)			# CHECK-NEXT: 1 1 1.00 * movb %dil, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 0			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 370 (91.8%)			# CHECK-NEXT: SQ - Store queue full: 371 (92.1%)
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 25 (6.2%)			# CHECK-NEXT: 0, 24 (6.0%)
	# CHECK-NEXT: 1, 370 (91.8%)			# CHECK-NEXT: 1, 372 (92.3%)
	# CHECK-NEXT: 2, 1 (0.2%)
	# CHECK-NEXT: 4, 7 (1.7%)			# CHECK-NEXT: 4, 7 (1.7%)

	# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:			# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 3 (0.7%)			# CHECK-NEXT: 0, 3 (0.7%)
	# CHECK-NEXT: 1, 400 (99.3%)			# CHECK-NEXT: 1, 400 (99.3%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 22 23 40			# CHECK-NEXT: PdEX 21 22 40
	# CHECK-NEXT: PdFPU 0 0 64			# CHECK-NEXT: PdFPU 0 0 64
	# CHECK-NEXT: PdLoad 0 0 40			# CHECK-NEXT: PdLoad 0 0 40
	# CHECK-NEXT: PdStore 23 24 24			# CHECK-NEXT: PdStore 22 23 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	# CHECK-NEXT: [3] - PdDiv			# CHECK-NEXT: [3] - PdDiv
	# CHECK-NEXT: [4] - PdEX0			# CHECK-NEXT: [4] - PdEX0
	[... 72 lines not shown ...]
	# CHECK-NEXT: 1 1 1.00 * movw %si, (%rdx)			# CHECK-NEXT: 1 1 1.00 * movw %si, (%rdx)
	# CHECK-NEXT: 1 1 1.00 * movw %di, (%rbx)			# CHECK-NEXT: 1 1 1.00 * movw %di, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 0			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 370 (91.8%)			# CHECK-NEXT: SQ - Store queue full: 371 (92.1%)
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 25 (6.2%)			# CHECK-NEXT: 0, 24 (6.0%)
	# CHECK-NEXT: 1, 370 (91.8%)			# CHECK-NEXT: 1, 372 (92.3%)
	# CHECK-NEXT: 2, 1 (0.2%)
	# CHECK-NEXT: 4, 7 (1.7%)			# CHECK-NEXT: 4, 7 (1.7%)

	# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:			# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 3 (0.7%)			# CHECK-NEXT: 0, 3 (0.7%)
	# CHECK-NEXT: 1, 400 (99.3%)			# CHECK-NEXT: 1, 400 (99.3%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 22 23 40			# CHECK-NEXT: PdEX 21 22 40
	# CHECK-NEXT: PdFPU 0 0 64			# CHECK-NEXT: PdFPU 0 0 64
	# CHECK-NEXT: PdLoad 0 0 40			# CHECK-NEXT: PdLoad 0 0 40
	# CHECK-NEXT: PdStore 23 24 24			# CHECK-NEXT: PdStore 22 23 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	# CHECK-NEXT: [3] - PdDiv			# CHECK-NEXT: [3] - PdDiv
	# CHECK-NEXT: [4] - PdEX0			# CHECK-NEXT: [4] - PdEX0
	[... 72 lines not shown ...]
	# CHECK-NEXT: 1 1 1.00 * movl %esi, (%rdx)			# CHECK-NEXT: 1 1 1.00 * movl %esi, (%rdx)
	# CHECK-NEXT: 1 1 1.00 * movl %edi, (%rbx)			# CHECK-NEXT: 1 1 1.00 * movl %edi, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 0			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 370 (91.8%)			# CHECK-NEXT: SQ - Store queue full: 371 (92.1%)
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 25 (6.2%)			# CHECK-NEXT: 0, 24 (6.0%)
	# CHECK-NEXT: 1, 370 (91.8%)			# CHECK-NEXT: 1, 372 (92.3%)
	# CHECK-NEXT: 2, 1 (0.2%)
	# CHECK-NEXT: 4, 7 (1.7%)			# CHECK-NEXT: 4, 7 (1.7%)

	# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:			# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 3 (0.7%)			# CHECK-NEXT: 0, 3 (0.7%)
	# CHECK-NEXT: 1, 400 (99.3%)			# CHECK-NEXT: 1, 400 (99.3%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 22 23 40			# CHECK-NEXT: PdEX 21 22 40
	# CHECK-NEXT: PdFPU 0 0 64			# CHECK-NEXT: PdFPU 0 0 64
	# CHECK-NEXT: PdLoad 0 0 40			# CHECK-NEXT: PdLoad 0 0 40
	# CHECK-NEXT: PdStore 23 24 24			# CHECK-NEXT: PdStore 22 23 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	# CHECK-NEXT: [3] - PdDiv			# CHECK-NEXT: [3] - PdDiv
	# CHECK-NEXT: [4] - PdEX0			# CHECK-NEXT: [4] - PdEX0
	[... 72 lines not shown ...]
	# CHECK-NEXT: 1 1 1.00 * movq %rsi, (%rdx)			# CHECK-NEXT: 1 1 1.00 * movq %rsi, (%rdx)
	# CHECK-NEXT: 1 1 1.00 * movq %rdi, (%rbx)			# CHECK-NEXT: 1 1 1.00 * movq %rdi, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 0			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 370 (91.8%)			# CHECK-NEXT: SQ - Store queue full: 371 (92.1%)
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 25 (6.2%)			# CHECK-NEXT: 0, 24 (6.0%)
	# CHECK-NEXT: 1, 370 (91.8%)			# CHECK-NEXT: 1, 372 (92.3%)
	# CHECK-NEXT: 2, 1 (0.2%)
	# CHECK-NEXT: 4, 7 (1.7%)			# CHECK-NEXT: 4, 7 (1.7%)

	# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:			# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 3 (0.7%)			# CHECK-NEXT: 0, 3 (0.7%)
	# CHECK-NEXT: 1, 400 (99.3%)			# CHECK-NEXT: 1, 400 (99.3%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 22 23 40			# CHECK-NEXT: PdEX 21 22 40
	# CHECK-NEXT: PdFPU 0 0 64			# CHECK-NEXT: PdFPU 0 0 64
	# CHECK-NEXT: PdLoad 0 0 40			# CHECK-NEXT: PdLoad 0 0 40
	# CHECK-NEXT: PdStore 23 24 24			# CHECK-NEXT: PdStore 22 23 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	# CHECK-NEXT: [3] - PdDiv			# CHECK-NEXT: [3] - PdDiv
	# CHECK-NEXT: [4] - PdEX0			# CHECK-NEXT: [4] - PdEX0
	[... 72 lines not shown ...]
	# CHECK-NEXT: 1 2 1.50 * U movd %mm2, (%rdx)			# CHECK-NEXT: 1 2 1.50 * U movd %mm2, (%rdx)
	# CHECK-NEXT: 1 2 1.50 * U movd %mm3, (%rbx)			# CHECK-NEXT: 1 2 1.50 * U movd %mm3, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 0			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 747 (93.0%)			# CHECK-NEXT: SQ - Store queue full: 748 (93.2%)
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 422 (52.6%)			# CHECK-NEXT: 0, 422 (52.6%)
	# CHECK-NEXT: 1, 374 (46.6%)			# CHECK-NEXT: 1, 374 (46.6%)
	# CHECK-NEXT: 2, 1 (0.1%)			# CHECK-NEXT: 2, 1 (0.1%)
	# CHECK-NEXT: 4, 6 (0.7%)			# CHECK-NEXT: 4, 6 (0.7%)

	# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:			# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 403 (50.2%)			# CHECK-NEXT: 0, 403 (50.2%)
	# CHECK-NEXT: 1, 400 (49.8%)			# CHECK-NEXT: 1, 400 (49.8%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 22 23 40			# CHECK-NEXT: PdEX 21 23 40
	# CHECK-NEXT: PdFPU 22 23 64			# CHECK-NEXT: PdFPU 21 23 64
	# CHECK-NEXT: PdLoad 0 0 40			# CHECK-NEXT: PdLoad 0 0 40
	# CHECK-NEXT: PdStore 23 24 24			# CHECK-NEXT: PdStore 22 24 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	# CHECK-NEXT: [3] - PdDiv			# CHECK-NEXT: [3] - PdDiv
	# CHECK-NEXT: [4] - PdEX0			# CHECK-NEXT: [4] - PdEX0
	[... 71 lines not shown ...]
	# CHECK-NEXT: 1 1 1.50 * movaps %xmm0, (%rax)			# CHECK-NEXT: 1 1 1.50 * movaps %xmm0, (%rax)
	# CHECK-NEXT: 1 1 1.50 * movaps %xmm1, (%rcx)			# CHECK-NEXT: 1 1 1.50 * movaps %xmm1, (%rcx)
	# CHECK-NEXT: 1 1 1.50 * movaps %xmm2, (%rdx)			# CHECK-NEXT: 1 1 1.50 * movaps %xmm2, (%rdx)
	# CHECK-NEXT: 1 1 1.50 * movaps %xmm3, (%rbx)			# CHECK-NEXT: 1 1 1.50 * movaps %xmm3, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 185 (30.7%)			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 372 (61.8%)			# CHECK-NEXT: SQ - Store queue full: 559 (92.9%)
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 223 (37.0%)			# CHECK-NEXT: 0, 222 (36.9%)
	# CHECK-NEXT: 1, 372 (61.8%)			# CHECK-NEXT: 1, 373 (62.0%)
	# CHECK-NEXT: 4, 7 (1.2%)			# CHECK-NEXT: 3, 1 (0.2%)
				# CHECK-NEXT: 4, 6 (1.0%)

	# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:			# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 202 (33.6%)			# CHECK-NEXT: 0, 202 (33.6%)
	# CHECK-NEXT: 1, 400 (66.4%)			# CHECK-NEXT: 1, 400 (66.4%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 22 24 40			# CHECK-NEXT: PdEX 21 23 40
	# CHECK-NEXT: PdFPU 22 24 64			# CHECK-NEXT: PdFPU 21 23 64
	# CHECK-NEXT: PdLoad 0 0 40			# CHECK-NEXT: PdLoad 0 0 40
	# CHECK-NEXT: PdStore 23 24 24			# CHECK-NEXT: PdStore 22 24 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	# CHECK-NEXT: [3] - PdDiv			# CHECK-NEXT: [3] - PdDiv
	# CHECK-NEXT: [4] - PdEX0			# CHECK-NEXT: [4] - PdEX0
	[... 70 lines not shown ...]
	# CHECK-NEXT: 4 1 18.00 * vmovaps %ymm0, (%rax)			# CHECK-NEXT: 4 1 18.00 * vmovaps %ymm0, (%rax)
	# CHECK-NEXT: 4 1 18.00 * vmovaps %ymm1, (%rcx)			# CHECK-NEXT: 4 1 18.00 * vmovaps %ymm1, (%rcx)
	# CHECK-NEXT: 4 1 18.00 * vmovaps %ymm2, (%rdx)			# CHECK-NEXT: 4 1 18.00 * vmovaps %ymm2, (%rdx)
	# CHECK-NEXT: 4 1 18.00 * vmovaps %ymm3, (%rbx)			# CHECK-NEXT: 4 1 18.00 * vmovaps %ymm3, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 5963 (83.2%)			# CHECK-NEXT: SCHEDQ - Scheduler full: 5777 (80.6%)
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 374 (5.2%)			# CHECK-NEXT: SQ - Store queue full: 561 (7.8%)
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 6770 (94.4%)			# CHECK-NEXT: 0, 6770 (94.4%)
	# CHECK-NEXT: 4, 400 (5.6%)			# CHECK-NEXT: 4, 400 (5.6%)

	# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:			# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
	[... 72 lines not shown ...]