This is an archive of the discontinued LLVM Phabricator instance.


Differential D59928

[MCA] Add an experimental MicroOpQueue stage.
ClosedPublic

Authored by andreadb on Mar 28 2019, 6:03 AM.

Details

Reviewers

courbet
RKSimon
mattd
lebedev.ri

Commits

rGe074ac60b452: [MCA] Add an experimental MicroOpQueue stage.
rL357248: [MCA] Add an experimental MicroOpQueue stage.

Summary

This patch adds an experimental stage named MicroOpQueueStage.
A MicroOpQueueStage can be used to simulate a hardware micro-op queue (basically, a decoupling queue between 'decode' and 'dispatch').
Users can specify a queue size, as well as an optional MaxIPC (which, in the absence of a "Decoders" stage, can be used to simulate a different throughput from the decoders).
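
To make the queue-size and MaxIPC semantics concrete, here is a minimal standalone sketch. It is not the MicroOpQueueStage class from the patch: the class name `UopQueueModel` and its methods are hypothetical, and it only tracks entry counts rather than instructions. It assumes, as the patch does, that a MaxIPC of zero means "no limit" and that an instruction's entry count is clamped so it always fits in the queue.

```cpp
#include <algorithm>
#include <cassert>

// Simplified model of a micro-op queue. All names are illustrative only.
class UopQueueModel {
  unsigned Capacity;         // Total number of queue entries.
  unsigned AvailableEntries; // Entries that are free in the current cycle.
  unsigned MaxIPC;           // Max instructions accepted per cycle; 0 = no limit.
  unsigned CurrentIPC = 0;   // Instructions accepted so far in this cycle.

public:
  explicit UopQueueModel(unsigned Size, unsigned IPC = 0)
      : Capacity(Size ? Size : 1), AvailableEntries(Capacity), MaxIPC(IPC) {}

  // Clamp the entry count to [1, Capacity], so that an instruction with more
  // micro-ops than queue entries can still be accepted.
  unsigned normalize(unsigned NumMicroOps) const {
    return std::max(1u, std::min(Capacity, NumMicroOps));
  }

  // True if an instruction with NumMicroOps micro-ops can enter this cycle.
  bool canAccept(unsigned NumMicroOps) const {
    if (MaxIPC && CurrentIPC == MaxIPC)
      return false;
    return normalize(NumMicroOps) <= AvailableEntries;
  }

  // Reserve entries for one instruction and count it against MaxIPC.
  void push(unsigned NumMicroOps) {
    assert(canAccept(NumMicroOps) && "queue cannot accept this instruction");
    AvailableEntries -= normalize(NumMicroOps);
    ++CurrentIPC;
  }

  // Release the entries of one instruction that moved to the next stage.
  void pop(unsigned NumMicroOps) { AvailableEntries += normalize(NumMicroOps); }

  // The per-cycle instruction counter restarts at every cycle.
  void cycleStart() { CurrentIPC = 0; }

  unsigned available() const { return AvailableEntries; }
};
```

With a queue of 4 entries and a MaxIPC of 2, this model accepts two single-uop instructions in a cycle and then rejects a third until `cycleStart()` is called, even though two entries are still free.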

This stage is added to the default pipeline between the EntryStage and the DispatchStage only if PipelineOptions::MicroOpQueueSize is nonzero. By default, llvm-mca sets PipelineOptions::MicroOpQueueSize to the value of the hidden flag -micro-op-queue-size.
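
The conditional insertion described above can be sketched as follows. The `Stage` and `Options` types and the `buildStages` function are hypothetical stand-ins, much simpler than the real mca classes; the sketch only illustrates that the queue stage appears between the entry and dispatch stages when the configured size is nonzero.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; the real mca::Stage and
// mca::PipelineOptions have richer interfaces.
struct Stage {
  std::string Name;
  explicit Stage(std::string N) : Name(std::move(N)) {}
};

struct Options {
  unsigned MicroOpQueueSize = 0;   // 0 means "do not add a micro-op queue".
  unsigned DecodersThroughput = 0; // 0 means "no limit" (unused in this sketch).
};

// Build the list of stages in pipeline order.
std::vector<std::unique_ptr<Stage>> buildStages(const Options &Opts) {
  std::vector<std::unique_ptr<Stage>> Stages;
  Stages.push_back(std::make_unique<Stage>("Entry"));
  // The queue stage is optional and sits between entry and dispatch.
  if (Opts.MicroOpQueueSize)
    Stages.push_back(std::make_unique<Stage>("MicroOpQueue"));
  Stages.push_back(std::make_unique<Stage>("Dispatch"));
  Stages.push_back(std::make_unique<Stage>("Execute"));
  Stages.push_back(std::make_unique<Stage>("Retire"));
  return Stages;
}
```

With default options the sketch yields four stages; setting `MicroOpQueueSize` to any nonzero value yields five, with the queue in second position.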

Throughput from the decoder can be simulated via another hidden flag named -decoder-throughput.
That flag allows us to quickly experiment with different frontend throughputs. For targets that declare a loop buffer, flag -decoder-throughput allows users to do multiple runs, each time simulating a different throughput from the decoders.

This stage can/will be extended in the future. For example, we could add a "buffer full" event to identify bottlenecks caused by backpressure. Flag -decoder-throughput would probably go away if, in the future, we delegate the simulation of a (potentially variable) throughput from the decoders to another stage (DecoderStage?).
For now, flag -decoder-throughput is "good enough" to run some simple experiments.

Let me know if okay to commit.

-Andrea

Event Timeline

andreadb created this revision. Mar 28 2019, 6:03 AM

Herald added subscribers: jdoerfert, gbedwell, mgorny. Mar 28 2019, 6:03 AM

Only cosmetic comments, this looks good!

include/llvm/MCA/Context.h
42	Maybe add a comment: `// Instructions per cycle.`
include/llvm/MCA/Stages/MicroOpQueueStage.h
45	This duplicates Buffer.size(), what about removing it or making it an accessor?
tools/llvm-mca/llvm-mca.cpp
110	This can be understood as input (instructions) or output (uops) throughput, so I think we should clarify: `Maximum throughput from the decoders (instructions per cycle)`

andreadb marked 3 inline comments as done. Mar 28 2019, 7:52 AM

andreadb added inline comments.

include/llvm/MCA/Context.h
42	Sure. I will add that comment.
include/llvm/MCA/Stages/MicroOpQueueStage.h
45	Good point. I will remove it. I will move that code comment to method `getNormalizedOpcodes`.
tools/llvm-mca/llvm-mca.cpp
110	I will change it.

Patch updated.

Address review comments:

removed field BufferSize from MicroOpQueueStage
added code comments
improved flag description in the llvm-mca driver.

RKSimon added inline comments. Mar 28 2019, 10:29 AM

lib/MCA/CMakeLists.txt
13	Alphabetical order?
lib/MCA/Context.cpp
21	order?

Address review comment.

andreadb marked 4 inline comments as done. Mar 28 2019, 10:42 AM

LGTM!

include/llvm/MCA/Stages/MicroOpQueueStage.h
2	nit: s/MicroOpQueue.h/MicroOpQueueStage.h/
17	nit: Perhaps rename the guard to "LLVM_MCA_MICRO_OP_QUEUE_STAGE". I know that looks a little nasty, but it follows the header-guard convention.
52	Unnecessary-pedantic nit: Your `std::min` call at line 56 is written in a different order, with `std::min(Buffer.size(), NumMicroOpcodes)`
89	nit: Copy-pasta comment. Now I know what your favorite stage is.
lib/MCA/Context.cpp
60	Do you need a std::move here?

mattd accepted this revision. Mar 28 2019, 9:15 PM

This revision is now accepted and ready to land. Mar 28 2019, 9:15 PM

andreadb marked 5 inline comments as done. Mar 29 2019, 4:19 AM

andreadb added inline comments.

include/llvm/MCA/Stages/MicroOpQueueStage.h
2	Thanks for catching it. I will fix it.
17	You are absolutely right. The original name of that stage was BufferStage. But then I opted for MicroOpQueueStage. Apparently I forgot to find&replace all the occurrences of buffer stage in this code...
52	.. or I can just write "minimum between NumMicroOpcodes and the buffer size".
89	hehehe. :-)
lib/MCA/Context.cpp
60	I'll double check before committing it. Probably not...
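
On the `std::move` question for line 60 of Context.cpp: the result of a `make_unique` call is already an rvalue, so wrapping it in `std::move` is not needed to transfer ownership. The sketch below uses hypothetical `Stage` and `Pipeline` stand-ins and `std::make_unique` rather than the `llvm::make_unique` helper used in the patch; it only illustrates the general C++ rule.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not the real mca::Stage or mca::Pipeline.
struct Stage {
  int Id;
  explicit Stage(int I) : Id(I) {}
};

struct Pipeline {
  std::vector<std::unique_ptr<Stage>> Stages;
  // Takes ownership of the stage.
  void appendStage(std::unique_ptr<Stage> S) { Stages.push_back(std::move(S)); }
};

void build(Pipeline &P) {
  // A named unique_ptr is an lvalue, so std::move is required here.
  auto Dispatch = std::make_unique<Stage>(1);
  P.appendStage(std::move(Dispatch));

  // The result of make_unique is a temporary (an rvalue) and binds directly.
  // Writing P.appendStage(std::move(std::make_unique<Stage>(2))) would also
  // compile, but the extra std::move adds nothing.
  P.appendStage(std::make_unique<Stage>(2));
}
```

Some compilers can flag a `std::move` applied to a temporary (for example via a pessimizing-move warning), which is another reason to drop it.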

Closed by commit rL357248: [MCA] Add an experimental MicroOpQueue stage. (authored by adibiagio). Mar 29 2019, 5:15 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Mar 29 2019, 5:15 AM

Revision Contents

Path	Size
include/llvm/MCA/Context.h	10 lines
include/llvm/MCA/Stages/MicroOpQueueStage.h	88 lines
lib/MCA/CMakeLists.txt	1 line
lib/MCA/Context.cpp	4 lines
lib/MCA/Stages/MicroOpQueueStage.cpp	70 lines
test/tools/llvm-mca/X86/uop-queue.s	105 lines
tools/llvm-mca/llvm-mca.cpp	17 lines

Diff 192639

include/llvm/MCA/Context.h

	#include <memory>			#include <memory>

	namespace llvm {			namespace llvm {
	namespace mca {			namespace mca {

	/// This is a convenience struct to hold the parameters necessary for creating			/// This is a convenience struct to hold the parameters necessary for creating
	/// the pre-built "default" out-of-order pipeline.			/// the pre-built "default" out-of-order pipeline.
	struct PipelineOptions {			struct PipelineOptions {
	PipelineOptions(unsigned DW, unsigned RFS, unsigned LQS, unsigned SQS,			PipelineOptions(unsigned UOPQSize, unsigned DecThr, unsigned DW, unsigned RFS,
	bool NoAlias, bool ShouldEnableBottleneckAnalysis = false)			unsigned LQS, unsigned SQS, bool NoAlias,
	: DispatchWidth(DW), RegisterFileSize(RFS), LoadQueueSize(LQS),			bool ShouldEnableBottleneckAnalysis = false)
				: MicroOpQueueSize(UOPQSize), DecodersThroughput(DecThr),
				DispatchWidth(DW), RegisterFileSize(RFS), LoadQueueSize(LQS),
	StoreQueueSize(SQS), AssumeNoAlias(NoAlias),			StoreQueueSize(SQS), AssumeNoAlias(NoAlias),
	EnableBottleneckAnalysis(ShouldEnableBottleneckAnalysis) {}			EnableBottleneckAnalysis(ShouldEnableBottleneckAnalysis) {}
				unsigned MicroOpQueueSize;
				unsigned DecodersThroughput; // Instructions per cycle.
	unsigned DispatchWidth;			unsigned DispatchWidth;
	unsigned RegisterFileSize;			unsigned RegisterFileSize;
	unsigned LoadQueueSize;			unsigned LoadQueueSize;
	unsigned StoreQueueSize;			unsigned StoreQueueSize;
	bool AssumeNoAlias;			bool AssumeNoAlias;
	bool EnableBottleneckAnalysis;			bool EnableBottleneckAnalysis;
	};			};


include/llvm/MCA/Stages/MicroOpQueueStage.h

				//===---------------------- MicroOpQueue.h ----------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				///
				/// This file defines a stage that implements a queue of micro opcodes.
				/// It can be used to simulate a hardware micro-op queue that serves opcodes to
				/// the out of order backend.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_MCA_BUFFER_STAGE_H
				#define LLVM_MCA_BUFFER_STAGE_H

				#include "llvm/ADT/SmallVector.h"
				#include "llvm/MCA/Stages/Stage.h"

				namespace llvm {
				namespace mca {

				/// A stage that simulates a queue of instruction opcodes.
				class MicroOpQueueStage : public Stage {
				SmallVector<InstRef, 8> Buffer;
				unsigned NextAvailableSlotIdx;
				unsigned CurrentInstructionSlotIdx;

				// Limits the number of instructions that can be written to this buffer every
				// cycle. A value of zero means that there is no limit to the instruction
				// throughput in input.
				const unsigned MaxIPC;
				unsigned CurrentIPC;

				// Number of entries that are available during this cycle.
				unsigned AvailableEntries;

				// True if instructions dispatched to this stage don't need to wait for the
				// next cycle before moving to the next stage.
				// False if this buffer acts as a one cycle delay in the execution pipeline.
				bool IsZeroLatencyStage;

				MicroOpQueueStage(const MicroOpQueueStage &Other) = delete;
				MicroOpQueueStage &operator=(const MicroOpQueueStage &Other) = delete;

				// By default, an instruction consumes a number of buffer entries equal to its
				// number of micro opcodes (see field `InstrDesc::NumMicroOpcodes`). The
				// number of entries consumed by an instruction is normalized to the
				// std::min(NumMicroOpcodes, Buffer.size()). This is to avoid problems with
				// (microcoded) instructions that generate a number of micro opcodes than
				// doesn't fit in the buffer.
				unsigned getNormalizedOpcodes(const InstRef &IR) const {
				unsigned NormalizedOpcodes =
				std::min(static_cast<unsigned>(Buffer.size()),
				IR.getInstruction()->getDesc().NumMicroOps);
				return NormalizedOpcodes ? NormalizedOpcodes : 1U;
				}

				Error moveInstructions();

				public:
				MicroOpQueueStage(unsigned Size, unsigned IPC = 0,
				bool ZeroLatencyStage = true);

				bool isAvailable(const InstRef &IR) const override {
				if (MaxIPC && CurrentIPC == MaxIPC)
				return false;
				unsigned NormalizedOpcodes = getNormalizedOpcodes(IR);
				if (NormalizedOpcodes > AvailableEntries)
				return false;
				return true;
				}

				bool hasWorkToComplete() const override {
				return AvailableEntries != Buffer.size();
				}

				Error execute(InstRef &IR) override;
				Error cycleStart() override;
				Error cycleEnd() override;
				};

				} // namespace mca
				} // namespace llvm

				#endif // LLVM_MCA_FETCH_STAGE_H

lib/MCA/CMakeLists.txt

	add_llvm_library(LLVMMCA			add_llvm_library(LLVMMCA
	Context.cpp			Context.cpp
	HWEventListener.cpp			HWEventListener.cpp
	HardwareUnits/HardwareUnit.cpp			HardwareUnits/HardwareUnit.cpp
	HardwareUnits/LSUnit.cpp			HardwareUnits/LSUnit.cpp
	HardwareUnits/RegisterFile.cpp			HardwareUnits/RegisterFile.cpp
	HardwareUnits/ResourceManager.cpp			HardwareUnits/ResourceManager.cpp
	HardwareUnits/RetireControlUnit.cpp			HardwareUnits/RetireControlUnit.cpp
	HardwareUnits/Scheduler.cpp			HardwareUnits/Scheduler.cpp
	InstrBuilder.cpp			InstrBuilder.cpp
	Instruction.cpp			Instruction.cpp
	Pipeline.cpp			Pipeline.cpp
				Stages/MicroOpQueueStage.cpp
	Stages/DispatchStage.cpp			Stages/DispatchStage.cpp
	Stages/EntryStage.cpp			Stages/EntryStage.cpp
	Stages/ExecuteStage.cpp			Stages/ExecuteStage.cpp
	Stages/InstructionTables.cpp			Stages/InstructionTables.cpp
	Stages/RetireStage.cpp			Stages/RetireStage.cpp
	Stages/Stage.cpp			Stages/Stage.cpp
	Support.cpp			Support.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${LLVM_MAIN_INCLUDE_DIR}/llvm/MCA			${LLVM_MAIN_INCLUDE_DIR}/llvm/MCA
	)			)

lib/MCA/Context.cpp

/// stages.		/// stages.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/MCA/Context.h"		#include "llvm/MCA/Context.h"
#include "llvm/MCA/HardwareUnits/RegisterFile.h"		#include "llvm/MCA/HardwareUnits/RegisterFile.h"
#include "llvm/MCA/HardwareUnits/RetireControlUnit.h"		#include "llvm/MCA/HardwareUnits/RetireControlUnit.h"
#include "llvm/MCA/HardwareUnits/Scheduler.h"		#include "llvm/MCA/HardwareUnits/Scheduler.h"
		#include "llvm/MCA/Stages/MicroOpQueueStage.h"
#include "llvm/MCA/Stages/DispatchStage.h"		#include "llvm/MCA/Stages/DispatchStage.h"
#include "llvm/MCA/Stages/EntryStage.h"		#include "llvm/MCA/Stages/EntryStage.h"
#include "llvm/MCA/Stages/ExecuteStage.h"		#include "llvm/MCA/Stages/ExecuteStage.h"
#include "llvm/MCA/Stages/RetireStage.h"		#include "llvm/MCA/Stages/RetireStage.h"

namespace llvm {		namespace llvm {
namespace mca {		namespace mca {

Context::createDefaultPipeline(const PipelineOptions &Opts, InstrBuilder &IB,
addHardwareUnit(std::move(RCU));		addHardwareUnit(std::move(RCU));
addHardwareUnit(std::move(PRF));		addHardwareUnit(std::move(PRF));
addHardwareUnit(std::move(LSU));		addHardwareUnit(std::move(LSU));
addHardwareUnit(std::move(HWS));		addHardwareUnit(std::move(HWS));

// Build the pipeline.		// Build the pipeline.
auto StagePipeline = llvm::make_unique<Pipeline>();		auto StagePipeline = llvm::make_unique<Pipeline>();
StagePipeline->appendStage(std::move(Fetch));		StagePipeline->appendStage(std::move(Fetch));
		if (Opts.MicroOpQueueSize)
		StagePipeline->appendStage(std::move(llvm::make_unique<MicroOpQueueStage>(
		Opts.MicroOpQueueSize, Opts.DecodersThroughput)));
StagePipeline->appendStage(std::move(Dispatch));		StagePipeline->appendStage(std::move(Dispatch));
StagePipeline->appendStage(std::move(Execute));		StagePipeline->appendStage(std::move(Execute));
StagePipeline->appendStage(std::move(Retire));		StagePipeline->appendStage(std::move(Retire));
return StagePipeline;		return StagePipeline;
}		}

} // namespace mca		} // namespace mca
} // namespace llvm		} // namespace llvm

lib/MCA/Stages/MicroOpQueueStage.cpp

				//===---------------------- MicroOpQueueStage.cpp ---------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				///
				/// This file defines the MicroOpQueueStage.
				///
				//===----------------------------------------------------------------------===//

				#include "llvm/MCA/Stages/MicroOpQueueStage.h"

				namespace llvm {
				namespace mca {

				#define DEBUG_TYPE "llvm-mca"

				Error MicroOpQueueStage::moveInstructions() {
				InstRef IR = Buffer[CurrentInstructionSlotIdx];
				while (IR && checkNextStage(IR)) {
				if (llvm::Error Val = moveToTheNextStage(IR))
				return Val;

				Buffer[CurrentInstructionSlotIdx].invalidate();
				unsigned NormalizedOpcodes = getNormalizedOpcodes(IR);
				CurrentInstructionSlotIdx += NormalizedOpcodes;
				CurrentInstructionSlotIdx %= Buffer.size();
				AvailableEntries += NormalizedOpcodes;
				IR = Buffer[CurrentInstructionSlotIdx];
				}

				return llvm::ErrorSuccess();
				}

				MicroOpQueueStage::MicroOpQueueStage(unsigned Size, unsigned IPC,
				bool ZeroLatencyStage)
				: NextAvailableSlotIdx(0), CurrentInstructionSlotIdx(0), MaxIPC(IPC),
				CurrentIPC(0), IsZeroLatencyStage(ZeroLatencyStage) {
				Buffer.resize(Size ? Size : 1);
				AvailableEntries = Buffer.size();
				}

				Error MicroOpQueueStage::execute(InstRef &IR) {
				Buffer[NextAvailableSlotIdx] = IR;
				unsigned NormalizedOpcodes = getNormalizedOpcodes(IR);
				NextAvailableSlotIdx += NormalizedOpcodes;
				NextAvailableSlotIdx %= Buffer.size();
				AvailableEntries -= NormalizedOpcodes;
				++CurrentIPC;
				return llvm::ErrorSuccess();
				}

				Error MicroOpQueueStage::cycleStart() {
				CurrentIPC = 0;
				if (!IsZeroLatencyStage)
				return moveInstructions();
				return llvm::ErrorSuccess();
				}

				Error MicroOpQueueStage::cycleEnd() {
				if (IsZeroLatencyStage)
				return moveInstructions();
				return llvm::ErrorSuccess();
				}

				} // namespace mca
				} // namespace llvm

test/tools/llvm-mca/X86/uop-queue.s

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=haswell -iterations=1500 -micro-op-queue-size=1 -all-views=false -summary-view < %s | FileCheck %s -check-prefix=HASWELL-UOPQ-1
				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=haswell -iterations=1500 -micro-op-queue-size=2 -all-views=false -summary-view < %s | FileCheck %s -check-prefix=HASWELL-UOPQ-2
				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=haswell -iterations=1500 -micro-op-queue-size=3 -all-views=false -summary-view < %s | FileCheck %s -check-prefix=HASWELL-UOPQ-3
				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=haswell -iterations=1500 -micro-op-queue-size=4 -all-views=false -summary-view < %s | FileCheck %s -check-prefix=HASWELL-UOPQ-4
				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=haswell -iterations=1500 -micro-op-queue-size=4 -decoder-throughput=2 -all-views=false -summary-view < %s | FileCheck %s -check-prefix=HASWELL-DEC-2

				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1500 -micro-op-queue-size=1 -all-views=false -summary-view < %s | FileCheck %s -check-prefix=BTVER2-UOPQ-1
				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1500 -micro-op-queue-size=2 -all-views=false -summary-view < %s | FileCheck %s -check-prefix=BTVER2-UOPQ-2
				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1500 -micro-op-queue-size=4 -decoder-throughput=1 -all-views=false -summary-view < %s | FileCheck %s -check-prefix=BTVER2-DEC-1

				add %eax, %eax
				add %ebx, %ebx
				add %ecx, %ecx
				add %edx, %edx

				# BTVER2-DEC-2: Iterations: 1500
				# BTVER2-DEC-2-NEXT: Instructions: 6000
				# BTVER2-DEC-2-NEXT: Total Cycles: 3003
				# BTVER2-DEC-2-NEXT: Total uOps: 6000

				# BTVER2-DEC-2: Dispatch Width: 2
				# BTVER2-DEC-2-NEXT: uOps Per Cycle: 2.00
				# BTVER2-DEC-2-NEXT: IPC: 2.00
				# BTVER2-DEC-2-NEXT: Block RThroughput: 2.0

				# BTVER2-DEC-1: Iterations: 1500
				# BTVER2-DEC-1-NEXT: Instructions: 6000
				# BTVER2-DEC-1-NEXT: Total Cycles: 6003
				# BTVER2-DEC-1-NEXT: Total uOps: 6000

				# BTVER2-UOPQ-1: Iterations: 1500
				# BTVER2-UOPQ-1-NEXT: Instructions: 6000
				# BTVER2-UOPQ-1-NEXT: Total Cycles: 6003
				# BTVER2-UOPQ-1-NEXT: Total uOps: 6000

				# BTVER2-UOPQ-2: Iterations: 1500
				# BTVER2-UOPQ-2-NEXT: Instructions: 6000
				# BTVER2-UOPQ-2-NEXT: Total Cycles: 3003
				# BTVER2-UOPQ-2-NEXT: Total uOps: 6000

				# HASWELL-DEC-2: Iterations: 1500
				# HASWELL-DEC-2-NEXT: Instructions: 6000
				# HASWELL-DEC-2-NEXT: Total Cycles: 3003
				# HASWELL-DEC-2-NEXT: Total uOps: 6000

				# HASWELL-UOPQ-1: Iterations: 1500
				# HASWELL-UOPQ-1-NEXT: Instructions: 6000
				# HASWELL-UOPQ-1-NEXT: Total Cycles: 6003
				# HASWELL-UOPQ-1-NEXT: Total uOps: 6000

				# HASWELL-UOPQ-2: Iterations: 1500
				# HASWELL-UOPQ-2-NEXT: Instructions: 6000
				# HASWELL-UOPQ-2-NEXT: Total Cycles: 3003
				# HASWELL-UOPQ-2-NEXT: Total uOps: 6000

				# HASWELL-UOPQ-3: Iterations: 1500
				# HASWELL-UOPQ-3-NEXT: Instructions: 6000
				# HASWELL-UOPQ-3-NEXT: Total Cycles: 2003
				# HASWELL-UOPQ-3-NEXT: Total uOps: 6000

				# HASWELL-UOPQ-4: Iterations: 1500
				# HASWELL-UOPQ-4-NEXT: Instructions: 6000
				# HASWELL-UOPQ-4-NEXT: Total Cycles: 1503
				# HASWELL-UOPQ-4-NEXT: Total uOps: 6000

				# BTVER2-DEC-1: Dispatch Width: 2
				# BTVER2-DEC-1-NEXT: uOps Per Cycle: 1.00
				# BTVER2-DEC-1-NEXT: IPC: 1.00
				# BTVER2-DEC-1-NEXT: Block RThroughput: 2.0

				# BTVER2-UOPQ-1: Dispatch Width: 2
				# BTVER2-UOPQ-1-NEXT: uOps Per Cycle: 1.00
				# BTVER2-UOPQ-1-NEXT: IPC: 1.00
				# BTVER2-UOPQ-1-NEXT: Block RThroughput: 2.0

				# BTVER2-UOPQ-2: Dispatch Width: 2
				# BTVER2-UOPQ-2-NEXT: uOps Per Cycle: 2.00
				# BTVER2-UOPQ-2-NEXT: IPC: 2.00
				# BTVER2-UOPQ-2-NEXT: Block RThroughput: 2.0

				# HASWELL-DEC-2: Dispatch Width: 4
				# HASWELL-DEC-2-NEXT: uOps Per Cycle: 2.00
				# HASWELL-DEC-2-NEXT: IPC: 2.00
				# HASWELL-DEC-2-NEXT: Block RThroughput: 1.0

				# HASWELL-UOPQ-1: Dispatch Width: 4
				# HASWELL-UOPQ-1-NEXT: uOps Per Cycle: 1.00
				# HASWELL-UOPQ-1-NEXT: IPC: 1.00
				# HASWELL-UOPQ-1-NEXT: Block RThroughput: 1.0

				# HASWELL-UOPQ-2: Dispatch Width: 4
				# HASWELL-UOPQ-2-NEXT: uOps Per Cycle: 2.00
				# HASWELL-UOPQ-2-NEXT: IPC: 2.00
				# HASWELL-UOPQ-2-NEXT: Block RThroughput: 1.0

				# HASWELL-UOPQ-3: Dispatch Width: 4
				# HASWELL-UOPQ-3-NEXT: uOps Per Cycle: 3.00
				# HASWELL-UOPQ-3-NEXT: IPC: 3.00
				# HASWELL-UOPQ-3-NEXT: Block RThroughput: 1.0

				# HASWELL-UOPQ-4: Dispatch Width: 4
				# HASWELL-UOPQ-4-NEXT: uOps Per Cycle: 3.99
				# HASWELL-UOPQ-4-NEXT: IPC: 3.99
				# HASWELL-UOPQ-4-NEXT: Block RThroughput: 1.0
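
The cycle counts checked for the eight RUN lines above follow a simple pattern, which the sketch below reproduces. This is an empirical fit to the numbers in this test, not a description of how llvm-mca computes cycles: the function `expectedCycles` is hypothetical, the constant 3 is just the extra cycle count observed in every check block, and the formula assumes single-uop instructions with no other bottleneck and a nonzero queue size.

```cpp
#include <algorithm>

// Empirical fit to the summary numbers in this test (an assumption, not the
// llvm-mca algorithm). DecoderThroughput == 0 means "no limit".
// QueueSize is assumed to be nonzero.
unsigned expectedCycles(unsigned Instructions, unsigned QueueSize,
                        unsigned DecoderThroughput, unsigned DispatchWidth) {
  // Instructions that can move forward per cycle: the tightest of the limits.
  unsigned PerCycle = std::min(QueueSize, DispatchWidth);
  if (DecoderThroughput)
    PerCycle = std::min(PerCycle, DecoderThroughput);
  // Ceiling division, plus the 3 extra cycles seen in the check lines.
  return (Instructions + PerCycle - 1) / PerCycle + 3;
}
```

For example, with 6000 instructions on Haswell (dispatch width 4), a queue size of 3 gives 6000 / 3 + 3 = 2003 cycles, matching the HASWELL-UOPQ-3 check; on btver2 (dispatch width 2), a queue size of 4 with a decoder throughput of 1 gives 6003, matching BTVER2-DEC-1.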

tools/llvm-mca/llvm-mca.cpp

DispatchWidth("dispatch", cl::desc("Override the processor dispatch width"),
cl::cat(ToolOptions), cl::init(0));		cl::cat(ToolOptions), cl::init(0));

static cl::opt<unsigned>		static cl::opt<unsigned>
RegisterFileSize("register-file-size",		RegisterFileSize("register-file-size",
cl::desc("Maximum number of physical registers which can "		cl::desc("Maximum number of physical registers which can "
"be used for register mappings"),		"be used for register mappings"),
cl::cat(ToolOptions), cl::init(0));		cl::cat(ToolOptions), cl::init(0));

		static cl::opt<unsigned>
		MicroOpQueue("micro-op-queue-size", cl::Hidden,
		cl::desc("Number of entries in the micro-op queue"),
		cl::cat(ToolOptions), cl::init(0));

		static cl::opt<unsigned>
		DecoderThroughput("decoder-throughput", cl::Hidden,
		cl::desc("Maximum throughput from the decoders "
		"(instructions per cycle)"),
		cl::cat(ToolOptions), cl::init(0));

static cl::opt<bool>		static cl::opt<bool>
PrintRegisterFileStats("register-file-stats",		PrintRegisterFileStats("register-file-stats",
cl::desc("Print register file statistics"),		cl::desc("Print register file statistics"),
cl::cat(ViewOptions), cl::init(false));		cl::cat(ViewOptions), cl::init(false));

static cl::opt<bool> PrintDispatchStats("dispatch-stats",		static cl::opt<bool> PrintDispatchStats("dispatch-stats",
cl::desc("Print dispatch statistics"),		cl::desc("Print dispatch statistics"),
cl::cat(ViewOptions), cl::init(false));		cl::cat(ViewOptions), cl::init(false));
int main(int argc, char **argv) {
const MCSchedModel &SM = STI->getSchedModel();		const MCSchedModel &SM = STI->getSchedModel();

// Create an instruction builder.		// Create an instruction builder.
mca::InstrBuilder IB(STI, MCII, *MRI, MCIA.get());		mca::InstrBuilder IB(STI, MCII, *MRI, MCIA.get());

// Create a context to control ownership of the pipeline hardware.		// Create a context to control ownership of the pipeline hardware.
mca::Context MCA(MRI, STI);		mca::Context MCA(MRI, STI);

mca::PipelineOptions PO(DispatchWidth, RegisterFileSize, LoadQueueSize,		mca::PipelineOptions PO(MicroOpQueue, DecoderThroughput, DispatchWidth,
StoreQueueSize, AssumeNoAlias,		RegisterFileSize, LoadQueueSize, StoreQueueSize,
EnableBottleneckAnalysis);		AssumeNoAlias, EnableBottleneckAnalysis);

// Number each region in the sequence.		// Number each region in the sequence.
unsigned RegionIdx = 0;		unsigned RegionIdx = 0;

for (const std::unique_ptr<mca::CodeRegion> &Region : Regions) {		for (const std::unique_ptr<mca::CodeRegion> &Region : Regions) {
// Skip empty code regions.		// Skip empty code regions.
if (Region->empty())		if (Region->empty())
continue;		continue;