This is an archive of the discontinued LLVM Phabricator instance.

Differential D58728

[MCA] Highlight kernel bottlenecks in the summary view.
ClosedPublic

Authored by andreadb on Feb 27 2019, 10:58 AM.

Details

Reviewers

mattd
RKSimon
courbet
lebedev.ri

Commits

rGbe3281a281e3: [MCA] Highlight kernel bottlenecks in the summary view.
rL355308: [MCA] Highlight kernel bottlenecks in the summary view.

Summary

This patch adds a new flag named -bottleneck-analysis to print out information about throughput bottlenecks.

MCA knows how to identify and classify dynamic dispatch stalls. However, it doesn't know how to analyze and highlight kernel bottlenecks.
The goal of this patch is to teach MCA how to correlate increases in backend pressure to backend stalls (and therefore, the loss of throughput).

From the Scheduler's point of view, backend pressure is a function of scheduler buffer usage (i.e. how the number of uOps in the scheduler buffers changes over time). Backend pressure increases (or decreases) when there is a mismatch between the number of opcodes dispatched and the number of opcodes issued in the same cycle.
Since buffer resources are limited, continuous increases in backend pressure eventually lead to dispatch stalls. So there is a strong correlation between dispatch stalls and how backend pressure changed over time.
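
The relationship between per-cycle dispatch/issue counts and dispatch stalls can be sketched as follows. This is only an illustration under simplifying assumptions: the type `CycleCounts` and the functions `pressureIncreased` and `firstStallCycle` are hypothetical names that do not exist in MCA, and a single buffer of fixed size stands in for all scheduler buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-cycle counters; these names are illustrative, not MCA's.
struct CycleCounts {
  unsigned Dispatched; // uOps dispatched to the scheduler during the cycle
  unsigned Issued;     // uOps issued to the pipelines during the cycle
};

// Backend pressure increases in a cycle when more uOps enter the scheduler
// buffers than leave them.
bool pressureIncreased(const CycleCounts &C) {
  return C.Dispatched > C.Issued;
}

// Returns the index of the first cycle in which buffer occupancy would exceed
// BufferSize (the point where dispatch would have to stall), or -1 if the
// buffer never overflows.
int firstStallCycle(const std::vector<CycleCounts> &Cycles,
                    unsigned BufferSize) {
  int64_t Occupancy = 0;
  for (std::size_t I = 0; I < Cycles.size(); ++I) {
    Occupancy += static_cast<int64_t>(Cycles[I].Dispatched) -
                 static_cast<int64_t>(Cycles[I].Issued);
    if (Occupancy < 0)
      Occupancy = 0; // Cannot issue more uOps than are buffered.
    if (Occupancy > static_cast<int64_t>(BufferSize))
      return static_cast<int>(I);
  }
  return -1;
}
```

With a buffer of two entries and a steady surplus of one uOp per cycle, the sketch reports a stall in the third cycle, which is the kind of correlation the analysis is meant to surface.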

This patch teaches MCA how to identify situations where backend pressure increases due to:

unavailable pipeline resources.
data dependencies.

Data dependencies may delay execution of instructions and therefore increase the time that uOps have to spend in the scheduler buffers. That often translates to an increase in backend pressure which may eventually lead to a bottleneck.
Contention on pipeline resources may also delay execution of instructions, and lead to a temporary increase in backend pressure.

Internally, the Scheduler classifies instructions based on whether register/memory operands are available or not.

An instruction is marked as "ready to execute" only if data dependencies are fully resolved.
Every cycle, the Scheduler attempts to execute all instructions that are ready to execute. If an instruction cannot execute because of unavailable pipeline resources, then the Scheduler internally updates a BusyResourceUnits mask with the ID of each unavailable resource.
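
A rough sketch of how a mask of busy resource units can be accumulated while trying to issue ready instructions is shown below. It is deliberately simplified: `ReadyInst` and `collectBusyResourceUnits` are hypothetical names, each bit stands for one resource unit, and instructions are tried in order rather than through the Scheduler's selection strategy.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a ready instruction: one bit is set in
// RequiredUnits for every resource unit the instruction needs.
struct ReadyInst {
  uint64_t RequiredUnits;
};

// Tries to issue each ready instruction against the units still available in
// this cycle. Returns a mask of the units that blocked at least one
// instruction; the result is zero when nothing was delayed.
uint64_t collectBusyResourceUnits(const std::vector<ReadyInst> &Ready,
                                  uint64_t AvailableUnits) {
  uint64_t BusyResourceUnits = 0;
  for (const ReadyInst &Inst : Ready) {
    uint64_t Missing = Inst.RequiredUnits & ~AvailableUnits;
    if (Missing) {
      BusyResourceUnits |= Missing; // Record what delayed this instruction.
      continue;
    }
    AvailableUnits &= ~Inst.RequiredUnits; // Issue: consume the units.
  }
  return BusyResourceUnits;
}
```

A non-zero result identifies which units were contended, which is the information a "backend pressure" event caused by resource contention needs to carry.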

ExecuteStage is responsible for tracking changes in backend pressure. If backend pressure increases during a cycle because of contention on pipeline resources, then ExecuteStage sends a "backend pressure" event to the listeners.
That event contains information about the instructions delayed by resource pressure, as well as the BusyResourceUnits mask.

Note that ExecuteStage also knows how to identify situations where backpressure increased because of delays introduced by data dependencies.
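
The decision made at the end of each cycle can be sketched as below. The control flow follows ExecuteStage::cycleEnd() as it appears in this diff, but the types are stand-ins: `CycleState`, `Reason`, and `classifyCycle` are hypothetical, and the two counters replace the RegDeps/MemDeps vectors filled in by the Scheduler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Reason { Resources, RegisterDeps, MemoryDeps };

// Hypothetical snapshot of what the stage knows at the end of a cycle.
struct CycleState {
  bool HadTokenStall;         // Dispatch stalled on scheduler/LS resources.
  unsigned NumDispatched;     // Opcodes dispatched in this cycle.
  unsigned NumIssued;         // Opcodes issued in this cycle.
  uint64_t BusyResourceUnits; // Mask of contended pipeline resources.
  unsigned NumRegDepStalled;  // Instructions waiting on register operands.
  unsigned NumMemDepStalled;  // Instructions waiting on memory operands.
};

// Returns the reasons that would be reported for this cycle, in order.
std::vector<Reason> classifyCycle(const CycleState &S) {
  std::vector<Reason> Events;
  // No event unless dispatch stalled or more opcodes entered than left.
  if (!S.HadTokenStall && S.NumDispatched <= S.NumIssued)
    return Events;
  // Resource contention takes priority over data dependencies.
  if (S.BusyResourceUnits) {
    Events.push_back(Reason::Resources);
    return Events;
  }
  if (S.NumRegDepStalled)
    Events.push_back(Reason::RegisterDeps);
  if (S.NumMemDepStalled)
    Events.push_back(Reason::MemoryDeps);
  return Events;
}
```

Note that a single cycle can produce both a register-dependency and a memory-dependency event, while a resource-pressure event is reported on its own.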

The SummaryView observes "backend pressure" events and prints out a "bottleneck report".

Example of bottleneck report:

Cycles with backend pressure increase [ 99.89% ]
Throughput Bottlenecks:
  Resource Pressure       [ 0.00% ]
  Data Dependencies:      [ 99.89% ]
   - Register Dependencies [ 0.00% ]
   - Memory Dependencies   [ 99.89% ]

A bottleneck report is printed out only if increases in backend pressure eventually caused backend stalls.
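
The percentages in the report relate a number of cycles to the total number of simulated cycles. The SummaryView code that prints them is not shown in this part of the diff, so the following is only a guess at the arithmetic; `formatPercent` is a hypothetical helper, and the two-decimal format is inferred from the example output above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats Cycles as a percentage of TotalCycles in the "[ 99.89% ]" style of
// the example report. Returns "[ 0.00% ]" when no cycles were simulated.
std::string formatPercent(unsigned Cycles, unsigned TotalCycles) {
  double Percent = TotalCycles ? 100.0 * Cycles / TotalCycles : 0.0;
  char Buffer[32];
  std::snprintf(Buffer, sizeof(Buffer), "[ %.2f%% ]", Percent);
  return Buffer;
}
```

For instance, 899 pressure-increase cycles out of 900 simulated cycles would be shown as `[ 99.89% ]`.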

About the time complexity:
The average slowdown tends to be in the range of ~5-6%.

Time complexity is linear in the number of instructions in the Scheduler::PendingSet.
For memory-intensive kernels, the slowdown can be significant if flag -noalias=false is specified; in the worst case I have observed, the slowdown was ~30%.
We can definitely recover part of that slowdown if we optimize class LSUnit (by doing extra bookkeeping to speed up queries).

This new analysis is disabled by default, and it can be enabled via flag -bottleneck-analysis.
Users of MCA as a library can enable the generation of pressure events through the constructor of ExecuteStage.
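
For library users, the opt-in looks roughly like the sketch below. `PipelineOptions` is copied from the Context.h change in this diff; everything else is a stand-in so that the example compiles on its own: the `sketch` namespace, the Scheduler-less `ExecuteStage`, and the `pressureEventsEnabled()` accessor are hypothetical and not part of LLVM.

```cpp
#include <cassert>

namespace sketch {

// Copy of PipelineOptions as changed by this patch (see Context.h).
struct PipelineOptions {
  PipelineOptions(unsigned DW, unsigned RFS, unsigned LQS, unsigned SQS,
                  bool NoAlias, bool ShouldEnableBottleneckAnalysis = false)
      : DispatchWidth(DW), RegisterFileSize(RFS), LoadQueueSize(LQS),
        StoreQueueSize(SQS), AssumeNoAlias(NoAlias),
        EnableBottleneckAnalysis(ShouldEnableBottleneckAnalysis) {}
  unsigned DispatchWidth;
  unsigned RegisterFileSize;
  unsigned LoadQueueSize;
  unsigned StoreQueueSize;
  bool AssumeNoAlias;
  bool EnableBottleneckAnalysis;
};

// Stand-in for ExecuteStage. Only the constructor shape is modelled; the
// real class also takes a Scheduler reference.
class ExecuteStage {
  bool EnablePressureEvents;

public:
  // Existing callers keep the old behaviour: no pressure events.
  ExecuteStage() : ExecuteStage(false) {}
  explicit ExecuteStage(bool ShouldPerformBottleneckAnalysis)
      : EnablePressureEvents(ShouldPerformBottleneckAnalysis) {}

  // Hypothetical accessor, added here only so the flag can be inspected.
  bool pressureEventsEnabled() const { return EnablePressureEvents; }
};

} // namespace sketch
```

Because the new constructor parameter is defaulted and the one-argument ExecuteStage constructor delegates with `false`, existing library clients are unaffected unless they pass the flag explicitly.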

This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494
A follow-up patch will extend the "scheduler-stats" view to also print out:

the most problematic register dependencies (top 3)
the most problematic memory dependencies (top 3)
the instructions most affected by bottlenecks caused by pipeline pressure (top 3).

That change plus this patch should fully address PR37494.

Let me know if okay to commit.

-Andrea

Diff Detail

Repository: rL LLVM

Event Timeline

andreadb created this revision. Feb 27 2019, 10:58 AM

Herald added subscribers: jdoerfert, gbedwell, javed.absar. Feb 27 2019, 10:58 AM

andreadb added a reviewer: lebedev.ri. Feb 27 2019, 11:20 AM

ormris added a subscriber: ormris. Feb 27 2019, 11:28 AM

lebedev.ri added inline comments. Feb 27 2019, 12:34 PM

include/llvm/MCA/HWEventListener.h
143 ↗	(On Diff #188574)	HWPressureEvent(GenericReason reason, ?
147 ↗	(On Diff #188574)	`GenericReason`?
lib/MCA/HardwareUnits/Scheduler.cpp
232–233 ↗	(On Diff #188574)	Isn't `ReadySet` a `std::vector`? It would be best to do something like `Insts.insert(Insts.end(), ReadySet.begin(), ReadySet.end());` E.g. because this way you avoid constant reallocations since you tell beforehand the amount of entries you want to insert.
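
The difference between the two approaches in the last comment can be illustrated with std::vector. This is a generic sketch rather than MCA code: `appendOneByOne` and `appendAll` are hypothetical helpers, and the final diff applies the same range-insert idea to a SmallVectorImpl<InstRef>.

```cpp
#include <cassert>
#include <vector>

// Appends element by element. The destination may have to grow several
// times, because it never learns how many elements are still to come.
template <typename T>
void appendOneByOne(std::vector<T> &Dst, const std::vector<T> &Src) {
  for (const T &Value : Src)
    Dst.push_back(Value);
}

// Appends with a single range insert. The number of new elements is known
// from the iterator pair, so the destination needs at most one reallocation.
// Src must not be the same vector as Dst.
template <typename T>
void appendAll(std::vector<T> &Dst, const std::vector<T> &Src) {
  Dst.insert(Dst.end(), Src.begin(), Src.end());
}
```

Both helpers produce the same contents; only the number of potential reallocations differs.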

I like this and think users will find it very helpful. The changes seem sensible to me; however, I had a few nits... mostly stylistic things, no big deal. Anyways, I'll mark this patch as accepted, as long as you cover the suggestions made by @lebedev.ri.

include/llvm/MCA/Stages/ExecuteStage.h
33 ↗	(On Diff #188574)	Where is 'DispatchStallCycle' used? I see it initialized, but not accessed anywhere in this patch.
35 ↗	(On Diff #188574)	'True if this stage should notify listeners of HWPressureEvents'
lib/MCA/HardwareUnits/Scheduler.cpp
186 ↗	(On Diff #188574)	As a side note, perhaps we should consider adding a getDesc() accessor to InstRef. The pattern `getInstruction().getDesc()` or similar, is used throughout MCA.
250 ↗	(On Diff #188574)	Suggestion: Remove the last 'continue' and add an 'else { RegDeps.emplace_back(IR); }'
tools/llvm-mca/Views/SummaryView.cpp
28 ↗	(On Diff #188574)	Perhaps we should introduce a constructor to `struct BackPressureInfo` that initializes the instance, instead of explicitly zeroing the members here.
70 ↗	(On Diff #188574)	Seems like the case of INVALID should never occur. If my assumption is true, should we toss in an assert here?
tools/llvm-mca/llvm-mca.cpp
180 ↗	(On Diff #188574)	s/the fault/default/

This revision is now accepted and ready to land. Feb 27 2019, 2:59 PM

andreadb edited the summary of this revision. Feb 28 2019, 3:45 AM

Patch updated.

Addressed review comments.

mattd accepted this revision. Mar 1 2019, 10:29 AM

LGTM

Closed by commit rL355308: [MCA] Highlight kernel bottlenecks in the summary view. (authored by adibiagio). Mar 4 2019, 3:52 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Mar 4 2019, 3:52 AM

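Revision Contents

Path                                                              Size
llvm/trunk/docs/CommandGuide/llvm-mca.rst                         6 lines
llvm/trunk/include/llvm/MCA/Context.h                             6 lines
llvm/trunk/include/llvm/MCA/HWEventListener.h                     30 lines
llvm/trunk/include/llvm/MCA/HardwareUnits/Scheduler.h             21 lines
llvm/trunk/include/llvm/MCA/Instruction.h                         14 lines
llvm/trunk/include/llvm/MCA/Stages/ExecuteStage.h                 12 lines
llvm/trunk/lib/MCA/Context.cpp                                    3 lines
llvm/trunk/lib/MCA/HardwareUnits/Scheduler.cpp                    28 lines
llvm/trunk/lib/MCA/Stages/ExecuteStage.cpp                        43 lines
llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-1.s    85 lines
llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-2.s    72 lines
llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-3.s    106 lines
llvm/trunk/tools/llvm-mca/Views/SummaryView.h                     46 lines
llvm/trunk/tools/llvm-mca/Views/SummaryView.cpp                   102 lines
llvm/trunk/tools/llvm-mca/llvm-mca.cpp                            8 lines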

Diff 189132

llvm/trunk/docs/CommandGuide/llvm-mca.rst

	Show First 20 Lines • Show All 163 Lines • ▼ Show 20 Lines
	.. option:: -instruction-tables			.. option:: -instruction-tables

	Prints resource pressure information based on the static information			Prints resource pressure information based on the static information
	available from the processor model. This differs from the resource pressure			available from the processor model. This differs from the resource pressure
	view because it doesn't require that the code is simulated. It instead prints			view because it doesn't require that the code is simulated. It instead prints
	the theoretical uniform distribution of resource pressure for every			the theoretical uniform distribution of resource pressure for every
	instruction in sequence.			instruction in sequence.

				.. option:: -bottleneck-analysis

				Print information about bottlenecks that affect the throughput. This analysis
				can be expensive, and it is disabled by default. Bottlenecks are highlighted
				in the summary view.


	EXIT STATUS			EXIT STATUS
	-----------			-----------

	:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed			:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
	to standard error, and the tool returns 1.			to standard error, and the tool returns 1.

	USING MARKERS TO ANALYZE SPECIFIC CODE BLOCKS			USING MARKERS TO ANALYZE SPECIFIC CODE BLOCKS
	▲ Show 20 Lines • Show All 590 Lines • Show Last 20 Lines

llvm/trunk/include/llvm/MCA/Context.h

	Show All 26 Lines

	namespace llvm {			namespace llvm {
	namespace mca {			namespace mca {

	/// This is a convenience struct to hold the parameters necessary for creating			/// This is a convenience struct to hold the parameters necessary for creating
	/// the pre-built "default" out-of-order pipeline.			/// the pre-built "default" out-of-order pipeline.
	struct PipelineOptions {			struct PipelineOptions {
	PipelineOptions(unsigned DW, unsigned RFS, unsigned LQS, unsigned SQS,			PipelineOptions(unsigned DW, unsigned RFS, unsigned LQS, unsigned SQS,
	bool NoAlias)			bool NoAlias, bool ShouldEnableBottleneckAnalysis = false)
	: DispatchWidth(DW), RegisterFileSize(RFS), LoadQueueSize(LQS),			: DispatchWidth(DW), RegisterFileSize(RFS), LoadQueueSize(LQS),
	StoreQueueSize(SQS), AssumeNoAlias(NoAlias) {}			StoreQueueSize(SQS), AssumeNoAlias(NoAlias),
				EnableBottleneckAnalysis(ShouldEnableBottleneckAnalysis) {}
	unsigned DispatchWidth;			unsigned DispatchWidth;
	unsigned RegisterFileSize;			unsigned RegisterFileSize;
	unsigned LoadQueueSize;			unsigned LoadQueueSize;
	unsigned StoreQueueSize;			unsigned StoreQueueSize;
	bool AssumeNoAlias;			bool AssumeNoAlias;
				bool EnableBottleneckAnalysis;
	};			};

	class Context {			class Context {
	SmallVector<std::unique_ptr<HardwareUnit>, 4> Hardware;			SmallVector<std::unique_ptr<HardwareUnit>, 4> Hardware;
	const MCRegisterInfo &MRI;			const MCRegisterInfo &MRI;
	const MCSubtargetInfo &STI;			const MCSubtargetInfo &STI;

	public:			public:
	Show All 18 Lines

llvm/trunk/include/llvm/MCA/HWEventListener.h

Show First 20 Lines • Show All 119 Lines • ▼ Show 20 Lines	public:

// The exact meaning of the stall event type depends on the subtarget.		// The exact meaning of the stall event type depends on the subtarget.
const unsigned Type;		const unsigned Type;

// The instruction this event was generated for.		// The instruction this event was generated for.
const InstRef &IR;		const InstRef &IR;
};		};

		// A HWPressureEvent describes an increase in backend pressure caused by
		// the presence of data dependencies or unavailability of pipeline resources.
		class HWPressureEvent {
		public:
		enum GenericReason {
		INVALID = 0,
		// Scheduler was unable to issue all the ready instructions because some
		// pipeline resources were unavailable.
		RESOURCES,
		// Instructions could not be issued because of register data dependencies.
		REGISTER_DEPS,
		// Instructions could not be issued because of memory dependencies.
		MEMORY_DEPS
		};

		HWPressureEvent(GenericReason reason, ArrayRef<InstRef> Insts,
		uint64_t Mask = 0)
		: Reason(reason), AffectedInstructions(Insts), ResourceMask(Mask) {}

		// Reason for this increase in backend pressure.
		GenericReason Reason;

		// Instructions affected (i.e. delayed) by this increase in backend pressure.
		ArrayRef<InstRef> AffectedInstructions;

		// A mask of unavailable processor resources.
		const uint64_t ResourceMask;
		};

class HWEventListener {		class HWEventListener {
public:		public:
// Generic events generated by the pipeline.		// Generic events generated by the pipeline.
virtual void onCycleBegin() {}		virtual void onCycleBegin() {}
virtual void onCycleEnd() {}		virtual void onCycleEnd() {}

virtual void onEvent(const HWInstructionEvent &Event) {}		virtual void onEvent(const HWInstructionEvent &Event) {}
virtual void onEvent(const HWStallEvent &Event) {}		virtual void onEvent(const HWStallEvent &Event) {}
		virtual void onEvent(const HWPressureEvent &Event) {}

using ResourceRef = std::pair<uint64_t, uint64_t>;		using ResourceRef = std::pair<uint64_t, uint64_t>;
virtual void onResourceAvailable(const ResourceRef &RRef) {}		virtual void onResourceAvailable(const ResourceRef &RRef) {}

// Events generated by the Scheduler when buffered resources are		// Events generated by the Scheduler when buffered resources are
// consumed/freed for an instruction.		// consumed/freed for an instruction.
virtual void onReservedBuffers(const InstRef &Inst,		virtual void onReservedBuffers(const InstRef &Inst,
ArrayRef<unsigned> Buffers) {}		ArrayRef<unsigned> Buffers) {}
Show All 12 Lines

llvm/trunk/include/llvm/MCA/HardwareUnits/Scheduler.h

Show First 20 Lines • Show All 174 Lines • ▼ Show 20 Lines	enum Status {
SC_STORE_QUEUE_FULL,		SC_STORE_QUEUE_FULL,
SC_BUFFERS_FULL,		SC_BUFFERS_FULL,
SC_DISPATCH_GROUP_STALL,		SC_DISPATCH_GROUP_STALL,
};		};

/// Check if the instruction in 'IR' can be dispatched during this cycle.		/// Check if the instruction in 'IR' can be dispatched during this cycle.
/// Return SC_AVAILABLE if both scheduler and LS resources are available.		/// Return SC_AVAILABLE if both scheduler and LS resources are available.
///		///
/// This method internally sets field HadTokenStall based on the Scheduler		/// This method is also responsible for setting field HadTokenStall if
/// Status value.		/// IR cannot be dispatched to the Scheduler due to unavailable resources.
Status isAvailable(const InstRef &IR);		Status isAvailable(const InstRef &IR);

/// Reserves buffer and LSUnit queue resources that are necessary to issue		/// Reserves buffer and LSUnit queue resources that are necessary to issue
/// this instruction.		/// this instruction.
///		///
/// Returns true if instruction IR is ready to be issued to the underlying		/// Returns true if instruction IR is ready to be issued to the underlying
/// pipelines. Note that this operation cannot fail; it assumes that a		/// pipelines. Note that this operation cannot fail; it assumes that a
/// previous call to method `isAvailable(IR)` returned `SC_AVAILABLE`.		/// previous call to method `isAvailable(IR)` returned `SC_AVAILABLE`.
Show All 27 Lines	unsigned getResourceID(uint64_t Mask) const {
return Resources->resolveResourceMask(Mask);		return Resources->resolveResourceMask(Mask);
}		}

/// Select the next instruction to issue from the ReadySet. Returns an invalid		/// Select the next instruction to issue from the ReadySet. Returns an invalid
/// instruction reference if there are no ready instructions, or if processor		/// instruction reference if there are no ready instructions, or if processor
/// resources are not available.		/// resources are not available.
InstRef select();		InstRef select();

/// Returns a mask of busy resources. Each bit of the mask identifies a unique
/// processor resource unit. In the absence of bottlenecks caused by resource
/// pressure, the mask value returned by this method is always zero.
uint64_t getBusyResourceUnits() const { return BusyResourceUnits; }
bool arePipelinesFullyUsed() const {		bool arePipelinesFullyUsed() const {
return !Resources->getAvailableProcResUnits();		return !Resources->getAvailableProcResUnits();
}		}
bool isReadySetEmpty() const { return ReadySet.empty(); }		bool isReadySetEmpty() const { return ReadySet.empty(); }
bool isWaitSetEmpty() const { return WaitSet.empty(); }		bool isWaitSetEmpty() const { return WaitSet.empty(); }

		/// This method is called by the ExecuteStage at the end of each cycle to
		/// identify bottlenecks caused by data dependencies. Vector RegDeps is
		/// populated by instructions that were not issued because of unsolved
		/// register dependencies. Vector MemDeps is populated by instructions that
		/// were not issued because of unsolved memory dependencies.
		void analyzeDataDependencies(SmallVectorImpl<InstRef> &RegDeps,
		SmallVectorImpl<InstRef> &MemDeps);

		/// Returns a mask of busy resources, and populates vector Insts with
		/// instructions that could not be issued to the underlying pipelines because
		/// not all pipeline resources were available.
		uint64_t analyzeResourcePressure(SmallVectorImpl<InstRef> &Insts);

// Returns true if the dispatch logic couldn't dispatch a full group due to		// Returns true if the dispatch logic couldn't dispatch a full group due to
// unavailable scheduler and/or LS resources.		// unavailable scheduler and/or LS resources.
bool hadTokenStall() const { return HadTokenStall; }		bool hadTokenStall() const { return HadTokenStall; }

#ifndef NDEBUG		#ifndef NDEBUG
// Update the ready queues.		// Update the ready queues.
void dump() const;		void dump() const;

Show All 13 Lines

llvm/trunk/include/llvm/MCA/Instruction.h

Show First 20 Lines • Show All 442 Lines • ▼ Show 20 Lines	class Instruction : public InstructionBase {

// This value defaults to the instruction latency. This instruction is		// This value defaults to the instruction latency. This instruction is
// considered executed when field CyclesLeft goes to zero.		// considered executed when field CyclesLeft goes to zero.
int CyclesLeft;		int CyclesLeft;

// Retire Unit token ID for this instruction.		// Retire Unit token ID for this instruction.
unsigned RCUTokenID;		unsigned RCUTokenID;

		// A bitmask of busy processor resource units.
		// This field is set to zero only if execution is not delayed during this
		// cycle because of unavailable pipeline resources.
uint64_t CriticalResourceMask;		uint64_t CriticalResourceMask;

		// An instruction identifier. This field is only set if execution is delayed
		// by a memory dependency.
unsigned CriticalMemDep;		unsigned CriticalMemDep;

public:		public:
Instruction(const InstrDesc &D)		Instruction(const InstrDesc &D)
: InstructionBase(D), Stage(IS_INVALID), CyclesLeft(UNKNOWN_CYCLES),		: InstructionBase(D), Stage(IS_INVALID), CyclesLeft(UNKNOWN_CYCLES),
RCUTokenID(0), CriticalResourceMask(0), CriticalMemDep(0) {}		RCUTokenID(0), CriticalResourceMask(0), CriticalMemDep(0) {}

unsigned getRCUTokenID() const { return RCUTokenID; }		unsigned getRCUTokenID() const { return RCUTokenID; }
Show All 34 Lines	public:
// Forces a transition from state IS_DISPATCHED to state IS_EXECUTED.		// Forces a transition from state IS_DISPATCHED to state IS_EXECUTED.
void forceExecuted();		void forceExecuted();

void retire() {		void retire() {
assert(isExecuted() && "Instruction is in an invalid state!");		assert(isExecuted() && "Instruction is in an invalid state!");
Stage = IS_RETIRED;		Stage = IS_RETIRED;
}		}

void updateCriticalResourceMask(uint64_t BusyResourceUnits) {
CriticalResourceMask |= BusyResourceUnits;
}
uint64_t getCriticalResourceMask() const { return CriticalResourceMask; }		uint64_t getCriticalResourceMask() const { return CriticalResourceMask; }
void setCriticalMemDep(unsigned IID) { CriticalMemDep = IID; }
unsigned getCriticalMemDep() const { return CriticalMemDep; }		unsigned getCriticalMemDep() const { return CriticalMemDep; }
		void setCriticalResourceMask(uint64_t ResourceMask) {
		CriticalResourceMask = ResourceMask;
		}
		void setCriticalMemDep(unsigned IID) { CriticalMemDep = IID; }

void cycleEvent();		void cycleEvent();
};		};

/// An InstRef contains both a SourceMgr index and Instruction pair. The index		/// An InstRef contains both a SourceMgr index and Instruction pair. The index
/// is used as a unique identifier for the instruction. MCA will make use of		/// is used as a unique identifier for the instruction. MCA will make use of
/// this index as a key throughout MCA.		/// this index as a key throughout MCA.
class InstRef {		class InstRef {
▲ Show 20 Lines • Show All 73 Lines • Show Last 20 Lines

llvm/trunk/include/llvm/MCA/Stages/ExecuteStage.h

	Show All 22 Lines
	#include "llvm/MCA/Stages/Stage.h"			#include "llvm/MCA/Stages/Stage.h"

	namespace llvm {			namespace llvm {
	namespace mca {			namespace mca {

	class ExecuteStage final : public Stage {			class ExecuteStage final : public Stage {
	Scheduler &HWS;			Scheduler &HWS;

				unsigned NumDispatchedOpcodes;
				unsigned NumIssuedOpcodes;

				// True if this stage should notify listeners of HWPressureEvents.
				bool EnablePressureEvents;

	Error issueInstruction(InstRef &IR);			Error issueInstruction(InstRef &IR);

	// Called at the beginning of each cycle to issue already dispatched			// Called at the beginning of each cycle to issue already dispatched
	// instructions to the underlying pipelines.			// instructions to the underlying pipelines.
	Error issueReadyInstructions();			Error issueReadyInstructions();

	// Used to notify instructions eliminated at register renaming stage.			// Used to notify instructions eliminated at register renaming stage.
	Error handleInstructionEliminated(InstRef &IR);			Error handleInstructionEliminated(InstRef &IR);

	ExecuteStage(const ExecuteStage &Other) = delete;			ExecuteStage(const ExecuteStage &Other) = delete;
	ExecuteStage &operator=(const ExecuteStage &Other) = delete;			ExecuteStage &operator=(const ExecuteStage &Other) = delete;

	public:			public:
	ExecuteStage(Scheduler &S) : Stage(), HWS(S) {}			ExecuteStage(Scheduler &S) : ExecuteStage(S, false) {}
				ExecuteStage(Scheduler &S, bool ShouldPerformBottleneckAnalysis)
				: Stage(), HWS(S), NumDispatchedOpcodes(0), NumIssuedOpcodes(0),
				EnablePressureEvents(ShouldPerformBottleneckAnalysis) {}

	// This stage works under the assumption that the Pipeline will eventually			// This stage works under the assumption that the Pipeline will eventually
	// execute a retire stage. We don't need to check if pipelines and/or			// execute a retire stage. We don't need to check if pipelines and/or
	// schedulers have instructions to process, because those instructions are			// schedulers have instructions to process, because those instructions are
	// also tracked by the retire control unit. That means,			// also tracked by the retire control unit. That means,
	// RetireControlUnit::hasWorkToComplete() is responsible for checking if there			// RetireControlUnit::hasWorkToComplete() is responsible for checking if there
	// are still instructions in-flight in the out-of-order backend.			// are still instructions in-flight in the out-of-order backend.
	bool hasWorkToComplete() const override { return false; }			bool hasWorkToComplete() const override { return false; }
	bool isAvailable(const InstRef &IR) const override;			bool isAvailable(const InstRef &IR) const override;

	// Notifies the scheduler that a new cycle just started.			// Notifies the scheduler that a new cycle just started.
	//			//
	// This method notifies the scheduler that a new cycle started.			// This method notifies the scheduler that a new cycle started.
	// This method is also responsible for notifying listeners about instructions			// This method is also responsible for notifying listeners about instructions
	// state changes, and processor resources freed by the scheduler.			// state changes, and processor resources freed by the scheduler.
	// Instructions that transitioned to the 'Executed' state are automatically			// Instructions that transitioned to the 'Executed' state are automatically
	// moved to the next stage (i.e. RetireStage).			// moved to the next stage (i.e. RetireStage).
	Error cycleStart() override;			Error cycleStart() override;
				Error cycleEnd() override;
	Error execute(InstRef &IR) override;			Error execute(InstRef &IR) override;

	void notifyInstructionIssued(			void notifyInstructionIssued(
	const InstRef &IR,			const InstRef &IR,
	MutableArrayRef<std::pair<ResourceRef, ResourceCycles>> Used) const;			MutableArrayRef<std::pair<ResourceRef, ResourceCycles>> Used) const;
	void notifyInstructionExecuted(const InstRef &IR) const;			void notifyInstructionExecuted(const InstRef &IR) const;
	void notifyInstructionReady(const InstRef &IR) const;			void notifyInstructionReady(const InstRef &IR) const;
	void notifyResourceAvailable(const ResourceRef &RR) const;			void notifyResourceAvailable(const ResourceRef &RR) const;
	Show All 9 Lines

llvm/trunk/lib/MCA/Context.cpp

Show All 36 Lines	Context::createDefaultPipeline(const PipelineOptions &Opts, InstrBuilder &IB,
auto LSU = llvm::make_unique<LSUnit>(SM, Opts.LoadQueueSize,		auto LSU = llvm::make_unique<LSUnit>(SM, Opts.LoadQueueSize,
Opts.StoreQueueSize, Opts.AssumeNoAlias);		Opts.StoreQueueSize, Opts.AssumeNoAlias);
auto HWS = llvm::make_unique<Scheduler>(SM, *LSU);		auto HWS = llvm::make_unique<Scheduler>(SM, *LSU);

// Create the pipeline stages.		// Create the pipeline stages.
auto Fetch = llvm::make_unique<EntryStage>(SrcMgr);		auto Fetch = llvm::make_unique<EntryStage>(SrcMgr);
auto Dispatch = llvm::make_unique<DispatchStage>(STI, MRI, Opts.DispatchWidth,		auto Dispatch = llvm::make_unique<DispatchStage>(STI, MRI, Opts.DispatchWidth,
RCU, PRF);		RCU, PRF);
auto Execute = llvm::make_unique<ExecuteStage>(*HWS);		auto Execute =
		llvm::make_unique<ExecuteStage>(*HWS, Opts.EnableBottleneckAnalysis);
auto Retire = llvm::make_unique<RetireStage>(RCU, PRF);		auto Retire = llvm::make_unique<RetireStage>(RCU, PRF);

// Pass the ownership of all the hardware units to this Context.		// Pass the ownership of all the hardware units to this Context.
addHardwareUnit(std::move(RCU));		addHardwareUnit(std::move(RCU));
addHardwareUnit(std::move(PRF));		addHardwareUnit(std::move(PRF));
addHardwareUnit(std::move(LSU));		addHardwareUnit(std::move(LSU));
addHardwareUnit(std::move(HWS));		addHardwareUnit(std::move(HWS));

Show All 11 Lines

llvm/trunk/lib/MCA/HardwareUnits/Scheduler.cpp

Show First 20 Lines • Show All 177 Lines • ▼ Show 20 Lines
}		}

InstRef Scheduler::select() {		InstRef Scheduler::select() {
unsigned QueueIndex = ReadySet.size();		unsigned QueueIndex = ReadySet.size();
for (unsigned I = 0, E = ReadySet.size(); I != E; ++I) {		for (unsigned I = 0, E = ReadySet.size(); I != E; ++I) {
InstRef &IR = ReadySet[I];		InstRef &IR = ReadySet[I];
if (QueueIndex == ReadySet.size() ||		if (QueueIndex == ReadySet.size() ||
Strategy->compare(IR, ReadySet[QueueIndex])) {		Strategy->compare(IR, ReadySet[QueueIndex])) {
const InstrDesc &D = IR.getInstruction()->getDesc();		Instruction &IS = *IR.getInstruction();
uint64_t BusyResourceMask = Resources->checkAvailability(D);		uint64_t BusyResourceMask = Resources->checkAvailability(IS.getDesc());
IR.getInstruction()->updateCriticalResourceMask(BusyResourceMask);		IS.setCriticalResourceMask(BusyResourceMask);
BusyResourceUnits |= BusyResourceMask;		BusyResourceUnits |= BusyResourceMask;
if (!BusyResourceMask)		if (!BusyResourceMask)
QueueIndex = I;		QueueIndex = I;
}		}
}		}

if (QueueIndex == ReadySet.size())		if (QueueIndex == ReadySet.size())
return InstRef();		return InstRef();
Show All 25 Lines	for (auto I = IssuedSet.begin(), E = IssuedSet.end(); I != E;) {
++RemovedElements;		++RemovedElements;
IR.invalidate();		IR.invalidate();
std::iter_swap(I, E - RemovedElements);		std::iter_swap(I, E - RemovedElements);
}		}

IssuedSet.resize(IssuedSet.size() - RemovedElements);		IssuedSet.resize(IssuedSet.size() - RemovedElements);
}		}

		uint64_t Scheduler::analyzeResourcePressure(SmallVectorImpl<InstRef> &Insts) {
		Insts.insert(Insts.end(), ReadySet.begin(), ReadySet.end());
		return BusyResourceUnits;
		}

		void Scheduler::analyzeDataDependencies(SmallVectorImpl<InstRef> &RegDeps,
		SmallVectorImpl<InstRef> &MemDeps) {
		const auto EndIt = PendingSet.end() - NumDispatchedToThePendingSet;
		for (InstRef &IR : make_range(PendingSet.begin(), EndIt)) {
		Instruction &IS = *IR.getInstruction();
		if (Resources->checkAvailability(IS.getDesc()))
		continue;

		if (IS.isReady() ||
		(IS.isMemOp() && LSU.isReady(IR) != IR.getSourceIndex())) {
		MemDeps.emplace_back(IR);
		} else {
		RegDeps.emplace_back(IR);
		}
		}
		}

void Scheduler::cycleEvent(SmallVectorImpl<ResourceRef> &Freed,		void Scheduler::cycleEvent(SmallVectorImpl<ResourceRef> &Freed,
SmallVectorImpl<InstRef> &Executed,		SmallVectorImpl<InstRef> &Executed,
SmallVectorImpl<InstRef> &Ready) {		SmallVectorImpl<InstRef> &Ready) {
// Release consumed resources.		// Release consumed resources.
Resources->cycleEvent(Freed);		Resources->cycleEvent(Freed);

for (InstRef &IR : IssuedSet)		for (InstRef &IR : IssuedSet)
IR.getInstruction()->cycleEvent();		IR.getInstruction()->cycleEvent();
▲ Show 20 Lines • Show All 67 Lines • Show Last 20 Lines

llvm/trunk/lib/MCA/Stages/ExecuteStage.cpp

Show First 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	bool ExecuteStage::isAvailable(const InstRef &IR) const {

return true;		return true;
}		}

Error ExecuteStage::issueInstruction(InstRef &IR) {		Error ExecuteStage::issueInstruction(InstRef &IR) {
SmallVector<std::pair<ResourceRef, ResourceCycles>, 4> Used;		SmallVector<std::pair<ResourceRef, ResourceCycles>, 4> Used;
SmallVector<InstRef, 4> Ready;		SmallVector<InstRef, 4> Ready;
HWS.issueInstruction(IR, Used, Ready);		HWS.issueInstruction(IR, Used, Ready);
		NumIssuedOpcodes += IR.getInstruction()->getDesc().NumMicroOps;

notifyReservedOrReleasedBuffers(IR, /* Reserved */ false);		notifyReservedOrReleasedBuffers(IR, /* Reserved */ false);

notifyInstructionIssued(IR, Used);		notifyInstructionIssued(IR, Used);
if (IR.getInstruction()->isExecuted()) {		if (IR.getInstruction()->isExecuted()) {
notifyInstructionExecuted(IR);		notifyInstructionExecuted(IR);
// FIXME: add a buffer of executed instructions.		// FIXME: add a buffer of executed instructions.
if (Error S = moveToTheNextStage(IR))		if (Error S = moveToTheNextStage(IR))
Show All 19 Lines
}		}

Error ExecuteStage::cycleStart() {		Error ExecuteStage::cycleStart() {
SmallVector<ResourceRef, 8> Freed;		SmallVector<ResourceRef, 8> Freed;
SmallVector<InstRef, 4> Executed;		SmallVector<InstRef, 4> Executed;
SmallVector<InstRef, 4> Ready;		SmallVector<InstRef, 4> Ready;

HWS.cycleEvent(Freed, Executed, Ready);		HWS.cycleEvent(Freed, Executed, Ready);
		NumDispatchedOpcodes = 0;
		NumIssuedOpcodes = 0;

for (const ResourceRef &RR : Freed)		for (const ResourceRef &RR : Freed)
notifyResourceAvailable(RR);		notifyResourceAvailable(RR);

for (InstRef &IR : Executed) {		for (InstRef &IR : Executed) {
notifyInstructionExecuted(IR);		notifyInstructionExecuted(IR);
// FIXME: add a buffer of executed instructions.		// FIXME: add a buffer of executed instructions.
if (Error S = moveToTheNextStage(IR))		if (Error S = moveToTheNextStage(IR))
return S;		return S;
}		}

for (const InstRef &IR : Ready)		for (const InstRef &IR : Ready)
notifyInstructionReady(IR);		notifyInstructionReady(IR);

return issueReadyInstructions();		return issueReadyInstructions();
}		}

		Error ExecuteStage::cycleEnd() {
		if (!EnablePressureEvents)
		return ErrorSuccess();

		// Always conservatively report any backpressure events if the dispatch logic
		// was stalled due to unavailable scheduler resources.
		if (!HWS.hadTokenStall() && NumDispatchedOpcodes <= NumIssuedOpcodes)
		return ErrorSuccess();

		SmallVector<InstRef, 8> Insts;
		uint64_t Mask = HWS.analyzeResourcePressure(Insts);
		if (Mask) {
		LLVM_DEBUG(dbgs() << "[E] Backpressure increased because of unavailable "
		"pipeline resources: "
		<< format_hex(Mask, 16) << '\n');
		HWPressureEvent Ev(HWPressureEvent::RESOURCES, Insts, Mask);
		notifyEvent(Ev);
		return ErrorSuccess();
		}

		SmallVector<InstRef, 8> RegDeps;
		SmallVector<InstRef, 8> MemDeps;
		HWS.analyzeDataDependencies(RegDeps, MemDeps);
		if (RegDeps.size()) {
		LLVM_DEBUG(
		dbgs() << "[E] Backpressure increased by register dependencies\n");
		HWPressureEvent Ev(HWPressureEvent::REGISTER_DEPS, RegDeps);
		notifyEvent(Ev);
		}

		if (MemDeps.size()) {
		LLVM_DEBUG(dbgs() << "[E] Backpressure increased by memory dependencies\n");
		HWPressureEvent Ev(HWPressureEvent::MEMORY_DEPS, MemDeps);
		notifyEvent(Ev);
		}

		return ErrorSuccess();
		}

#ifndef NDEBUG		#ifndef NDEBUG
static void verifyInstructionEliminated(const InstRef &IR) {		static void verifyInstructionEliminated(const InstRef &IR) {
const Instruction &Inst = *IR.getInstruction();		const Instruction &Inst = *IR.getInstruction();
assert(Inst.isEliminated() && "Instruction was not eliminated!");		assert(Inst.isEliminated() && "Instruction was not eliminated!");
assert(Inst.isReady() && "Instruction in an inconsistent state!");		assert(Inst.isReady() && "Instruction in an inconsistent state!");

// Ensure that instructions eliminated at register renaming stage are in a		// Ensure that instructions eliminated at register renaming stage are in a
// consistent state.		// consistent state.
// [25 lines not shown]
if (IR.getInstruction()->isEliminated())		if (IR.getInstruction()->isEliminated())
return handleInstructionEliminated(IR);		return handleInstructionEliminated(IR);

// Reserve a slot in each buffered resource. Also, mark units with		// Reserve a slot in each buffered resource. Also, mark units with
// BufferSize=0 as reserved. Resources with a buffer size of zero will only		// BufferSize=0 as reserved. Resources with a buffer size of zero will only
// be released after MCIS is issued, and all the ResourceCycles for those		// be released after MCIS is issued, and all the ResourceCycles for those
// units have been consumed.		// units have been consumed.
bool IsReadyInstruction = HWS.dispatch(IR);		bool IsReadyInstruction = HWS.dispatch(IR);
		NumDispatchedOpcodes += IR.getInstruction()->getDesc().NumMicroOps;
notifyReservedOrReleasedBuffers(IR, /* Reserved */ true);		notifyReservedOrReleasedBuffers(IR, /* Reserved */ true);
if (!IsReadyInstruction)		if (!IsReadyInstruction)
return ErrorSuccess();		return ErrorSuccess();

// If we did not return early, then the scheduler is ready for execution.		// If we did not return early, then the scheduler is ready for execution.
notifyInstructionReady(IR);		notifyInstructionReady(IR);

// If we cannot issue immediately, the HWS will add IR to its ready queue for		// If we cannot issue immediately, the HWS will add IR to its ready queue for
// [68 lines not shown]
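
The decision order in ExecuteStage::cycleEnd() above can be sketched in isolation. This is only an illustration under stated assumptions: CycleSnapshot, PressureReason and classifyCycle are hypothetical names that stand in for the scheduler queries (hadTokenStall, analyzeResourcePressure, analyzeDataDependencies) and are not part of llvm::mca.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not llvm::mca types.
enum class PressureReason { Resources, RegisterDeps, MemoryDeps };

struct CycleSnapshot {
  bool HadTokenStall;        // Dispatch stalled on scheduler resources.
  unsigned NumDispatched;    // uOps dispatched during this cycle.
  unsigned NumIssued;        // uOps issued during this cycle.
  uint64_t BusyResourceMask; // Non-zero when pipeline resources were unavailable.
  bool HasRegisterDeps;      // Instructions were waiting on a register.
  bool HasMemoryDeps;        // Instructions were waiting on memory.
};

// Returns the pressure events that would be reported for one cycle.
std::vector<PressureReason> classifyCycle(const CycleSnapshot &C) {
  std::vector<PressureReason> Events;
  // Nothing to report unless dispatch stalled or more uOps were dispatched
  // than issued.
  if (!C.HadTokenStall && C.NumDispatched <= C.NumIssued)
    return Events;
  // Resource pressure is reported alone; dependencies are not inspected.
  if (C.BusyResourceMask) {
    Events.push_back(PressureReason::Resources);
    return Events;
  }
  // Register and memory dependencies may both be reported in the same cycle.
  if (C.HasRegisterDeps)
    Events.push_back(PressureReason::RegisterDeps);
  if (C.HasMemoryDeps)
    Events.push_back(PressureReason::MemoryDeps);
  return Events;
}

// Usage: two uOps dispatched, one issued, the other waiting on a register.
void exampleRegisterDependencyCycle() {
  CycleSnapshot C{false, 2, 1, 0, true, false};
  std::vector<PressureReason> Events = classifyCycle(C);
  assert(Events.size() == 1 && Events[0] == PressureReason::RegisterDeps);
}
```

Note the early return after a resource event: in this sketch, as in the patch, a cycle attributed to resource pressure is never also attributed to data dependencies.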

llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-1.s

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -timeline -timeline-max-iterations=1 -bottleneck-analysis < %s | FileCheck %s

				add %eax, %ebx
				add %ebx, %ecx
				add %ecx, %edx
				add %edx, %eax

				# CHECK: Iterations: 100
				# CHECK-NEXT: Instructions: 400
				# CHECK-NEXT: Total Cycles: 403
				# CHECK-NEXT: Total uOps: 400

				# CHECK: Dispatch Width: 2
				# CHECK-NEXT: uOps Per Cycle: 0.99
				# CHECK-NEXT: IPC: 0.99
				# CHECK-NEXT: Block RThroughput: 2.0

				# CHECK: Cycles with backend pressure increase [ 94.04% ]
				# CHECK-NEXT: Throughput Bottlenecks:
				# CHECK-NEXT: Resource Pressure [ 0.00% ]
				# CHECK-NEXT: Data Dependencies: [ 94.04% ]
				# CHECK-NEXT: - Register Dependencies [ 94.04% ]
				# CHECK-NEXT: - Memory Dependencies [ 0.00% ]

				# CHECK: Instruction Info:
				# CHECK-NEXT: [1]: #uOps
				# CHECK-NEXT: [2]: Latency
				# CHECK-NEXT: [3]: RThroughput
				# CHECK-NEXT: [4]: MayLoad
				# CHECK-NEXT: [5]: MayStore
				# CHECK-NEXT: [6]: HasSideEffects (U)

				# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
				# CHECK-NEXT: 1 1 0.50 addl %eax, %ebx
				# CHECK-NEXT: 1 1 0.50 addl %ebx, %ecx
				# CHECK-NEXT: 1 1 0.50 addl %ecx, %edx
				# CHECK-NEXT: 1 1 0.50 addl %edx, %eax

				# CHECK: Resources:
				# CHECK-NEXT: [0] - JALU0
				# CHECK-NEXT: [1] - JALU1
				# CHECK-NEXT: [2] - JDiv
				# CHECK-NEXT: [3] - JFPA
				# CHECK-NEXT: [4] - JFPM
				# CHECK-NEXT: [5] - JFPU0
				# CHECK-NEXT: [6] - JFPU1
				# CHECK-NEXT: [7] - JLAGU
				# CHECK-NEXT: [8] - JMul
				# CHECK-NEXT: [9] - JSAGU
				# CHECK-NEXT: [10] - JSTC
				# CHECK-NEXT: [11] - JVALU0
				# CHECK-NEXT: [12] - JVALU1
				# CHECK-NEXT: [13] - JVIMUL

				# CHECK: Resource pressure per iteration:
				# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
				# CHECK-NEXT: 2.00 2.00 - - - - - - - - - - - -

				# CHECK: Resource pressure by instruction:
				# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
				# CHECK-NEXT: - 1.00 - - - - - - - - - - - - addl %eax, %ebx
				# CHECK-NEXT: 1.00 - - - - - - - - - - - - - addl %ebx, %ecx
				# CHECK-NEXT: - 1.00 - - - - - - - - - - - - addl %ecx, %edx
				# CHECK-NEXT: 1.00 - - - - - - - - - - - - - addl %edx, %eax

				# CHECK: Timeline view:
				# CHECK-NEXT: Index 0123456

				# CHECK: [0,0] DeER .. addl %eax, %ebx
				# CHECK-NEXT: [0,1] D=eER.. addl %ebx, %ecx
				# CHECK-NEXT: [0,2] .D=eER. addl %ecx, %edx
				# CHECK-NEXT: [0,3] .D==eER addl %edx, %eax

				# CHECK: Average Wait times (based on the timeline view):
				# CHECK-NEXT: [0]: Executions
				# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
				# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
				# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

				# CHECK: [0] [1] [2] [3]
				# CHECK-NEXT: 0. 1 1.0 1.0 0.0 addl %eax, %ebx
				# CHECK-NEXT: 1. 1 2.0 0.0 0.0 addl %ebx, %ecx
				# CHECK-NEXT: 2. 1 2.0 0.0 0.0 addl %ecx, %edx
				# CHECK-NEXT: 3. 1 3.0 0.0 0.0 addl %edx, %eax
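
The percentages in the CHECK lines above follow the rounding expression used by printBottleneckHints in this patch: cycles * 100 / TotalCycles, then floor(x * 100 + 0.5) / 100, printed with "%.2f". A minimal sketch of that arithmetic follows. formatPercent is a hypothetical helper, and the count of 379 pressure cycles is an assumption chosen because it reproduces 94.04% of the 403 total cycles; the diff does not state the actual count.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical helper: formats Cycles as a percentage of TotalCycles using
// the same rounding expression as printBottleneckHints.
std::string formatPercent(unsigned Cycles, unsigned TotalCycles) {
  assert(TotalCycles != 0 && "need at least one simulated cycle");
  double Percent = (double)Cycles * 100 / TotalCycles;
  // Round half up to two decimal places.
  double Rounded = std::floor((Percent * 100) + 0.5) / 100;
  char Buffer[32];
  std::snprintf(Buffer, sizeof(Buffer), "%.2f", Rounded);
  return Buffer;
}

// Usage: 379 assumed pressure cycles out of the 403 total cycles reported
// by this test.
std::string exampleRegisterDependencyPercent() {
  return formatPercent(379, 403); // "94.04"
}
```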

llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-2.s

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=100 -timeline -timeline-max-iterations=1 -bottleneck-analysis < %s | FileCheck %s

				vhaddps %xmm0, %xmm0, %xmm1

				# CHECK: Iterations: 100
				# CHECK-NEXT: Instructions: 100
				# CHECK-NEXT: Total Cycles: 106
				# CHECK-NEXT: Total uOps: 100

				# CHECK: Dispatch Width: 2
				# CHECK-NEXT: uOps Per Cycle: 0.94
				# CHECK-NEXT: IPC: 0.94
				# CHECK-NEXT: Block RThroughput: 1.0

				# CHECK: Cycles with backend pressure increase [ 76.42% ]
				# CHECK-NEXT: Throughput Bottlenecks:
				# CHECK-NEXT: Resource Pressure [ 76.42% ]
				# CHECK-NEXT: - JFPA [ 76.42% ]
				# CHECK-NEXT: - JFPU0 [ 76.42% ]
				# CHECK-NEXT: Data Dependencies: [ 0.00% ]
				# CHECK-NEXT: - Register Dependencies [ 0.00% ]
				# CHECK-NEXT: - Memory Dependencies [ 0.00% ]

				# CHECK: Instruction Info:
				# CHECK-NEXT: [1]: #uOps
				# CHECK-NEXT: [2]: Latency
				# CHECK-NEXT: [3]: RThroughput
				# CHECK-NEXT: [4]: MayLoad
				# CHECK-NEXT: [5]: MayStore
				# CHECK-NEXT: [6]: HasSideEffects (U)

				# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
				# CHECK-NEXT: 1 4 1.00 vhaddps %xmm0, %xmm0, %xmm1

				# CHECK: Resources:
				# CHECK-NEXT: [0] - JALU0
				# CHECK-NEXT: [1] - JALU1
				# CHECK-NEXT: [2] - JDiv
				# CHECK-NEXT: [3] - JFPA
				# CHECK-NEXT: [4] - JFPM
				# CHECK-NEXT: [5] - JFPU0
				# CHECK-NEXT: [6] - JFPU1
				# CHECK-NEXT: [7] - JLAGU
				# CHECK-NEXT: [8] - JMul
				# CHECK-NEXT: [9] - JSAGU
				# CHECK-NEXT: [10] - JSTC
				# CHECK-NEXT: [11] - JVALU0
				# CHECK-NEXT: [12] - JVALU1
				# CHECK-NEXT: [13] - JVIMUL

				# CHECK: Resource pressure per iteration:
				# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
				# CHECK-NEXT: - - - 1.00 - 1.00 - - - - - - - -

				# CHECK: Resource pressure by instruction:
				# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
				# CHECK-NEXT: - - - 1.00 - 1.00 - - - - - - - - vhaddps %xmm0, %xmm0, %xmm1

				# CHECK: Timeline view:
				# CHECK-NEXT: Index 0123456

				# CHECK: [0,0] DeeeeER vhaddps %xmm0, %xmm0, %xmm1

				# CHECK: Average Wait times (based on the timeline view):
				# CHECK-NEXT: [0]: Executions
				# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
				# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
				# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

				# CHECK: [0] [1] [2] [3]
				# CHECK-NEXT: 0. 1 1.0 1.0 0.0 vhaddps %xmm0, %xmm0, %xmm1

llvm/trunk/test/tools/llvm-mca/X86/BtVer2/bottleneck-hints-3.s

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1500 -noalias=false -timeline -timeline-max-iterations=1 -bottleneck-analysis < %s | FileCheck %s

				vmovaps (%rsi), %xmm0
				vmovaps %xmm0, (%rdi)
				vmovaps 16(%rsi), %xmm0
				vmovaps %xmm0, 16(%rdi)
				vmovaps 32(%rsi), %xmm0
				vmovaps %xmm0, 32(%rdi)
				vmovaps 48(%rsi), %xmm0
				vmovaps %xmm0, 48(%rdi)

				# CHECK: Iterations: 1500
				# CHECK-NEXT: Instructions: 12000
				# CHECK-NEXT: Total Cycles: 36003
				# CHECK-NEXT: Total uOps: 12000

				# CHECK: Dispatch Width: 2
				# CHECK-NEXT: uOps Per Cycle: 0.33
				# CHECK-NEXT: IPC: 0.33
				# CHECK-NEXT: Block RThroughput: 4.0

				# CHECK: Cycles with backend pressure increase [ 99.89% ]
				# CHECK-NEXT: Throughput Bottlenecks:
				# CHECK-NEXT: Resource Pressure [ 0.00% ]
				# CHECK-NEXT: Data Dependencies: [ 99.89% ]
				# CHECK-NEXT: - Register Dependencies [ 0.00% ]
				# CHECK-NEXT: - Memory Dependencies [ 99.89% ]

				# CHECK: Instruction Info:
				# CHECK-NEXT: [1]: #uOps
				# CHECK-NEXT: [2]: Latency
				# CHECK-NEXT: [3]: RThroughput
				# CHECK-NEXT: [4]: MayLoad
				# CHECK-NEXT: [5]: MayStore
				# CHECK-NEXT: [6]: HasSideEffects (U)

				# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
				# CHECK-NEXT: 1 5 1.00 * vmovaps (%rsi), %xmm0
				# CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, (%rdi)
				# CHECK-NEXT: 1 5 1.00 * vmovaps 16(%rsi), %xmm0
				# CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, 16(%rdi)
				# CHECK-NEXT: 1 5 1.00 * vmovaps 32(%rsi), %xmm0
				# CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, 32(%rdi)
				# CHECK-NEXT: 1 5 1.00 * vmovaps 48(%rsi), %xmm0
				# CHECK-NEXT: 1 1 1.00 * vmovaps %xmm0, 48(%rdi)

				# CHECK: Resources:
				# CHECK-NEXT: [0] - JALU0
				# CHECK-NEXT: [1] - JALU1
				# CHECK-NEXT: [2] - JDiv
				# CHECK-NEXT: [3] - JFPA
				# CHECK-NEXT: [4] - JFPM
				# CHECK-NEXT: [5] - JFPU0
				# CHECK-NEXT: [6] - JFPU1
				# CHECK-NEXT: [7] - JLAGU
				# CHECK-NEXT: [8] - JMul
				# CHECK-NEXT: [9] - JSAGU
				# CHECK-NEXT: [10] - JSTC
				# CHECK-NEXT: [11] - JVALU0
				# CHECK-NEXT: [12] - JVALU1
				# CHECK-NEXT: [13] - JVIMUL

				# CHECK: Resource pressure per iteration:
				# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
				# CHECK-NEXT: - - - 2.00 2.00 4.00 4.00 4.00 - 4.00 4.00 - - -

				# CHECK: Resource pressure by instruction:
				# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
				# CHECK-NEXT: - - - - 1.00 1.00 - 1.00 - - - - - - vmovaps (%rsi), %xmm0
				# CHECK-NEXT: - - - - - - 1.00 - - 1.00 1.00 - - - vmovaps %xmm0, (%rdi)
				# CHECK-NEXT: - - - 1.00 - 1.00 - 1.00 - - - - - - vmovaps 16(%rsi), %xmm0
				# CHECK-NEXT: - - - - - - 1.00 - - 1.00 1.00 - - - vmovaps %xmm0, 16(%rdi)
				# CHECK-NEXT: - - - - 1.00 1.00 - 1.00 - - - - - - vmovaps 32(%rsi), %xmm0
				# CHECK-NEXT: - - - - - - 1.00 - - 1.00 1.00 - - - vmovaps %xmm0, 32(%rdi)
				# CHECK-NEXT: - - - 1.00 - 1.00 - 1.00 - - - - - - vmovaps 48(%rsi), %xmm0
				# CHECK-NEXT: - - - - - - 1.00 - - 1.00 1.00 - - - vmovaps %xmm0, 48(%rdi)

				# CHECK: Timeline view:
				# CHECK-NEXT: 0123456789
				# CHECK-NEXT: Index 0123456789 0123456

				# CHECK: [0,0] DeeeeeER . . . .. vmovaps (%rsi), %xmm0
				# CHECK-NEXT: [0,1] D=====eER . . . .. vmovaps %xmm0, (%rdi)
				# CHECK-NEXT: [0,2] .D=====eeeeeER . . .. vmovaps 16(%rsi), %xmm0
				# CHECK-NEXT: [0,3] .D==========eER. . .. vmovaps %xmm0, 16(%rdi)
				# CHECK-NEXT: [0,4] . D==========eeeeeER. .. vmovaps 32(%rsi), %xmm0
				# CHECK-NEXT: [0,5] . D===============eER .. vmovaps %xmm0, 32(%rdi)
				# CHECK-NEXT: [0,6] . D===============eeeeeER. vmovaps 48(%rsi), %xmm0
				# CHECK-NEXT: [0,7] . D====================eER vmovaps %xmm0, 48(%rdi)

				# CHECK: Average Wait times (based on the timeline view):
				# CHECK-NEXT: [0]: Executions
				# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
				# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
				# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

				# CHECK: [0] [1] [2] [3]
				# CHECK-NEXT: 0. 1 1.0 1.0 0.0 vmovaps (%rsi), %xmm0
				# CHECK-NEXT: 1. 1 6.0 0.0 0.0 vmovaps %xmm0, (%rdi)
				# CHECK-NEXT: 2. 1 6.0 0.0 0.0 vmovaps 16(%rsi), %xmm0
				# CHECK-NEXT: 3. 1 11.0 0.0 0.0 vmovaps %xmm0, 16(%rdi)
				# CHECK-NEXT: 4. 1 11.0 0.0 0.0 vmovaps 32(%rsi), %xmm0
				# CHECK-NEXT: 5. 1 16.0 0.0 0.0 vmovaps %xmm0, 32(%rdi)
				# CHECK-NEXT: 6. 1 16.0 0.0 0.0 vmovaps 48(%rsi), %xmm0
				# CHECK-NEXT: 7. 1 21.0 0.0 0.0 vmovaps %xmm0, 48(%rdi)

llvm/trunk/tools/llvm-mca/Views/SummaryView.h

	// [39 lines not shown]
	class SummaryView : public View {			class SummaryView : public View {
	const llvm::MCSchedModel &SM;			const llvm::MCSchedModel &SM;
	llvm::ArrayRef<llvm::MCInst> Source;			llvm::ArrayRef<llvm::MCInst> Source;
	const unsigned DispatchWidth;			const unsigned DispatchWidth;
	unsigned LastInstructionIdx;			unsigned LastInstructionIdx;
	unsigned TotalCycles;			unsigned TotalCycles;
	// The total number of micro opcodes contributed by a block of instructions.			// The total number of micro opcodes contributed by a block of instructions.
	unsigned NumMicroOps;			unsigned NumMicroOps;

				struct BackPressureInfo {
				// Cycles where backpressure increased.
				unsigned PressureIncreaseCycles;
				// Cycles where backpressure increased because of pipeline pressure.
				unsigned ResourcePressureCycles;
				// Cycles where backpressure increased because of data dependencies.
				unsigned DataDependencyCycles;
				// Cycles where backpressure increased because of register dependencies.
				unsigned RegisterDependencyCycles;
				// Cycles where backpressure increased because of memory dependencies.
				unsigned MemoryDependencyCycles;
				};
				BackPressureInfo BPI;

				// Resource pressure distribution. There is an element for every processor
				// resource declared by the scheduling model. Quantities are number of cycles.
				llvm::SmallVector<unsigned, 8> ResourcePressureDistribution;

	// For each processor resource, this vector stores the cumulative number of			// For each processor resource, this vector stores the cumulative number of
	// resource cycles consumed by the analyzed code block.			// resource cycles consumed by the analyzed code block.
	llvm::SmallVector<unsigned, 8> ProcResourceUsage;			llvm::SmallVector<unsigned, 8> ProcResourceUsage;

	// Each processor resource is associated with a so-called processor resource			// Each processor resource is associated with a so-called processor resource
	// mask. This vector allows to correlate processor resource IDs with processor			// mask. This vector allows to correlate processor resource IDs with processor
	// resource masks. There is exactly one element per each processor resource			// resource masks. There is exactly one element per each processor resource
	// declared by the scheduling model.			// declared by the scheduling model.
	llvm::SmallVector<uint64_t, 8> ProcResourceMasks;			llvm::SmallVector<uint64_t, 8> ProcResourceMasks;

	// Used to map resource indices to actual processor resource IDs.			// Used to map resource indices to actual processor resource IDs.
	llvm::SmallVector<unsigned, 8> ResIdx2ProcResID;			llvm::SmallVector<unsigned, 8> ResIdx2ProcResID;

				// True if resource pressure events were notified during this cycle.
				bool PressureIncreasedBecauseOfResources;
				bool PressureIncreasedBecauseOfDataDependencies;

				// True if throughput was affected by dispatch stalls.
				bool SeenStallCycles;

	// Compute the reciprocal throughput for the analyzed code block.			// Compute the reciprocal throughput for the analyzed code block.
	// The reciprocal block throughput is computed as the MAX between:			// The reciprocal block throughput is computed as the MAX between:
	// - NumMicroOps / DispatchWidth			// - NumMicroOps / DispatchWidth
	// - Total Resource Cycles / #Units (for every resource consumed).			// - Total Resource Cycles / #Units (for every resource consumed).
	double getBlockRThroughput() const;			double getBlockRThroughput() const;

				// Prints a bottleneck message to OS.
				void printBottleneckHints(llvm::raw_ostream &OS) const;

	public:			public:
	SummaryView(const llvm::MCSchedModel &Model, llvm::ArrayRef<llvm::MCInst> S,			SummaryView(const llvm::MCSchedModel &Model, llvm::ArrayRef<llvm::MCInst> S,
	unsigned Width);			unsigned Width);

	void onCycleEnd() override { ++TotalCycles; }			void onCycleEnd() override {
				++TotalCycles;
				if (PressureIncreasedBecauseOfResources ||
				PressureIncreasedBecauseOfDataDependencies) {
				++BPI.PressureIncreaseCycles;
				if (PressureIncreasedBecauseOfDataDependencies)
				++BPI.DataDependencyCycles;
				PressureIncreasedBecauseOfResources = false;
				PressureIncreasedBecauseOfDataDependencies = false;
				}
				}
	void onEvent(const HWInstructionEvent &Event) override;			void onEvent(const HWInstructionEvent &Event) override;
				void onEvent(const HWStallEvent &Event) override {
				SeenStallCycles = true;
				}

				void onEvent(const HWPressureEvent &Event) override;

	void printView(llvm::raw_ostream &OS) const override;			void printView(llvm::raw_ostream &OS) const override;
	};			};
	} // namespace mca			} // namespace mca
	} // namespace llvm			} // namespace llvm

	#endif			#endif
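
The per-cycle accounting in SummaryView can be sketched on its own. Pressure events latch a flag and the cycle is counted once in onCycleEnd(), so a cycle that reports both register and memory dependencies adds one to each sub-counter but only one to the data-dependency total. PressureCounter is a hypothetical stand-in for illustration, and the sketch assumes that all pressure events for a cycle arrive before the cycle-end notification.

```cpp
#include <cassert>

// Hypothetical stand-in for the counters kept by SummaryView.
struct PressureCounter {
  unsigned TotalCycles = 0;
  unsigned PressureIncreaseCycles = 0;
  unsigned DataDependencyCycles = 0;
  unsigned RegisterDependencyCycles = 0;
  unsigned MemoryDependencyCycles = 0;
  bool SawResources = false; // Latched by a resource pressure event.
  bool SawDataDeps = false;  // Latched by a register or memory event.

  void onResources() { SawResources = true; }
  void onRegisterDeps() {
    SawDataDeps = true;
    ++RegisterDependencyCycles;
  }
  void onMemoryDeps() {
    SawDataDeps = true;
    ++MemoryDependencyCycles;
  }

  // Counts the cycle once, however many events were latched, then resets.
  void onCycleEnd() {
    ++TotalCycles;
    if (SawResources || SawDataDeps) {
      ++PressureIncreaseCycles;
      if (SawDataDeps)
        ++DataDependencyCycles;
      SawResources = false;
      SawDataDeps = false;
    }
    assert(DataDependencyCycles <= PressureIncreaseCycles);
  }
};
```

One consequence of this design is that the "Data Dependencies" percentage need not equal the sum of the register and memory lines printed beneath it.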

llvm/trunk/tools/llvm-mca/Views/SummaryView.cpp

// [19 lines not shown]
namespace llvm {		namespace llvm {
namespace mca {		namespace mca {

#define DEBUG_TYPE "llvm-mca"		#define DEBUG_TYPE "llvm-mca"

SummaryView::SummaryView(const MCSchedModel &Model, ArrayRef<MCInst> S,		SummaryView::SummaryView(const MCSchedModel &Model, ArrayRef<MCInst> S,
unsigned Width)		unsigned Width)
: SM(Model), Source(S), DispatchWidth(Width), LastInstructionIdx(0),		: SM(Model), Source(S), DispatchWidth(Width), LastInstructionIdx(0),
TotalCycles(0), NumMicroOps(0),		TotalCycles(0), NumMicroOps(0), BPI({0, 0, 0, 0}),
		ResourcePressureDistribution(Model.getNumProcResourceKinds(), 0),
ProcResourceUsage(Model.getNumProcResourceKinds(), 0),		ProcResourceUsage(Model.getNumProcResourceKinds(), 0),
ProcResourceMasks(Model.getNumProcResourceKinds()),		ProcResourceMasks(Model.getNumProcResourceKinds()),
ResIdx2ProcResID(Model.getNumProcResourceKinds(), 0) {		ResIdx2ProcResID(Model.getNumProcResourceKinds(), 0),
		PressureIncreasedBecauseOfResources(false),
		PressureIncreasedBecauseOfDataDependencies(false),
		SeenStallCycles(false) {
computeProcResourceMasks(SM, ProcResourceMasks);		computeProcResourceMasks(SM, ProcResourceMasks);
for (unsigned I = 1, E = SM.getNumProcResourceKinds(); I < E; ++I) {		for (unsigned I = 1, E = SM.getNumProcResourceKinds(); I < E; ++I) {
unsigned Index = getResourceStateIndex(ProcResourceMasks[I]);		unsigned Index = getResourceStateIndex(ProcResourceMasks[I]);
ResIdx2ProcResID[Index] = I;		ResIdx2ProcResID[Index] = I;
}		}
}		}

void SummaryView::onEvent(const HWInstructionEvent &Event) {		void SummaryView::onEvent(const HWInstructionEvent &Event) {
// [16 lines not shown]
for (const std::pair<uint64_t, const ResourceUsage> &RU : Desc.Resources) {		for (const std::pair<uint64_t, const ResourceUsage> &RU : Desc.Resources) {
if (RU.second.size()) {		if (RU.second.size()) {
unsigned ProcResID = ResIdx2ProcResID[getResourceStateIndex(RU.first)];		unsigned ProcResID = ResIdx2ProcResID[getResourceStateIndex(RU.first)];
ProcResourceUsage[ProcResID] += RU.second.size();		ProcResourceUsage[ProcResID] += RU.second.size();
}		}
}		}
}		}

		void SummaryView::onEvent(const HWPressureEvent &Event) {
		assert(Event.Reason != HWPressureEvent::INVALID &&
		"Unexpected invalid event!");

		switch (Event.Reason) {
		default:
		break;

		case HWPressureEvent::RESOURCES: {
		PressureIncreasedBecauseOfResources = true;
		++BPI.ResourcePressureCycles;
		uint64_t ResourceMask = Event.ResourceMask;
		while (ResourceMask) {
		uint64_t Current = ResourceMask & (-ResourceMask);
		unsigned Index = getResourceStateIndex(Current);
		unsigned ProcResID = ResIdx2ProcResID[Index];
		const MCProcResourceDesc &PRDesc = *SM.getProcResource(ProcResID);
		if (!PRDesc.SubUnitsIdxBegin) {
		ResourcePressureDistribution[Index]++;
		ResourceMask ^= Current;
		continue;
		}

		for (unsigned I = 0, E = PRDesc.NumUnits; I < E; ++I) {
		unsigned OtherProcResID = PRDesc.SubUnitsIdxBegin[I];
		unsigned OtherMask = ProcResourceMasks[OtherProcResID];
		ResourcePressureDistribution[getResourceStateIndex(OtherMask)]++;
		}

		ResourceMask ^= Current;
		}
		}

		break;
		case HWPressureEvent::REGISTER_DEPS:
		PressureIncreasedBecauseOfDataDependencies = true;
		++BPI.RegisterDependencyCycles;
		break;
		case HWPressureEvent::MEMORY_DEPS:
		PressureIncreasedBecauseOfDataDependencies = true;
		++BPI.MemoryDependencyCycles;
		break;
		}
		}

		void SummaryView::printBottleneckHints(raw_ostream &OS) const {
		if (!SeenStallCycles || !BPI.PressureIncreaseCycles)
		return;

		double PressurePerCycle =
		(double)BPI.PressureIncreaseCycles * 100 / TotalCycles;
		double ResourcePressurePerCycle =
		(double)BPI.ResourcePressureCycles * 100 / TotalCycles;
		double DDPerCycle = (double)BPI.DataDependencyCycles * 100 / TotalCycles;
		double RegDepPressurePerCycle =
		(double)BPI.RegisterDependencyCycles * 100 / TotalCycles;
		double MemDepPressurePerCycle =
		(double)BPI.MemoryDependencyCycles * 100 / TotalCycles;

		OS << "\nCycles with backend pressure increase [ "
		<< format("%.2f", floor((PressurePerCycle * 100) + 0.5) / 100) << "% ]";

		OS << "\nThroughput Bottlenecks: "
		<< "\n Resource Pressure [ "
		<< format("%.2f", floor((ResourcePressurePerCycle * 100) + 0.5) / 100)
		<< "% ]";

		if (BPI.PressureIncreaseCycles) {
		for (unsigned I = 0, E = ResourcePressureDistribution.size(); I < E; ++I) {
		if (ResourcePressureDistribution[I]) {
		double Frequency =
		(double)ResourcePressureDistribution[I] * 100 / TotalCycles;
		unsigned Index = ResIdx2ProcResID[getResourceStateIndex(1ULL << I)];
		const MCProcResourceDesc &PRDesc = *SM.getProcResource(Index);
		OS << "\n - " << PRDesc.Name << " [ "
		<< format("%.2f", floor((Frequency * 100) + 0.5) / 100) << "% ]";
		}
		}
		}

		OS << "\n Data Dependencies: [ "
		<< format("%.2f", floor((DDPerCycle * 100) + 0.5) / 100) << "% ]";

		OS << "\n - Register Dependencies [ "
		<< format("%.2f", floor((RegDepPressurePerCycle * 100) + 0.5) / 100)
		<< "% ]";

		OS << "\n - Memory Dependencies [ "
		<< format("%.2f", floor((MemDepPressurePerCycle * 100) + 0.5) / 100)
		<< "% ]\n\n";
		}

void SummaryView::printView(raw_ostream &OS) const {		void SummaryView::printView(raw_ostream &OS) const {
unsigned Instructions = Source.size();		unsigned Instructions = Source.size();
unsigned Iterations = (LastInstructionIdx / Instructions) + 1;		unsigned Iterations = (LastInstructionIdx / Instructions) + 1;
unsigned TotalInstructions = Instructions * Iterations;		unsigned TotalInstructions = Instructions * Iterations;
unsigned TotalUOps = NumMicroOps * Iterations;		unsigned TotalUOps = NumMicroOps * Iterations;
double IPC = (double)TotalInstructions / TotalCycles;		double IPC = (double)TotalInstructions / TotalCycles;
double UOpsPerCycle = (double)TotalUOps / TotalCycles;		double UOpsPerCycle = (double)TotalUOps / TotalCycles;
double BlockRThroughput = computeBlockRThroughput(		double BlockRThroughput = computeBlockRThroughput(
SM, DispatchWidth, NumMicroOps, ProcResourceUsage);		SM, DispatchWidth, NumMicroOps, ProcResourceUsage);

std::string Buffer;		std::string Buffer;
raw_string_ostream TempStream(Buffer);		raw_string_ostream TempStream(Buffer);
TempStream << "Iterations: " << Iterations;		TempStream << "Iterations: " << Iterations;
TempStream << "\nInstructions: " << TotalInstructions;		TempStream << "\nInstructions: " << TotalInstructions;
TempStream << "\nTotal Cycles: " << TotalCycles;		TempStream << "\nTotal Cycles: " << TotalCycles;
TempStream << "\nTotal uOps: " << TotalUOps << '\n';		TempStream << "\nTotal uOps: " << TotalUOps << '\n';
TempStream << "\nDispatch Width: " << DispatchWidth;		TempStream << "\nDispatch Width: " << DispatchWidth;
TempStream << "\nuOps Per Cycle: "		TempStream << "\nuOps Per Cycle: "
<< format("%.2f", floor((UOpsPerCycle * 100) + 0.5) / 100);		<< format("%.2f", floor((UOpsPerCycle * 100) + 0.5) / 100);
TempStream << "\nIPC: "		TempStream << "\nIPC: "
<< format("%.2f", floor((IPC * 100) + 0.5) / 100);		<< format("%.2f", floor((IPC * 100) + 0.5) / 100);
TempStream << "\nBlock RThroughput: "		TempStream << "\nBlock RThroughput: "
<< format("%.1f", floor((BlockRThroughput * 10) + 0.5) / 10)		<< format("%.1f", floor((BlockRThroughput * 10) + 0.5) / 10)
<< '\n';		<< '\n';

		printBottleneckHints(TempStream);
TempStream.flush();		TempStream.flush();
OS << Buffer;		OS << Buffer;
}		}
} // namespace mca.		} // namespace mca.
} // namespace llvm		} // namespace llvm
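
The RESOURCES case in SummaryView::onEvent walks the resource mask one bit at a time with ResourceMask & (-ResourceMask), which isolates the lowest set bit. A standalone sketch of that idiom follows. setBitIndices is a hypothetical helper, and the index it reports is a plain bit position, which is an assumption and may not match what getResourceStateIndex returns.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the position of every set bit in Mask,
// lowest bit first.
std::vector<unsigned> setBitIndices(uint64_t Mask) {
  std::vector<unsigned> Indices;
  while (Mask) {
    // Two's complement negation keeps only the lowest set bit.
    uint64_t Lowest = Mask & (~Mask + 1);
    assert((Lowest & (Lowest - 1)) == 0 && "expected exactly one bit");
    // Find the position of that bit.
    unsigned Index = 0;
    while ((Lowest >> Index) != 1)
      ++Index;
    Indices.push_back(Index);
    // Clear the bit so the next iteration sees the next resource.
    Mask ^= Lowest;
  }
  return Indices;
}
```

For example, a mask of 0x28 has bits 3 and 5 set, so the loop runs twice and visits them in that order.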

llvm/trunk/tools/llvm-mca/llvm-mca.cpp

// [168 lines not shown]
static cl::opt<bool> EnableAllStats("all-stats",		static cl::opt<bool> EnableAllStats("all-stats",
cl::desc("Print all hardware statistics"),		cl::desc("Print all hardware statistics"),
cl::cat(ViewOptions), cl::init(false));		cl::cat(ViewOptions), cl::init(false));

static cl::opt<bool>		static cl::opt<bool>
EnableAllViews("all-views",		EnableAllViews("all-views",
cl::desc("Print all views including hardware statistics"),		cl::desc("Print all views including hardware statistics"),
cl::cat(ViewOptions), cl::init(false));		cl::cat(ViewOptions), cl::init(false));

		static cl::opt<bool> EnableBottleneckAnalysis(
		"bottleneck-analysis",
		cl::desc("Enable bottleneck analysis (disabled by default)"),
		cl::cat(ViewOptions), cl::init(false));

namespace {		namespace {

const Target *getTarget(const char *ProgName) {		const Target *getTarget(const char *ProgName) {
if (TripleName.empty())		if (TripleName.empty())
TripleName = Triple::normalize(sys::getDefaultTargetTriple());		TripleName = Triple::normalize(sys::getDefaultTargetTriple());
Triple TheTriple(TripleName);		Triple TheTriple(TripleName);

// Get the target specific parser.		// Get the target specific parser.
// [196 lines not shown; the code below is inside int main(int argc, char **argv)]

// Create an instruction builder.		// Create an instruction builder.
mca::InstrBuilder IB(*STI, *MCII, *MRI, MCIA.get());		mca::InstrBuilder IB(*STI, *MCII, *MRI, MCIA.get());

// Create a context to control ownership of the pipeline hardware.		// Create a context to control ownership of the pipeline hardware.
mca::Context MCA(*MRI, *STI);		mca::Context MCA(*MRI, *STI);

mca::PipelineOptions PO(Width, RegisterFileSize, LoadQueueSize,		mca::PipelineOptions PO(Width, RegisterFileSize, LoadQueueSize,
StoreQueueSize, AssumeNoAlias);		StoreQueueSize, AssumeNoAlias,
		EnableBottleneckAnalysis);

// Number each region in the sequence.		// Number each region in the sequence.
unsigned RegionIdx = 0;		unsigned RegionIdx = 0;

for (const std::unique_ptr<mca::CodeRegion> &Region : Regions) {		for (const std::unique_ptr<mca::CodeRegion> &Region : Regions) {
// Skip empty code regions.		// Skip empty code regions.
if (Region->empty())		if (Region->empty())
continue;		continue;
// [109 lines not shown]