This is an archive of the discontinued LLVM Phabricator instance.

Differential D65948

[MCA] Add flag -show-encoding to llvm-mca.
ClosedPublic

Authored by andreadb on Aug 8 2019, 6:28 AM.

Details

Reviewers

RKSimon
lebedev.ri
courbet
gchatelet
ondrasej

Commits

rGcbec9af6bfb0: [MCA] Add flag -show-encoding to llvm-mca.
rL368432: [MCA] Add flag -show-encoding to llvm-mca.

Summary

Flag -show-encoding enables the printing of instruction encodings as part of the instruction info view.

Example (with flags -mtriple=x86_64-- -mcpu=btver2):

Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)
[7]: Encoding Size

[1]    [2]    [3]    [4]    [5]    [6]    [7]    Encodings:                    Instructions:
 1      2     1.00                         4     c5 f0 59 d0                   vmulps   %xmm0, %xmm1, %xmm2
 1      4     1.00                         4     c5 eb 7c da                   vhaddps  %xmm2, %xmm2, %xmm3
 1      4     1.00                         4     c5 e3 7c e3                   vhaddps  %xmm3, %xmm3, %xmm4

In this example, column Encoding Size is the size in bytes of the instruction encoding.
Column Encodings reports the actual instruction encodings as byte sequences in hex (objdump style).

The computation of encodings is done by a utility class named mca::CodeEmitter.
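
To illustrate the objdump-style rendering described above, here is a minimal sketch in plain C++. It is not the patch's code: `formatEncoding` is a hypothetical helper written for this illustration, whereas the patch itself prints the bytes through LLVM's stream classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not part of the patch): renders raw encoding bytes as
// space-separated, two-digit lowercase hex values.
std::string formatEncoding(const std::vector<uint8_t> &Bytes) {
  std::string Out;
  char Buf[4];
  for (std::size_t I = 0; I < Bytes.size(); ++I) {
    // Two hex digits per byte, zero-padded.
    std::snprintf(Buf, sizeof(Buf), "%02x", static_cast<unsigned>(Bytes[I]));
    if (I != 0)
      Out += ' ';
    Out += Buf;
  }
  return Out;
}
```

In this sketch the "Encoding Size" column would simply be `Bytes.size()`, and the "Encodings" column would be the returned string.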

In future, I plan to expose the CodeEmitter to the instruction builder, so that information about instruction encoding sizes can be used by the simulator. That would be a first step towards simulating the throughput from the decoders in the hardware frontend.

Event Timeline

andreadb created this revision. Aug 8 2019, 6:28 AM

Herald added subscribers: gbedwell, tschuett, mgorny. Aug 8 2019, 6:28 AM

Nice!
Please wait for other reviewers to have a look.

lib/MCA/CodeEmitter.cpp
21	I'm not a huge fan of using the address of `MCInst` as the key in the cache. For instance, the following code would be buggy:

	CodeEmitter CE;
	MCInst Inst = GetInst();
	CE.getOrCreateEncodingInfo(Inst);
	Inst = GetAnotherInst();
	CE.getOrCreateEncodingInfo(Inst);

	If you still want to use the address for performance reasons, you can pass by pointer instead of by reference. At least this will be a red flag.
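
The hazard raised in this comment can be reproduced outside LLVM. The sketch below is only an illustration under stated assumptions: `Inst` is a stand-in struct (not `llvm::MCInst`), and `AddressKeyedCache` is a hypothetical class that mimics a cache keyed by object address. It is not the code from the first revision of the patch.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for an instruction. This is NOT llvm::MCInst.
struct Inst {
  std::string Text;
};

// Hypothetical cache keyed by the address of the instruction object.
class AddressKeyedCache {
  std::unordered_map<const Inst *, std::string> Cache;

public:
  const std::string &getOrCreate(const Inst &I) {
    auto It = Cache.find(&I);
    if (It != Cache.end())
      return It->second; // Hit: the object's current contents are ignored.
    return Cache.emplace(&I, "enc(" + I.Text + ")").first->second;
  }
};

// Reuses one object for two different instructions, as in the comment above.
std::string demoStaleLookup() {
  AddressKeyedCache C;
  Inst I{"add"};
  C.getOrCreate(I);
  I = Inst{"mul"}; // Same address, different contents.
  return C.getOrCreate(I); // Returns the entry cached for "add".
}
```

The second lookup returns the stale entry because the key (the address) did not change even though the instruction did.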

This revision is now accepted and ready to land. Aug 8 2019, 6:44 AM

Docs missing.

In D65948#1621151, @lebedev.ri wrote:

Docs missing.

I will add documentation for it.

lib/MCA/CodeEmitter.cpp
21	Yes, it would be buggy for that case. However, that case is not going to happen in MCA because the sequence of MCInst is constructed once at the beginning, and then it is never changed. It is passed around as an ArrayRef and no module is meant to change any MCInsts or add extra elements to the sequence. That sequence is never invalidated nor changed during the entire execution. So yes, I totally agree with you that using a pointer is ugly and scary. But in practice - for our particular scenario - it works in a fast and easy way. That being said, I could use an instruction index. However, that would complicate future patches to the InstrBuilder (which doesn't know anything about the instruction index). I'll see if I can use the instruction index and avoid any controversial change :).

+1 for the update to the llvm-mca.rst

tools/llvm-mca/llvm-mca.cpp
290	Add this back.

Address review comments.

Added documentation for the new flag.

Changed how the CodeEmitter works.
Class CodeEmitter now uses instruction identifiers to reference MCInsts of a sequence.
It no longer uses a map. It uses a vector instead, and the instruction index is used to address that vector.
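
The index-based scheme described in this update can be sketched in standalone C++ as follows. This is a simplified illustration, not the patch's implementation: `IndexedEncodingCache`, `fakeEncode` and `numEncoded` are hypothetical names, instructions are modeled as plain strings, and the "encoding" is just a copy of the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of an index-addressed encoding cache.
class IndexedEncodingCache {
  std::vector<std::string> Sequence; // Fixed instruction sequence (stand-in).
  std::string Code;                  // All encodings, stored back to back.
  // One <base, length> pair per instruction; length 0 means "not computed".
  std::vector<std::pair<std::size_t, std::size_t>> Encodings;
  unsigned NumEncoded = 0;

  // Placeholder encoder (assumption): returns the instruction text itself.
  static std::string fakeEncode(const std::string &I) { return I; }

public:
  explicit IndexedEncodingCache(std::vector<std::string> S)
      : Sequence(std::move(S)), Encodings(Sequence.size()) {}

  std::string getEncoding(unsigned Index) {
    std::pair<std::size_t, std::size_t> &EI = Encodings[Index];
    if (EI.second == 0) {
      // First request for this index: append to the buffer, record the slice.
      EI.first = Code.size();
      Code += fakeEncode(Sequence[Index]);
      EI.second = Code.size() - EI.first;
      ++NumEncoded;
    }
    return Code.substr(EI.first, EI.second);
  }

  unsigned numEncoded() const { return NumEncoded; }
};
```

Because the key is a position in a fixed sequence rather than an object address, a lookup cannot silently alias a different instruction, which is the concern raised earlier in the review.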

Is it already transitively enabled by -all-stats/-all-views, i think not?

In D65948#1621308, @lebedev.ri wrote:

Is it already transitively enabled by -all-stats/-all-views, i think not?

No it is not.

In D65948#1621314, @andreadb wrote:

No it is not.

TBH I'd prefer that this is disabled by default - it bulks out the instruction view quite a bit

In D65948#1621321, @RKSimon wrote:

TBH I'd prefer that this is disabled by default - it bulks out the instruction view quite a bit

By default - i totally agree. But not by an explicit "please show me everything; no but really please do".

In D65948#1621331, @lebedev.ri wrote:

By default - i totally agree. But not by an explicit "please show me everything; no but really please do".

To me, "enable all views" doesn't mean "show me everything". It literally means that all the views should contribute to the final report.
Extra flags like -show-encoding could then be used to further customize the output.
This is at least how I see it.

In D65948#1621339, @andreadb wrote:

To me, "enable all views" doesn't mean "show me everything". It literally means that all the views should contribute to the final report.
Extra flags like -show-encoding could then be used to further customize the output.
This is at least how I see it.

No strong opinion in this case, could potentially be changed later.
LG

gchatelet accepted this revision. Aug 9 2019, 1:40 AM

gchatelet added inline comments.

lib/MCA/CodeEmitter.cpp
21	It looks much better now IMHO : ) Thx for taking the time to improve on it.

Thanks Guillaume.

I agree that it is better this way.

Closed by commit rL368432: [MCA] Add flag -show-encoding to llvm-mca. (authored by adibiagio). Aug 9 2019, 4:26 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Aug 9 2019, 4:26 AM

Revision Contents

Path                                          Size
docs/CommandGuide/llvm-mca.rst                37 lines
include/llvm/MCA/CodeEmitter.h                72 lines
lib/MCA/CMakeLists.txt                        1 line
lib/MCA/CodeEmitter.cpp                       37 lines
test/tools/llvm-mca/X86/show-encoding.s       77 lines
tools/llvm-mca/Views/InstructionInfoView.h    13 lines
tools/llvm-mca/Views/InstructionInfoView.cpp  31 lines
tools/llvm-mca/llvm-mca.cpp                   23 lines

Diff 214171

docs/CommandGuide/llvm-mca.rst

[...]
 .. option:: -retire-stats

  Enable extra retire control unit statistics. This view is disabled by default.

 .. option:: -instruction-info

  Enable the instruction info view. This is enabled by default.

+.. option:: -show-encoding
+
+ Enable the printing of instruction encodings within the instruction info view.
+
 .. option:: -all-stats

  Print all hardware statistics. This enables extra statistics related to the
  dispatch logic, the hardware schedulers, the register file(s), and the retire
  control unit. This option is disabled by default.

 .. option:: -all-views

[...]
 In this example, ``uOps per iteration/Block RThroughput`` is 1.50. Since there
 are no loop-carried dependencies, the observed `uOps Per Cycle` is expected to
 approach 1.50 when the number of iterations tends to infinity. The delta between
 the Dispatch Width (2.00), and the theoretical maximum uOp throughput (1.50) is
 an indicator of a performance bottleneck caused by the lack of hardware
 resources, and the Resource pressure view can help to identify the problematic
 resource usage.

-The second section of the report shows the latency and reciprocal
-throughput of every instruction in the sequence. That section also reports
-extra information related to the number of micro opcodes, and opcode properties
-(i.e., 'MayLoad', 'MayStore', and 'HasSideEffects').
+The second section of the report is the `instruction info view`. It shows the
+latency and reciprocal throughput of every instruction in the sequence. It also
+reports extra information related to the number of micro opcodes, and opcode
+properties (i.e., 'MayLoad', 'MayStore', and 'HasSideEffects').

 Field RThroughput is the reciprocal of the instruction throughput. Throughput
 is computed as the maximum number of instructions of a same type that can be
 executed per clock cycle in the absence of operand dependencies. In this
 example, the reciprocal throughput of a vector float multiply is 1
 cycles/instruction. That is because the FP multiplier JFPM is only available
 from pipeline JFPU1.

+Instruction encodings are displayed within the instruction info view when flag
+`-show-encoding` is specified.
+
+Below is an example of `-show-encoding` output for the dot-product kernel:
+
+.. code-block:: none
+
+  Instruction Info:
+  [1]: #uOps
+  [2]: Latency
+  [3]: RThroughput
+  [4]: MayLoad
+  [5]: MayStore
+  [6]: HasSideEffects (U)
+  [7]: Encoding Size
+
+  [1]    [2]    [3]    [4]    [5]    [6]    [7]    Encodings:                    Instructions:
+   1      2     1.00                         4     c5 f0 59 d0                   vmulps   %xmm0, %xmm1, %xmm2
+   1      4     1.00                         4     c5 eb 7c da                   vhaddps  %xmm2, %xmm2, %xmm3
+   1      4     1.00                         4     c5 e3 7c e3                   vhaddps  %xmm3, %xmm3, %xmm4
+
+The `Encoding Size` column shows the size in bytes of instructions. The
+`Encodings` column shows the actual instruction encodings (byte sequences in
+hex).
+
 The third section is the Resource pressure view. This view reports
 the average number of resource cycles consumed every iteration by instructions
 for every processor resource unit available on the target. Information is
 structured in two tables. The first table reports the number of resource cycles
 spent on average every iteration. The second table correlates the resource
 cycles to the machine instruction in the sequence. For example, every iteration
 of the instruction vmulps always executes on resource unit [6]
 (JFPU1 - floating point pipeline #1), consuming an average of 1 resource cycle
[...]

include/llvm/MCA/CodeEmitter.h

//===--------------------- CodeEmitter.h ------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
/// \file
///
/// A utility class used to compute instruction encodings. It buffers encodings
/// for later usage. It exposes a simple API to compute and get the encodings as
/// StringRef.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_MCA_CODEEMITTER_H
#define LLVM_MCA_CODEEMITTER_H

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/MC/MCAsmBackend.h"
#include "llvm/MC/MCCodeEmitter.h"
#include "llvm/MC/MCFixup.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MCA/Instruction.h"
#include "llvm/MCA/Support.h"
#include "llvm/Support/raw_ostream.h"

#include <string>

namespace llvm {
namespace mca {

/// A utility class used to compute instruction encodings for a code region.
///
/// It provides a simple API to compute and return instruction encodings as
/// strings. Encodings are cached internally for later usage.
class CodeEmitter {
  const MCSubtargetInfo &STI;
  const MCAsmBackend &MAB;
  const MCCodeEmitter &MCE;

  SmallString<256> Code;
  raw_svector_ostream VecOS;
  ArrayRef<MCInst> Sequence;

  // An EncodingInfo pair stores <base, length> information. Base (i.e. first)
  // is an index to the `Code`. Length (i.e. second) is the encoding size.
  using EncodingInfo = std::pair<unsigned, unsigned>;

  // A cache of encodings.
  SmallVector<EncodingInfo, 16> Encodings;

  EncodingInfo getOrCreateEncodingInfo(unsigned MCID);

public:
  CodeEmitter(const MCSubtargetInfo &ST, const MCAsmBackend &AB,
              const MCCodeEmitter &CE, ArrayRef<MCInst> S)
      : STI(ST), MAB(AB), MCE(CE), VecOS(Code), Sequence(S),
        Encodings(S.size()) {}

  StringRef getEncoding(unsigned MCID) {
    EncodingInfo EI = getOrCreateEncodingInfo(MCID);
    return StringRef(&Code[EI.first], EI.second);
  }
};

} // namespace mca
} // namespace llvm

#endif // LLVM_MCA_CODEEMITTER_H

lib/MCA/CMakeLists.txt

 add_llvm_library(LLVMMCA
+  CodeEmitter.cpp
   Context.cpp
   HWEventListener.cpp
   HardwareUnits/HardwareUnit.cpp
   HardwareUnits/LSUnit.cpp
   HardwareUnits/RegisterFile.cpp
   HardwareUnits/ResourceManager.cpp
   HardwareUnits/RetireControlUnit.cpp
   HardwareUnits/Scheduler.cpp
[...]

lib/MCA/CodeEmitter.cpp

//===--------------------- CodeEmitter.cpp ----------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements the CodeEmitter API.
//
//===----------------------------------------------------------------------===//

#include "llvm/MCA/CodeEmitter.h"

namespace llvm {
namespace mca {

CodeEmitter::EncodingInfo
CodeEmitter::getOrCreateEncodingInfo(unsigned MCID) {
  EncodingInfo &EI = Encodings[MCID];
  if (EI.second)
    return EI;

  SmallVector<llvm::MCFixup, 2> Fixups;
  const MCInst &Inst = Sequence[MCID];
  MCInst Relaxed(Sequence[MCID]);
  if (MAB.mayNeedRelaxation(Inst, STI))
    MAB.relaxInstruction(Inst, STI, Relaxed);

  EI.first = Code.size();
  MCE.encodeInstruction(Relaxed, VecOS, Fixups, STI);
  EI.second = Code.size() - EI.first;
  return EI;
}

} // namespace mca
} // namespace llvm

test/tools/llvm-mca/X86/show-encoding.s

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -all-views=false -instruction-info < %s | FileCheck %s --check-prefix=NORMAL
				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -all-views=false -instruction-info -show-encoding=false < %s | FileCheck %s --check-prefix=NORMAL
				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -all-views=false -instruction-info -show-encoding < %s | FileCheck %s --check-prefix=WITHENCODINGS

				movq 0x170(%rbp), %r10
				lea (%r8,%r8,2), %r9d
				movsx %r9d, %r9
				inc %r8d
				movq 0x178(%rbp), %r11
				vmovups (%r10,%r9,4), %xmm3
				vpslldq $0x4, %xmm3, %xmm2
				vpslldq $0x4, %xmm3, %xmm4
				vaddps %xmm2, %xmm3, %xmm6
				vpslldq $0xc, %xmm3, %xmm5
				vaddps %xmm4, %xmm5, %xmm7
				vaddps %xmm6, %xmm7, %xmm8
				vaddps %xmm8, %xmm0, %xmm9
				vshufps $0xff, %xmm9, %xmm9, %xmm0
				vmovups %xmm9, (%r11,%r9,4)
				cmp %r8d, %esi
				jl -90

				# NORMAL: Instruction Info:
				# NORMAL-NEXT: [1]: #uOps
				# NORMAL-NEXT: [2]: Latency
				# NORMAL-NEXT: [3]: RThroughput
				# NORMAL-NEXT: [4]: MayLoad
				# NORMAL-NEXT: [5]: MayStore
				# NORMAL-NEXT: [6]: HasSideEffects (U)

				# WITHENCODINGS: Instruction Info:
				# WITHENCODINGS-NEXT: [1]: #uOps
				# WITHENCODINGS-NEXT: [2]: Latency
				# WITHENCODINGS-NEXT: [3]: RThroughput
				# WITHENCODINGS-NEXT: [4]: MayLoad
				# WITHENCODINGS-NEXT: [5]: MayStore
				# WITHENCODINGS-NEXT: [6]: HasSideEffects (U)
				# WITHENCODINGS-NEXT: [7]: Encoding Size

				# NORMAL: [1] [2] [3] [4] [5] [6] Instructions:
				# NORMAL-NEXT: 1 3 1.00 * movq 368(%rbp), %r10
				# NORMAL-NEXT: 1 2 1.00 leal (%r8,%r8,2), %r9d
				# NORMAL-NEXT: 1 1 0.50 movslq %r9d, %r9
				# NORMAL-NEXT: 1 1 0.50 incl %r8d
				# NORMAL-NEXT: 1 3 1.00 * movq 376(%rbp), %r11
				# NORMAL-NEXT: 1 5 1.00 * vmovups (%r10,%r9,4), %xmm3
				# NORMAL-NEXT: 1 1 0.50 vpslldq $4, %xmm3, %xmm2
				# NORMAL-NEXT: 1 1 0.50 vpslldq $4, %xmm3, %xmm4
				# NORMAL-NEXT: 1 3 1.00 vaddps %xmm2, %xmm3, %xmm6
				# NORMAL-NEXT: 1 1 0.50 vpslldq $12, %xmm3, %xmm5
				# NORMAL-NEXT: 1 3 1.00 vaddps %xmm4, %xmm5, %xmm7
				# NORMAL-NEXT: 1 3 1.00 vaddps %xmm6, %xmm7, %xmm8
				# NORMAL-NEXT: 1 3 1.00 vaddps %xmm8, %xmm0, %xmm9
				# NORMAL-NEXT: 1 1 0.50 vshufps $255, %xmm9, %xmm9, %xmm0
				# NORMAL-NEXT: 1 1 1.00 * vmovups %xmm9, (%r11,%r9,4)
				# NORMAL-NEXT: 1 1 0.50 cmpl %r8d, %esi
				# NORMAL-NEXT: 1 1 0.50 jl -90

				# WITHENCODINGS: [1] [2] [3] [4] [5] [6] [7] Encodings: Instructions:
				# WITHENCODINGS-NEXT: 1 3 1.00 * 7 4c 8b 95 70 01 00 00 movq 368(%rbp), %r10
				# WITHENCODINGS-NEXT: 1 2 1.00 4 47 8d 0c 40 leal (%r8,%r8,2), %r9d
				# WITHENCODINGS-NEXT: 1 1 0.50 3 4d 63 c9 movslq %r9d, %r9
				# WITHENCODINGS-NEXT: 1 1 0.50 3 41 ff c0 incl %r8d
				# WITHENCODINGS-NEXT: 1 3 1.00 * 7 4c 8b 9d 78 01 00 00 movq 376(%rbp), %r11
				# WITHENCODINGS-NEXT: 1 5 1.00 * 6 c4 81 78 10 1c 8a vmovups (%r10,%r9,4), %xmm3
				# WITHENCODINGS-NEXT: 1 1 0.50 5 c5 e9 73 fb 04 vpslldq $4, %xmm3, %xmm2
				# WITHENCODINGS-NEXT: 1 1 0.50 5 c5 d9 73 fb 04 vpslldq $4, %xmm3, %xmm4
				# WITHENCODINGS-NEXT: 1 3 1.00 4 c5 e0 58 f2 vaddps %xmm2, %xmm3, %xmm6
				# WITHENCODINGS-NEXT: 1 1 0.50 5 c5 d1 73 fb 0c vpslldq $12, %xmm3, %xmm5
				# WITHENCODINGS-NEXT: 1 3 1.00 4 c5 d0 58 fc vaddps %xmm4, %xmm5, %xmm7
				# WITHENCODINGS-NEXT: 1 3 1.00 4 c5 40 58 c6 vaddps %xmm6, %xmm7, %xmm8
				# WITHENCODINGS-NEXT: 1 3 1.00 5 c4 41 78 58 c8 vaddps %xmm8, %xmm0, %xmm9
				# WITHENCODINGS-NEXT: 1 1 0.50 6 c4 c1 30 c6 c1 ff vshufps $255, %xmm9, %xmm9, %xmm0
				# WITHENCODINGS-NEXT: 1 1 1.00 * 6 c4 01 78 11 0c 8b vmovups %xmm9, (%r11,%r9,4)
				# WITHENCODINGS-NEXT: 1 1 0.50 3 44 39 c6 cmpl %r8d, %esi
				# WITHENCODINGS-NEXT: 1 1 0.50 6 0f 8c 00 00 00 00 jl -90

tools/llvm-mca/Views/InstructionInfoView.h

[...]
 #define LLVM_TOOLS_LLVM_MCA_INSTRUCTIONINFOVIEW_H

 #include "Views/View.h"
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/MC/MCInst.h"
 #include "llvm/MC/MCInstPrinter.h"
 #include "llvm/MC/MCInstrInfo.h"
 #include "llvm/MC/MCSubtargetInfo.h"
+#include "llvm/MCA/CodeEmitter.h"
 #include "llvm/Support/raw_ostream.h"

 #define DEBUG_TYPE "llvm-mca"

 namespace llvm {
 namespace mca {

 /// A view that prints out generic instruction information.
 class InstructionInfoView : public View {
   const llvm::MCSubtargetInfo &STI;
   const llvm::MCInstrInfo &MCII;
+  CodeEmitter &CE;
+  bool PrintEncodings;
   llvm::ArrayRef<llvm::MCInst> Source;
   llvm::MCInstPrinter &MCIP;

 public:
-  InstructionInfoView(const llvm::MCSubtargetInfo &sti,
-                      const llvm::MCInstrInfo &mcii,
-                      llvm::ArrayRef<llvm::MCInst> S, llvm::MCInstPrinter &IP)
-      : STI(sti), MCII(mcii), Source(S), MCIP(IP) {}
+  InstructionInfoView(const llvm::MCSubtargetInfo &ST,
+                      const llvm::MCInstrInfo &II, CodeEmitter &C,
+                      bool ShouldPrintEncodings, llvm::ArrayRef<llvm::MCInst> S,
+                      llvm::MCInstPrinter &IP)
+      : STI(ST), MCII(II), CE(C), PrintEncodings(ShouldPrintEncodings),
+        Source(S), MCIP(IP) {}

   void printView(llvm::raw_ostream &OS) const override;
 };
 } // namespace mca
 } // namespace llvm

 #endif

tools/llvm-mca/Views/InstructionInfoView.cpp

 //===--------------------- InstructionInfoView.cpp --------------*- C++ -*-===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 /// \file
 ///
 /// This file implements the InstructionInfoView API.
 ///
 //===----------------------------------------------------------------------===//

 #include "Views/InstructionInfoView.h"
+#include "llvm/Support/FormattedStream.h"

 namespace llvm {
 namespace mca {

 void InstructionInfoView::printView(raw_ostream &OS) const {
   std::string Buffer;
   raw_string_ostream TempStream(Buffer);
   const MCSchedModel &SM = STI.getSchedModel();

   std::string Instruction;
   raw_string_ostream InstrStream(Instruction);

   TempStream << "\n\nInstruction Info:\n";
   TempStream << "[1]: #uOps\n[2]: Latency\n[3]: RThroughput\n"
-             << "[4]: MayLoad\n[5]: MayStore\n[6]: HasSideEffects (U)\n\n";
+             << "[4]: MayLoad\n[5]: MayStore\n[6]: HasSideEffects (U)\n";
+  if (PrintEncodings) {
+    TempStream << "[7]: Encoding Size\n";
+    TempStream << "\n[1] [2] [3] [4] [5] [6] [7] "
+               << "Encodings: Instructions:\n";
+  } else {
+    TempStream << "\n[1] [2] [3] [4] [5] [6] Instructions:\n";
+  }

-  TempStream << "[1] [2] [3] [4] [5] [6] Instructions:\n";
-  for (const MCInst &Inst : Source) {
+  for (unsigned I = 0, E = Source.size(); I < E; ++I) {
+    const MCInst &Inst = Source[I];
     const MCInstrDesc &MCDesc = MCII.get(Inst.getOpcode());

     // Obtain the scheduling class information from the instruction.
     unsigned SchedClassID = MCDesc.getSchedClass();
     unsigned CPUID = SM.getProcessorID();

     // Try to solve variant scheduling classes.
     while (SchedClassID && SM.getSchedClassDesc(SchedClassID)->isVariant())
[...] if (RThroughput.hasValue()) {
         TempStream << " ";
       else if (RT < 100.0)
         TempStream << ' ';
     } else {
       TempStream << " - ";
     }
     TempStream << (MCDesc.mayLoad() ? " * " : " ");
     TempStream << (MCDesc.mayStore() ? " * " : " ");
     TempStream << (MCDesc.hasUnmodeledSideEffects() ? " U " : " ");

+    if (PrintEncodings) {
+      StringRef Encoding(CE.getEncoding(I));
+      unsigned EncodingSize = Encoding.size();
+      TempStream << " " << EncodingSize
+                 << (EncodingSize < 10 ? " " : " ");
+      TempStream.flush();
+      formatted_raw_ostream FOS(TempStream);
+      for (unsigned i = 0, e = Encoding.size(); i != e; ++i)
+        FOS << format("%02x ", (uint8_t)Encoding[i]);
+      FOS.PadToColumn(30);
+      FOS.flush();
+    }
+
     MCIP.printInst(&Inst, InstrStream, "", STI);
     InstrStream.flush();

     // Consume any tabs or spaces at the beginning of the string.
     StringRef Str(Instruction);
     Str = Str.ltrim();
-    TempStream << " " << Str << '\n';
+    TempStream << Str << '\n';
     Instruction = "";
   }

   TempStream.flush();
   OS << Buffer;
 }
 } // namespace mca.
 } // namespace llvm

tools/llvm-mca/llvm-mca.cpp

Show All 26 Lines
#include "Views/DispatchStatistics.h"		#include "Views/DispatchStatistics.h"
#include "Views/InstructionInfoView.h"		#include "Views/InstructionInfoView.h"
#include "Views/RegisterFileStatistics.h"		#include "Views/RegisterFileStatistics.h"
#include "Views/ResourcePressureView.h"		#include "Views/ResourcePressureView.h"
#include "Views/RetireControlUnitStatistics.h"		#include "Views/RetireControlUnitStatistics.h"
#include "Views/SchedulerStatistics.h"		#include "Views/SchedulerStatistics.h"
#include "Views/SummaryView.h"		#include "Views/SummaryView.h"
#include "Views/TimelineView.h"		#include "Views/TimelineView.h"
		#include "llvm/MC/MCAsmBackend.h"
#include "llvm/MC/MCAsmInfo.h"		#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCCodeEmitter.h"		#include "llvm/MC/MCCodeEmitter.h"
		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCObjectFileInfo.h"		#include "llvm/MC/MCObjectFileInfo.h"
#include "llvm/MC/MCRegisterInfo.h"		#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"		#include "llvm/MC/MCSubtargetInfo.h"
		#include "llvm/MC/MCTargetOptionsCommandFlags.inc"
		#include "llvm/MCA/CodeEmitter.h"
#include "llvm/MCA/Context.h"		#include "llvm/MCA/Context.h"
#include "llvm/MCA/InstrBuilder.h"		#include "llvm/MCA/InstrBuilder.h"
#include "llvm/MCA/Pipeline.h"		#include "llvm/MCA/Pipeline.h"
#include "llvm/MCA/Stages/EntryStage.h"		#include "llvm/MCA/Stages/EntryStage.h"
#include "llvm/MCA/Stages/InstructionTables.h"		#include "llvm/MCA/Stages/InstructionTables.h"
#include "llvm/MCA/Support.h"		#include "llvm/MCA/Support.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
▲ Show 20 Lines • Show All 146 Lines • ▼ Show 20 Lines	EnableAllViews("all-views",
cl::desc("Print all views including hardware statistics"),		cl::desc("Print all views including hardware statistics"),
cl::cat(ViewOptions), cl::init(false));		cl::cat(ViewOptions), cl::init(false));

static cl::opt<bool> EnableBottleneckAnalysis(		static cl::opt<bool> EnableBottleneckAnalysis(
"bottleneck-analysis",		"bottleneck-analysis",
cl::desc("Enable bottleneck analysis (disabled by default)"),		cl::desc("Enable bottleneck analysis (disabled by default)"),
cl::cat(ViewOptions), cl::init(false));		cl::cat(ViewOptions), cl::init(false));

		static cl::opt<bool> ShowEncoding(
		"show-encoding",
		cl::desc("Print encoding information in the instruction info view"),
		cl::cat(ViewOptions), cl::init(false));

namespace {		namespace {

const Target *getTarget(const char *ProgName) {		const Target *getTarget(const char *ProgName) {
if (TripleName.empty())		if (TripleName.empty())
TripleName = Triple::normalize(sys::getDefaultTargetTriple());		TripleName = Triple::normalize(sys::getDefaultTargetTriple());
Triple TheTriple(TripleName);		Triple TheTriple(TripleName);

// Get the target specific parser.		// Get the target specific parser.
[... 71 lines not shown; context: int main(int argc, char **argv) { ...]
// Enable printing of available targets when flag --version is specified.		// Enable printing of available targets when flag --version is specified.
cl::AddExtraVersionPrinter(TargetRegistry::printRegisteredTargetsForVersion);		cl::AddExtraVersionPrinter(TargetRegistry::printRegisteredTargetsForVersion);

cl::HideUnrelatedOptions({&ToolOptions, &ViewOptions});		cl::HideUnrelatedOptions({&ToolOptions, &ViewOptions});

// Parse flags and initialize target options.		// Parse flags and initialize target options.
cl::ParseCommandLineOptions(argc, argv,		cl::ParseCommandLineOptions(argc, argv,
"llvm machine code performance analyzer.\n");		"llvm machine code performance analyzer.\n");

RKSimon: Add this back.
// Get the target from the triple. If a triple is not specified, then select		// Get the target from the triple. If a triple is not specified, then select
// the default triple for the host. If the triple doesn't correspond to any		// the default triple for the host. If the triple doesn't correspond to any
// registered target, then exit with an error message.		// registered target, then exit with an error message.
const char *ProgName = argv[0];		const char *ProgName = argv[0];
const Target *TheTarget = getTarget(ProgName);		const Target *TheTarget = getTarget(ProgName);
if (!TheTarget)		if (!TheTarget)
return 1;		return 1;

[... 120 lines not shown; context: int main(int argc, char **argv) { ...]

mca::PipelineOptions PO(MicroOpQueue, DecoderThroughput, DispatchWidth,		mca::PipelineOptions PO(MicroOpQueue, DecoderThroughput, DispatchWidth,
RegisterFileSize, LoadQueueSize, StoreQueueSize,		RegisterFileSize, LoadQueueSize, StoreQueueSize,
AssumeNoAlias, EnableBottleneckAnalysis);		AssumeNoAlias, EnableBottleneckAnalysis);

// Number each region in the sequence.		// Number each region in the sequence.
unsigned RegionIdx = 0;		unsigned RegionIdx = 0;

		std::unique_ptr<MCCodeEmitter> MCE(
		TheTarget->createMCCodeEmitter(*MCII, *MRI, Ctx));

		std::unique_ptr<MCAsmBackend> MAB(TheTarget->createMCAsmBackend(
		*STI, *MRI, InitMCTargetOptionsFromFlags()));

for (const std::unique_ptr<mca::CodeRegion> &Region : Regions) {		for (const std::unique_ptr<mca::CodeRegion> &Region : Regions) {
// Skip empty code regions.		// Skip empty code regions.
if (Region->empty())		if (Region->empty())
continue;		continue;

// Don't print the header of this region if it is the default region, and		// Don't print the header of this region if it is the default region, and
// it doesn't have an end location.		// it doesn't have an end location.
if (Region->startLoc().isValid() || Region->endLoc().isValid()) {		if (Region->startLoc().isValid() || Region->endLoc().isValid()) {
TOF->os() << "\n[" << RegionIdx++ << "] Code Region";		TOF->os() << "\n[" << RegionIdx++ << "] Code Region";
StringRef Desc = Region->getDescription();		StringRef Desc = Region->getDescription();
if (!Desc.empty())		if (!Desc.empty())
TOF->os() << " - " << Desc;		TOF->os() << " - " << Desc;
TOF->os() << "\n\n";		TOF->os() << "\n\n";
}		}

// Lower the MCInst sequence into an mca::Instruction sequence.		// Lower the MCInst sequence into an mca::Instruction sequence.
ArrayRef<MCInst> Insts = Region->getInstructions();		ArrayRef<MCInst> Insts = Region->getInstructions();
		mca::CodeEmitter CE(*STI, *MAB, *MCE, Insts);
std::vector<std::unique_ptr<mca::Instruction>> LoweredSequence;		std::vector<std::unique_ptr<mca::Instruction>> LoweredSequence;
for (const MCInst &MCI : Insts) {		for (const MCInst &MCI : Insts) {
Expected<std::unique_ptr<mca::Instruction>> Inst =		Expected<std::unique_ptr<mca::Instruction>> Inst =
IB.createInstruction(MCI);		IB.createInstruction(MCI);
if (!Inst) {		if (!Inst) {
if (auto NewE = handleErrors(		if (auto NewE = handleErrors(
Inst.takeError(),		Inst.takeError(),
[&IP, &STI](const mca::InstructionError<MCInst> &IE) {		[&IP, &STI](const mca::InstructionError<MCInst> &IE) {
[... 21 lines not shown; context: if (PrintInstructionTables) { ...]
auto P = llvm::make_unique<mca::Pipeline>();		auto P = llvm::make_unique<mca::Pipeline>();
P->appendStage(llvm::make_unique<mca::EntryStage>(S));		P->appendStage(llvm::make_unique<mca::EntryStage>(S));
P->appendStage(llvm::make_unique<mca::InstructionTables>(SM));		P->appendStage(llvm::make_unique<mca::InstructionTables>(SM));
mca::PipelinePrinter Printer(*P);		mca::PipelinePrinter Printer(*P);

// Create the views for this pipeline, execute, and emit a report.		// Create the views for this pipeline, execute, and emit a report.
if (PrintInstructionInfoView) {		if (PrintInstructionInfoView) {
Printer.addView(llvm::make_unique<mca::InstructionInfoView>(		Printer.addView(llvm::make_unique<mca::InstructionInfoView>(
*STI, *MCII, Insts, *IP));		*STI, *MCII, CE, ShowEncoding, Insts, *IP));
}		}
Printer.addView(		Printer.addView(
llvm::make_unique<mca::ResourcePressureView>(*STI, *IP, Insts));		llvm::make_unique<mca::ResourcePressureView>(*STI, *IP, Insts));

if (!runPipeline(*P))		if (!runPipeline(*P))
return 1;		return 1;

Printer.printReport(TOF->os());		Printer.printReport(TOF->os());
[... 9 lines not shown; context: if (PrintSummaryView) ...]
llvm::make_unique<mca::SummaryView>(SM, Insts, DispatchWidth));		llvm::make_unique<mca::SummaryView>(SM, Insts, DispatchWidth));

if (EnableBottleneckAnalysis) {		if (EnableBottleneckAnalysis) {
Printer.addView(llvm::make_unique<mca::BottleneckAnalysis>(		Printer.addView(llvm::make_unique<mca::BottleneckAnalysis>(
*STI, *IP, Insts, S.getNumIterations()));		*STI, *IP, Insts, S.getNumIterations()));
}		}

if (PrintInstructionInfoView)		if (PrintInstructionInfoView)
Printer.addView(		Printer.addView(llvm::make_unique<mca::InstructionInfoView>(
llvm::make_unique<mca::InstructionInfoView>(*STI, *MCII, Insts, *IP));		*STI, *MCII, CE, ShowEncoding, Insts, *IP));

if (PrintDispatchStats)		if (PrintDispatchStats)
Printer.addView(llvm::make_unique<mca::DispatchStatistics>());		Printer.addView(llvm::make_unique<mca::DispatchStatistics>());

if (PrintSchedulerStats)		if (PrintSchedulerStats)
Printer.addView(llvm::make_unique<mca::SchedulerStatistics>(*STI));		Printer.addView(llvm::make_unique<mca::SchedulerStatistics>(*STI));

if (PrintRetireStats)		if (PrintRetireStats)
[... 29 lines not shown ...]
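
Note on the change above: one code emitter object is built per code region and handed to the instruction info view, which can then ask for the encoding of each instruction when -show-encoding is set. The standalone sketch below illustrates that idea only. It does not use LLVM: ToyInst, ToyCodeEmitter and formatEncoding are hypothetical names invented for this illustration, and the lazy caching into a single shared buffer is an assumption about a reasonable design, not a description of the actual mca::CodeEmitter implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an MCInst: a toy instruction is just its bytes.
using ToyInst = std::vector<std::uint8_t>;

// Toy analogue of a per-region code emitter (assumed design, not LLVM's API).
// Encodings are appended to one shared buffer the first time they are
// requested, and the (offset, size) of each one is remembered.
class ToyCodeEmitter {
  const std::vector<ToyInst> &Insts;
  std::vector<std::uint8_t> Code;                        // encodings, back to back
  std::vector<std::pair<std::size_t, std::size_t>> Info; // (offset, size); size 0 = not encoded yet

public:
  explicit ToyCodeEmitter(const std::vector<ToyInst> &I)
      : Insts(I), Info(I.size(), {0, 0}) {}

  // Returns (offset, size) of the encoding of instruction Index,
  // encoding it on first use.
  std::pair<std::size_t, std::size_t> getEncoding(std::size_t Index) {
    assert(Index < Insts.size() && "instruction index out of range");
    auto &E = Info[Index];
    if (E.second == 0) {
      E.first = Code.size();
      Code.insert(Code.end(), Insts[Index].begin(), Insts[Index].end());
      E.second = Code.size() - E.first;
    }
    return E;
  }

  const std::vector<std::uint8_t> &bytes() const { return Code; }
};

// Formats Size bytes starting at Offset as space-separated lowercase hex,
// in the objdump-like style used by the Encodings column.
std::string formatEncoding(const std::vector<std::uint8_t> &Code,
                           std::size_t Offset, std::size_t Size) {
  std::string Out;
  char Buf[4];
  for (std::size_t I = 0; I < Size; ++I) {
    std::snprintf(Buf, sizeof(Buf), "%02x",
                  static_cast<unsigned>(Code[Offset + I]));
    if (I != 0)
      Out += ' ';
    Out += Buf;
  }
  return Out;
}
```

A view would call getEncoding once per instruction, print the returned size in the Encoding Size column, and pass the same offset and size to formatEncoding to produce the Encodings column.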