This is an archive of the discontinued LLVM Phabricator instance.

Differential D49196

[llvm-mca][BtVer2] teach how to identify false dependencies on partially written registers.
ClosedPublic

Authored by andreadb on Jul 11 2018, 9:25 AM.

Details

Reviewers

RKSimon
spatel
courbet
mattd
craig.topper
lebedev.ri

Commits

rGff630c2cdc7f: [llvm-mca][BtVer2] teach how to identify false dependencies on partially…
rL337123: [llvm-mca][BtVer2] teach how to identify false dependencies on partially written

Summary

The goal of this patch is to improve the throughput analysis in llvm-mca for the case where instructions in the input assembly sequence perform partial register writes.

On x86, partial register writes are quite difficult to model, mainly because different processors tend to implement different register merging schemes in hardware.

When the code contains partial register writes, the IPC (instructions per cycle) estimated by llvm-mca tends to diverge quite significantly from the observed IPC (measured with perf).
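
As a quick illustration of the metric, IPC is the instruction count divided by the cycle count. The Python sketch below (the helper name `ipc` is made up for this example) reproduces the before/after figures that llvm-mca prints for the partial-reg-update-3.s test in this revision; the comment in that test says perf stat reports 1.00 IPC.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle for a simulated or measured run."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


# (instructions, total cycles) as printed by llvm-mca for partial-reg-update-3.s
before = ipc(4500, 2254)  # before this patch
after = ipc(4500, 4503)   # with this patch
print(f"before: {before:.2f}, after: {after:.2f}")
```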

Modern AMD processors (at least, from Bulldozer onwards) don't rename partial registers. Quoting Agner Fog's microarchitecture.pdf:

The processor always keeps the different parts of an integer register together. For example, AL and AH are not treated as independent by the out-of-order execution mechanism. An instruction that writes to part of a register will therefore have a false dependence on any previous write to the same register or any part of it.

This patch is a first important step towards improving the analysis of partial register updates.

This patch changes the semantics of RegisterFile descriptors in tablegen, and teaches llvm-mca how to identify false dependencies in the presence of partial register writes (for more details, see the new code comments in include/llvm/Target/TargetSchedule.td - class RegisterFile).
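
A minimal Python sketch of the rule described above, assuming a toy register table (`REGS`) and a helper (`false_dependencies`) that are invented for illustration and are not part of llvm-mca. It models only write-after-write ordering: a partial write whose register class is not listed in the register file waits for the previous write to any part of the same full register, while 32-bit writes are exempt because they clear the upper half of the register.

```python
# Hypothetical table: register -> (containing 64-bit register, register class).
REGS = {
    "rdx": ("rdx", "GR64"),
    "edx": ("rdx", "GR32"),
    "dx": ("rdx", "GR16"),
    "dl": ("rdx", "GR8"),
    "dh": ("rdx", "GR8"),
}


def false_dependencies(writes, renamed_classes):
    """For each write, return the index of the earlier write it must wait
    for, or None when the write can be given a fresh alias."""
    last_write = {}  # full register -> index of the latest write to any part
    deps = []
    for index, reg in enumerate(writes):
        full, reg_class = REGS[reg]
        # GR32 writes zero the upper half on x86-64, so they carry no penalty.
        if reg_class in renamed_classes or reg_class == "GR32":
            deps.append(None)
        else:
            deps.append(last_write.get(full))
        last_write[full] = index
    return deps


# add %cx,%dx ; mov %ax,%dx ; xor %bx,%dx (destination registers only)
old_model = false_dependencies(["dx", "dx", "dx"], {"GR8", "GR16", "GR32", "GR64"})
new_model = false_dependencies(["dx", "dx", "dx"], {"GR64"})
print(old_model, new_model)
```

With only GR64 listed, each 16-bit write is serialized behind the previous one, which is the behavior the patch aims to capture for BtVer2.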

This patch doesn't address the case where a write to a part of a register is followed by a read from the whole register.
On Intel chips, high8 registers (AH/BH/CH/DH) can be stored in separate physical registers. However, a later (dirty) read of the full register (example: AX/EAX) triggers a merge uOp, which adds extra latency (and potentially affects the pipe usage).
There is a very interesting discussion of partial register writes on Intel chips here: https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to

In future, the definition of RegisterFile can be extended with extra information that may be used to identify cases where a register read is slowed down by a merge of a partial write.

Please let me know if okay to commit.

-Andrea

Diff Detail

Repository: rL LLVM

Event Timeline

andreadb created this revision. Jul 11 2018, 9:25 AM

Herald added subscribers: gbedwell, tschuett. Jul 11 2018, 9:25 AM

andreadb edited the summary of this revision. Jul 11 2018, 9:27 AM

mattd added inline comments. Jul 11 2018, 11:04 AM

include/llvm/Target/TargetSchedule.td
506 ↗	(On Diff #155014)	s/wit/with/
tools/llvm-mca/Instruction.cpp
143 ↗	(On Diff #155014)	Style: You could just use WriteLatency in place of the call to getCyclesLeft.
tools/llvm-mca/RegisterFile.cpp
29 ↗	(On Diff #155014)	I think you can treat the {0,1U} as a call to the IndexPlusCostPairTy constructor. That would be a bit easier to read.
99 ↗	(On Diff #155014)	We should note that index 0 is a special case, and probably have a similar comment in RegisterFile.h where IndexPlusCostPairTy is declared.
153 ↗	(On Diff #155014)	Style: Remove a line of white space.
243 ↗	(On Diff #155014)	This block looks duplicated from line 233 above. It looks like we will invalidate the supers of RegID twice if WS are clearing super regs.
tools/llvm-mca/RegisterFile.h
155 ↗	(On Diff #155014)	Add 'to' to the following phrase: ... allows us to classify ...

RKSimon added inline comments. Jul 11 2018, 11:12 AM

tools/llvm-mca/RegisterFile.cpp
155 ↗	(On Diff #155014)	Use RRI.RenameAs direct and make RRI const?
331 ↗	(On Diff #155014)	Is this worth it? Just use RRI.IndexPlusCost directly below?

Patch updated.

Addressed review comments.

andreadb added inline comments. Jul 12 2018, 3:48 AM

tools/llvm-mca/RegisterFile.cpp
99 ↗	(On Diff #155014)	This is already explained in a couple of places in RegisterFile.h. See the comments on field `RegisterFiles`, and type `RegisterMapping`. If possible, I prefer not to add another comment about it.
243 ↗	(On Diff #155014)	The previous block is invalidating sub-registers of RegID. This block invalidates super-registers of RegID. So, we cannot remove it.
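
The distinction can be sketched in Python with a toy register hierarchy. The table `SUBS` and the helper names are hypothetical; they only mirror the description in this thread, not the actual code in RegisterFile.cpp. A write always touches the sub-registers of the written register, and touches its super-registers only when the write clears them, so the two loops walk disjoint sets.

```python
# Hypothetical hierarchy: register -> direct sub-registers.
SUBS = {
    "rax": ["eax"],
    "eax": ["ax"],
    "ax": ["al", "ah"],
    "al": [],
    "ah": [],
}


def sub_registers(reg):
    """All registers strictly contained in `reg`."""
    found = []
    for sub in SUBS[reg]:
        found.append(sub)
        found.extend(sub_registers(sub))
    return found


def super_registers(reg):
    """All registers that strictly contain `reg`."""
    return [candidate for candidate in SUBS if reg in sub_registers(candidate)]


def invalidated_by_write(reg, clears_super_registers):
    """Registers whose previous definition is invalidated by a write to `reg`."""
    invalid = set(sub_registers(reg))
    if clears_super_registers:
        invalid |= set(super_registers(reg))
    return invalid


print(sorted(invalidated_by_write("eax", clears_super_registers=True)))
```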

I'm happy with the changes made, and don't see anything obviously wrong, but you might want to wait a day/weekend to see if anyone else has anything to add. LGTM!

This revision is now accepted and ready to land. Jul 13 2018, 9:51 AM

mattd mentioned this in D49310: [llvm-mca][BtVer2] Teach how to identify dependency-breaking idioms. Jul 13 2018, 3:35 PM

lebedev.ri added inline comments. Jul 14 2018, 6:19 AM

include/llvm/Target/TargetSchedule.td
465–466 ↗	(On Diff #155147)	This should probably be a TODO. From the "Software Optimization Guide for AMD Family 17h Processors" (http://developer.amd.com/wordpress/media/2013/12/55723_SOG_Fam_17h_Processors_3.00.pdf), page 34, Chapter 2 "Microarchitecture of AMD Family 17h Processor", Section 2.11 "Floating-Point Unit": "It can handle dispatch and renaming of 4 floating point micro ops per cycle". So maybe having one single `DispatchWidth` per `ProcResGroup` and using that for both the dispatch and renaming is the way to go.

LGTM with a couple of final minor docs/comments fixes.

include/llvm/Target/TargetSchedule.td
465–466 ↗	(On Diff #155147)	Yes, please put a TODO in front of this paragraph: TODO: This implementation currently assumes that there is no limit in the number of renames per cycle, which might not be true for all hardware or register classes.
lib/Target/X86/X86ScheduleBtVer2.td
50 ↗	(On Diff #155147)	Please can you check the section numbers? They don't seem to match the latest version of Agner's doc. As he seems to reorder things with each release, adding the full section/subsection titles might make sense here.

Closed by commit rL337123: [llvm-mca][BtVer2] teach how to identify false dependencies on partially written (authored by adibiagio). Jul 15 2018, 4:06 AM

This revision was automatically updated to reflect the committed changes.

ZnVer1 will need updating, too.
I could take a look, but i have an immediate question about FP registers:

llvm/trunk/lib/Target/X86/X86ScheduleBtVer2.td
56	A bit late, but i'm not sure about the FP registers. Quote: `But the high and low 64 bits of a 128 bit register are treated as independent on Bobcat, and the high and low 128 bits of a 256 bit register are treated as independent on Jaguar.` Does that mean that it should be `def JFpuPRF: RegisterFile<72, [VR128, VR256], [1, 1, 2]>;` ? Or does that mean `on Bobcat <and later>`?

andreadb added inline comments. Jul 16 2018, 8:09 AM

llvm/trunk/lib/Target/X86/X86ScheduleBtVer2.td
56	Register class VR64 is the class used by x87/mmx registers. Those eight registers don't alias with XMM/YMM registers, and they are subject to register renaming. Bobcat provides native support for 64-bit data types, but not 128-bit data types. So, operations on 128-bit data types are split into 2x64-bit pairs. Internally, the PRF consumes two physical registers to map a single XMM write. Agner describes this in Section 21.1: `The Bobcat has 64-bit physical registers and uses two such registers to save a 128-bit vector.` I hope it helps. -Andrea
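
To make the cost accounting concrete, here is a small Python sketch. The class `RegisterFileModel` is hypothetical; the size and per-class costs are taken from the `JFpuPRF` definition for btver2 in this revision (72 physical registers; VR64 and VR128 cost 1, VR256 costs 2). It only counts allocations and ignores the point at which registers are freed again.

```python
class RegisterFileModel:
    """Counts physical registers consumed by register renames."""

    def __init__(self, num_phys_regs, costs):
        self.num_phys_regs = num_phys_regs
        self.costs = dict(costs)  # register class -> physical registers per rename
        self.used = 0

    def rename(self, reg_class):
        """Allocate an alias for a write; return False if the file is full."""
        cost = self.costs[reg_class]
        if self.used + cost > self.num_phys_regs:
            return False
        self.used += cost
        return True


jfpu = RegisterFileModel(72, {"VR64": 1, "VR128": 1, "VR256": 2})
ymm_writes = 0
while jfpu.rename("VR256"):
    ymm_writes += 1
print(ymm_writes)  # number of YMM writes that fit before the file is full
```

Under these assumptions a YMM write uses up the pool twice as fast as an XMM write, which is what the `[1, 1, 2]` cost list expresses.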

lebedev.ri added inline comments. Jul 16 2018, 8:59 AM

llvm/trunk/lib/Target/X86/X86ScheduleBtVer2.td
56	Thank you. Yes, i think it did. Basically, i think it confirmed that the FP `RegisterFile` for both the btver2, and znver1 is correct. So i just need to adjust the integer `RegisterFile` on znver1.

lebedev.ri mentioned this in D49393: [NFC][MCA] ZnVer1: Update RegisterFile to identify false dependencies on partially written registers. Jul 16 2018, 11:48 AM

Diffusion mentioned this in rL337676: [NFC][MCA] ZnVer1: Update RegisterFile to identify false dependencies on… Jul 23 2018, 3:10 AM

Revision Contents

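Path	Size
llvm/trunk/include/llvm/Target/TargetSchedule.td	63 lines
llvm/trunk/lib/Target/X86/X86ScheduleBtVer2.td	9 lines
llvm/trunk/test/tools/llvm-mca/X86/BtVer2/partial-reg-update-2.s	15 lines
llvm/trunk/test/tools/llvm-mca/X86/BtVer2/partial-reg-update-3.s	35 lines
llvm/trunk/test/tools/llvm-mca/X86/BtVer2/partial-reg-update-4.s	30 lines
llvm/trunk/test/tools/llvm-mca/X86/BtVer2/partial-reg-update-5.s	14 lines
llvm/trunk/test/tools/llvm-mca/X86/BtVer2/partial-reg-update-6.s	32 lines
llvm/trunk/tools/llvm-mca/ (file names not listed)	12 lines, 25 lines, 35 lines, 135 lines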

Diff 155583

llvm/trunk/include/llvm/Target/TargetSchedule.td

	[... 447 lines not shown ...]
	// SchedModel will usually be provided by surrounding let statement			// SchedModel will usually be provided by surrounding let statement
	// and ties this SchedAlias mapping to a processor.			// and ties this SchedAlias mapping to a processor.
	class SchedAlias<SchedReadWrite match, SchedReadWrite alias> {			class SchedAlias<SchedReadWrite match, SchedReadWrite alias> {
	SchedReadWrite MatchRW = match;			SchedReadWrite MatchRW = match;
	SchedReadWrite AliasRW = alias;			SchedReadWrite AliasRW = alias;
	SchedMachineModel SchedModel = ?;			SchedMachineModel SchedModel = ?;
	}			}

	// Allow the definition of processor register files.			// Allow the definition of processor register files for register renaming
	// Each processor register file declares the number of physical registers, as			// purposes.
	// well as a optional register cost information. The cost of a register R is the			//
	// number of physical registers used to rename R (at register renaming stage).			// Each processor register file declares:
	// That value defaults to 1, to all the registers contained in the register			// - The set of registers that can be renamed.
	// file. The set of target register files is inferred from the list of register			// - The number of physical registers which can be used for register renaming
	// classes. Register costs are defined at register class granularity. An empty			// purpose.
	// list of register classes means that this register file contains all the			// - The cost of a register rename.
	// registers defined by the target.			//
				// The cost of a rename is the number of physical registers allocated by the
				// register alias table to map the new definition. By default, register can be
				// renamed at the cost of a single physical register. Note that register costs
				// are defined at register class granularity (see field `Costs`).
				//
				// The set of registers that are subject to register renaming is declared using
				// a list of register classes (see field `RegClasses`). An empty list of
				// register classes means: all the logical registers defined by the target can
				// be fully renamed.
				//
				// A register R can be renamed if its register class appears in the `RegClasses`
				// set. When R is written, a new alias is allocated at the cost of one or more
				// physical registers; as a result, false dependencies on R are removed.
				//
				// A sub-register V of register R is implicitly part of the same register file.
				// However, V is only renamed if its register class is part of `RegClasses`.
				// Otherwise, the processor keeps it (as well as any other different part
				// of R) together with R, and a write of V always causes a compulsory read of R.
				//
				// This is what happens for example on AMD processors (at least from Bulldozer
				// onwards), where AL and AH are not treated as independent from AX, and AX is
				// not treated as independent from EAX. A write to AL has an implicity false
				// dependency on the last write to EAX (or a portion of EAX). As a consequence,
				// a write to AL cannot go in parallel with a write to AH.
				//
				// There is no false dependency if the partial register write belongs to a
				// register class that is in `RegClasses`.
				// There is also no penalty for writes that "clear the content a super-register"
				// (see MC/MCInstrAnalysis.h - method MCInstrAnalysis::clearsSuperRegisters()).
				// On x86-64, 32-bit GPR writes implicitly zero the upper half of the underlying
				// physical register, effectively removing any false dependencies with the
				// previous register definition.
				//
				// TODO: This implementation assumes that there is no limit in the number of
				// renames per cycle, which might not be true for all hardware or register
				// classes. Also, there is no limit to how many times the same logical register
				// can be renamed during the same cycle.
				//
				// TODO: we don't currently model merge penalties for the case where a write to
				// a part of a register is followed by a read from a larger part of the same
				// register. On some Intel chips, different parts of a GPR can be stored in
				// different physical registers. However, there is a cost to pay for when the
				// partial write is combined with the previous super-register definition. We
				// should add support for these cases, and correctly model merge problems with
				// partial register accesses.
	class RegisterFile<int numPhysRegs, list<RegisterClass> Classes = [],			class RegisterFile<int numPhysRegs, list<RegisterClass> Classes = [],
	list<int> Costs = []> {			list<int> Costs = []> {
	list<RegisterClass> RegClasses = Classes;			list<RegisterClass> RegClasses = Classes;
	list<int> RegCosts = Costs;			list<int> RegCosts = Costs;
	int NumPhysRegs = numPhysRegs;			int NumPhysRegs = numPhysRegs;
	SchedMachineModel SchedModel = ?;			SchedMachineModel SchedModel = ?;
	}			}

	[... 36 lines not shown ...]

llvm/trunk/lib/Target/X86/X86ScheduleBtVer2.td

	[... 35 lines not shown ...]
	def JLAGU : ProcResource<1>; // Integer Pipe2: LAGU			def JLAGU : ProcResource<1>; // Integer Pipe2: LAGU
	def JSAGU : ProcResource<1>; // Integer Pipe3: SAGU (also handles 3-operand LEA)			def JSAGU : ProcResource<1>; // Integer Pipe3: SAGU (also handles 3-operand LEA)
	def JFPU0 : ProcResource<1>; // Vector/FPU Pipe0: VALU0/VIMUL/FPA			def JFPU0 : ProcResource<1>; // Vector/FPU Pipe0: VALU0/VIMUL/FPA
	def JFPU1 : ProcResource<1>; // Vector/FPU Pipe1: VALU1/STC/FPM			def JFPU1 : ProcResource<1>; // Vector/FPU Pipe1: VALU1/STC/FPM

	// The Integer PRF for Jaguar is 64 entries, and it holds the architectural and			// The Integer PRF for Jaguar is 64 entries, and it holds the architectural and
	// speculative version of the 64-bit integer registers.			// speculative version of the 64-bit integer registers.
	// Reference: www.realworldtech.com/jaguar/4/			// Reference: www.realworldtech.com/jaguar/4/
	def JIntegerPRF : RegisterFile<64, [GR8, GR16, GR32, GR64, CCR]>;			//
				// The processor always keeps the different parts of an integer register
				// together. An instruction that writes to a part of a register will therefore
				// have a false dependence on any previous write to the same register or any
				// part of it.
				// Reference: Section 21.10 "AMD Bobcat and Jaguar pipeline: Partial register
				// access" - Agner Fog's "microarchitecture.pdf".
				def JIntegerPRF : RegisterFile<64, [GR64, CCR]>;

	// The Jaguar FP Retire Queue renames SIMD and FP uOps onto a pool of 72 SSE			// The Jaguar FP Retire Queue renames SIMD and FP uOps onto a pool of 72 SSE
	// registers. Operations on 256-bit data types are cracked into two COPs.			// registers. Operations on 256-bit data types are cracked into two COPs.
	// Reference: www.realworldtech.com/jaguar/4/			// Reference: www.realworldtech.com/jaguar/4/
	def JFpuPRF: RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>;			def JFpuPRF: RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>;

	// The retire control unit (RCU) can track up to 64 macro-ops in-flight. It can			// The retire control unit (RCU) can track up to 64 macro-ops in-flight. It can
	// retire up to two macro-ops per cycle.			// retire up to two macro-ops per cycle.
	// Reference: "Software Optimization Guide for AMD Family 16h Processors"			// Reference: "Software Optimization Guide for AMD Family 16h Processors"
	def JRCU : RetireControlUnit<64, 2>;			def JRCU : RetireControlUnit<64, 2>;

	// Integer Pipe Scheduler			// Integer Pipe Scheduler
	def JALU01 : ProcResGroup<[JALU0, JALU1]> {			def JALU01 : ProcResGroup<[JALU0, JALU1]> {
	[... 603 lines not shown ...]

llvm/trunk/test/tools/llvm-mca/X86/BtVer2/partial-reg-update-2.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1 -resource-pressure=false -timeline < %s \| FileCheck %s			# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1 -resource-pressure=false -timeline < %s \| FileCheck %s

	imul %rax, %rbx			imul %rax, %rbx
	lzcnt %ax, %bx			lzcnt %ax, %bx
	add %ecx, %ebx			add %ecx, %ebx

	# CHECK: Iterations: 1			# CHECK: Iterations: 1
	# CHECK-NEXT: Instructions: 3			# CHECK-NEXT: Instructions: 3
	# CHECK-NEXT: Total Cycles: 10			# CHECK-NEXT: Total Cycles: 11
	# CHECK-NEXT: Dispatch Width: 2			# CHECK-NEXT: Dispatch Width: 2
	# CHECK-NEXT: IPC: 0.30			# CHECK-NEXT: IPC: 0.27
	# CHECK-NEXT: Block RThroughput: 4.0			# CHECK-NEXT: Block RThroughput: 4.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	# CHECK-NEXT: [6]: HasSideEffects (U)			# CHECK-NEXT: [6]: HasSideEffects (U)

	# CHECK: [1] [2] [3] [4] [5] [6] Instructions:			# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
	# CHECK-NEXT: 2 6 4.00 imulq %rax, %rbx			# CHECK-NEXT: 2 6 4.00 imulq %rax, %rbx
	# CHECK-NEXT: 1 1 0.50 lzcntw %ax, %bx			# CHECK-NEXT: 1 1 0.50 lzcntw %ax, %bx
	# CHECK-NEXT: 1 1 0.50 addl %ecx, %ebx			# CHECK-NEXT: 1 1 0.50 addl %ecx, %ebx

	# CHECK: Timeline view:			# CHECK: Timeline view:
				# CHECK-NEXT: 0
	# CHECK-NEXT: Index 0123456789			# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DeeeeeeER. imulq %rax, %rbx			# CHECK: [0,0] DeeeeeeER . imulq %rax, %rbx
	# CHECK-NEXT: [0,1] .DeE----R. lzcntw %ax, %bx			# CHECK-NEXT: [0,1] .D=====eER. lzcntw %ax, %bx
	# CHECK-NEXT: [0,2] .D=====eER addl %ecx, %ebx			# CHECK-NEXT: [0,2] .D======eER addl %ecx, %ebx

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 1 1.0 1.0 0.0 imulq %rax, %rbx			# CHECK-NEXT: 0. 1 1.0 1.0 0.0 imulq %rax, %rbx
	# CHECK-NEXT: 1. 1 1.0 1.0 4.0 lzcntw %ax, %bx			# CHECK-NEXT: 1. 1 6.0 0.0 0.0 lzcntw %ax, %bx
	# CHECK-NEXT: 2. 1 6.0 0.0 0.0 addl %ecx, %ebx			# CHECK-NEXT: 2. 1 7.0 0.0 0.0 addl %ecx, %ebx

llvm/trunk/test/tools/llvm-mca/X86/BtVer2/partial-reg-update-3.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1500 -timeline -timeline-max-iterations=3 < %s \| FileCheck %s			# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1500 -timeline -timeline-max-iterations=3 < %s \| FileCheck %s

	# perf stat reports a throughput of 1.00 IPC for this code snippet.			# perf stat reports a throughput of 1.00 IPC for this code snippet.

	# The ILP is limited by the false dependency on %dx. So, the mov cannot execute			# The ILP is limited by the false dependency on %dx. So, the mov cannot execute
	# in parallel with the add.			# in parallel with the add.

	add %cx, %dx			add %cx, %dx
	mov %ax, %dx			mov %ax, %dx
	xor %bx, %dx			xor %bx, %dx

	# CHECK: Iterations: 1500			# CHECK: Iterations: 1500
	# CHECK-NEXT: Instructions: 4500			# CHECK-NEXT: Instructions: 4500
	# CHECK-NEXT: Total Cycles: 2254			# CHECK-NEXT: Total Cycles: 4503
	# CHECK-NEXT: Dispatch Width: 2			# CHECK-NEXT: Dispatch Width: 2
	# CHECK-NEXT: IPC: 2.00			# CHECK-NEXT: IPC: 1.00
	# CHECK-NEXT: Block RThroughput: 1.5			# CHECK-NEXT: Block RThroughput: 1.5

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	[... 21 lines not shown ...]
	# CHECK-NEXT: [13] - JVIMUL			# CHECK-NEXT: [13] - JVIMUL

	# CHECK: Resource pressure per iteration:			# CHECK: Resource pressure per iteration:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
	# CHECK-NEXT: 1.50 1.50 - - - - - - - - - - - -			# CHECK-NEXT: 1.50 1.50 - - - - - - - - - - - -

	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
	# CHECK-NEXT: 1.00 - - - - - - - - - - - - - addw %cx, %dx			# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - addw %cx, %dx
	# CHECK-NEXT: - 1.00 - - - - - - - - - - - - movw %ax, %dx			# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - movw %ax, %dx
	# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xorw %bx, %dx			# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - xorw %bx, %dx

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: Index 012345678			# CHECK-NEXT: 01
				# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DeER . . addw %cx, %dx			# CHECK: [0,0] DeER . .. addw %cx, %dx
	# CHECK-NEXT: [0,1] DeER . . movw %ax, %dx			# CHECK-NEXT: [0,1] D=eER. .. movw %ax, %dx
	# CHECK-NEXT: [0,2] .DeER. . xorw %bx, %dx			# CHECK-NEXT: [0,2] .D=eER .. xorw %bx, %dx
	# CHECK-NEXT: [1,0] .D=eER . addw %cx, %dx			# CHECK-NEXT: [1,0] .D==eER .. addw %cx, %dx
	# CHECK-NEXT: [1,1] . DeER . movw %ax, %dx			# CHECK-NEXT: [1,1] . D==eER .. movw %ax, %dx
	# CHECK-NEXT: [1,2] . D=eER . xorw %bx, %dx			# CHECK-NEXT: [1,2] . D===eER .. xorw %bx, %dx
	# CHECK-NEXT: [2,0] . D=eER. addw %cx, %dx			# CHECK-NEXT: [2,0] . D===eER.. addw %cx, %dx
	# CHECK-NEXT: [2,1] . DeE-R. movw %ax, %dx			# CHECK-NEXT: [2,1] . D====eER. movw %ax, %dx
	# CHECK-NEXT: [2,2] . DeE-R xorw %bx, %dx			# CHECK-NEXT: [2,2] . D====eER xorw %bx, %dx

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 3 1.7 0.3 0.0 addw %cx, %dx			# CHECK-NEXT: 0. 3 2.7 0.3 0.0 addw %cx, %dx
	# CHECK-NEXT: 1. 3 1.0 1.0 0.3 movw %ax, %dx			# CHECK-NEXT: 1. 3 3.3 0.0 0.0 movw %ax, %dx
	# CHECK-NEXT: 2. 3 1.3 0.0 0.3 xorw %bx, %dx			# CHECK-NEXT: 2. 3 3.7 0.0 0.0 xorw %bx, %dx

llvm/trunk/test/tools/llvm-mca/X86/BtVer2/partial-reg-update-4.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1500 -timeline -timeline-max-iterations=3 < %s \| FileCheck %s			# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1500 -timeline -timeline-max-iterations=3 < %s \| FileCheck %s

	# perf stat reports a throughput of 0.60 IPC for this code snippet.			# perf stat reports a throughput of 0.60 IPC for this code snippet.

	# The lzcnt cannot execute in parallel with the imul because there is a false			# The lzcnt cannot execute in parallel with the imul because there is a false
	# dependency on %bx.			# dependency on %bx.

	imul %ax, %bx			imul %ax, %bx
	lzcnt %ax, %bx			lzcnt %ax, %bx
	add %cx, %bx			add %cx, %bx

	# CHECK: Iterations: 1500			# CHECK: Iterations: 1500
	# CHECK-NEXT: Instructions: 4500			# CHECK-NEXT: Instructions: 4500
	# CHECK-NEXT: Total Cycles: 3006			# CHECK-NEXT: Total Cycles: 7503
	# CHECK-NEXT: Dispatch Width: 2			# CHECK-NEXT: Dispatch Width: 2
	# CHECK-NEXT: IPC: 1.50			# CHECK-NEXT: IPC: 0.60
	# CHECK-NEXT: Block RThroughput: 2.0			# CHECK-NEXT: Block RThroughput: 2.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	[... 26 lines not shown ...]

	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
	# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imulw %ax, %bx			# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imulw %ax, %bx
	# CHECK-NEXT: 1.00 - - - - - - - - - - - - - lzcntw %ax, %bx			# CHECK-NEXT: 1.00 - - - - - - - - - - - - - lzcntw %ax, %bx
	# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - addw %cx, %bx			# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - addw %cx, %bx

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: 01			# CHECK-NEXT: 01234567
	# CHECK-NEXT: Index 0123456789			# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DeeeER .. imulw %ax, %bx			# CHECK: [0,0] DeeeER . . . imulw %ax, %bx
	# CHECK-NEXT: [0,1] .DeE-R .. lzcntw %ax, %bx			# CHECK-NEXT: [0,1] .D==eER . . . lzcntw %ax, %bx
	# CHECK-NEXT: [0,2] .D=eE-R .. addw %cx, %bx			# CHECK-NEXT: [0,2] .D===eER . . . addw %cx, %bx
	# CHECK-NEXT: [1,0] . D=eeeER .. imulw %ax, %bx			# CHECK-NEXT: [1,0] . D===eeeER . . imulw %ax, %bx
	# CHECK-NEXT: [1,1] . DeE--R .. lzcntw %ax, %bx			# CHECK-NEXT: [1,1] . D=====eER . . lzcntw %ax, %bx
	# CHECK-NEXT: [1,2] . D=eE--R.. addw %cx, %bx			# CHECK-NEXT: [1,2] . D======eER . . addw %cx, %bx
	# CHECK-NEXT: [2,0] . D=eeeER. imulw %ax, %bx			# CHECK-NEXT: [2,0] . D======eeeER . imulw %ax, %bx
	# CHECK-NEXT: [2,1] . DeE--R. lzcntw %ax, %bx			# CHECK-NEXT: [2,1] . D========eER. lzcntw %ax, %bx
	# CHECK-NEXT: [2,2] . D=eE--R addw %cx, %bx			# CHECK-NEXT: [2,2] . D=========eER addw %cx, %bx

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 3 1.7 0.3 0.0 imulw %ax, %bx			# CHECK-NEXT: 0. 3 4.0 0.3 0.0 imulw %ax, %bx
	# CHECK-NEXT: 1. 3 1.0 1.0 1.7 lzcntw %ax, %bx			# CHECK-NEXT: 1. 3 6.0 0.0 0.0 lzcntw %ax, %bx
	# CHECK-NEXT: 2. 3 2.0 0.0 1.7 addw %cx, %bx			# CHECK-NEXT: 2. 3 7.0 0.0 0.0 addw %cx, %bx

llvm/trunk/test/tools/llvm-mca/X86/BtVer2/partial-reg-update-5.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1500 -timeline -timeline-max-iterations=3 < %s \| FileCheck %s			# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1500 -timeline -timeline-max-iterations=3 < %s \| FileCheck %s

	# perf stat reports a throughput of 1.00 IPC for this code snippet.			# perf stat reports a throughput of 1.00 IPC for this code snippet.

	lzcnt %ax, %bx ## partial register stall.			lzcnt %ax, %bx ## partial register stall.

	# CHECK: Iterations: 1500			# CHECK: Iterations: 1500
	# CHECK-NEXT: Instructions: 1500			# CHECK-NEXT: Instructions: 1500
	# CHECK-NEXT: Total Cycles: 753			# CHECK-NEXT: Total Cycles: 1503
	# CHECK-NEXT: Dispatch Width: 2			# CHECK-NEXT: Dispatch Width: 2
	# CHECK-NEXT: IPC: 1.99			# CHECK-NEXT: IPC: 1.00
	# CHECK-NEXT: Block RThroughput: 0.5			# CHECK-NEXT: Block RThroughput: 0.5

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	[22 unchanged lines not shown]
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
	# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - -			# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - -

	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
	# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - lzcntw %ax, %bx			# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - lzcntw %ax, %bx

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: Index 01234			# CHECK-NEXT: Index 012345

	# CHECK: [0,0] DeER. lzcntw %ax, %bx			# CHECK: [0,0] DeER . lzcntw %ax, %bx
	# CHECK-NEXT: [1,0] DeER. lzcntw %ax, %bx			# CHECK-NEXT: [1,0] D=eER. lzcntw %ax, %bx
	# CHECK-NEXT: [2,0] .DeER lzcntw %ax, %bx			# CHECK-NEXT: [2,0] .D=eER lzcntw %ax, %bx

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 3 1.0 1.0 0.0 lzcntw %ax, %bx			# CHECK-NEXT: 0. 3 1.7 0.3 0.0 lzcntw %ax, %bx

llvm/trunk/test/tools/llvm-mca/X86/BtVer2/partial-reg-update-6.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1500 -timeline -timeline-max-iterations=3 < %s | FileCheck %s			# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=1500 -timeline -timeline-max-iterations=3 < %s | FileCheck %s

	# perf stat reports a throughput of 0.60 IPC for this code snippet.			# perf stat reports a throughput of 0.60 IPC for this code snippet.
	# Each lzcnt has a false dependency on %ecx; the first lzcnt has to wait on the			# Each lzcnt has a false dependency on %ecx; the first lzcnt has to wait on the
	# imul. However, the folded load can start immediately.			# imul. However, the folded load can start immediately.
	# The last lzcnt has a false dependency on %cx. However, even in this case, the			# The last lzcnt has a false dependency on %cx. However, even in this case, the
	# folded load can start immediately.			# folded load can start immediately.

	imul %edx, %ecx			imul %edx, %ecx
	lzcnt (%rsp), %cx			lzcnt (%rsp), %cx
	lzcnt 2(%rsp), %cx			lzcnt 2(%rsp), %cx

	# CHECK: Iterations: 1500			# CHECK: Iterations: 1500
	# CHECK-NEXT: Instructions: 4500			# CHECK-NEXT: Instructions: 4500
	# CHECK-NEXT: Total Cycles: 4507			# CHECK-NEXT: Total Cycles: 7504
	# CHECK-NEXT: Dispatch Width: 2			# CHECK-NEXT: Dispatch Width: 2
	# CHECK-NEXT: IPC: 1.00			# CHECK-NEXT: IPC: 0.60
	# CHECK-NEXT: Block RThroughput: 2.0			# CHECK-NEXT: Block RThroughput: 2.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	[22 unchanged lines not shown]

	# CHECK: Resource pressure per iteration:			# CHECK: Resource pressure per iteration:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
	# CHECK-NEXT: 1.50 1.50 - - - - - 2.00 1.00 - - - - -			# CHECK-NEXT: 1.50 1.50 - - - - - 2.00 1.00 - - - - -

	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
	# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imull %edx, %ecx			# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imull %edx, %ecx
	# CHECK-NEXT: 0.99 0.01 - - - - - 1.00 - - - - - - lzcntw (%rsp), %cx			# CHECK-NEXT: 1.00 - - - - - - 1.00 - - - - - - lzcntw (%rsp), %cx
	# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - - - - - - lzcntw 2(%rsp), %cx			# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - - - - - - lzcntw 2(%rsp), %cx

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: 012345			# CHECK-NEXT: 012345678
	# CHECK-NEXT: Index 0123456789			# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DeeeER . . imull %edx, %ecx			# CHECK: [0,0] DeeeER . . . imull %edx, %ecx
	# CHECK-NEXT: [0,1] .DeeeeER . . lzcntw (%rsp), %cx			# CHECK-NEXT: [0,1] .DeeeeER . . . lzcntw (%rsp), %cx
	# CHECK-NEXT: [0,2] .D=eeeeER . . lzcntw 2(%rsp), %cx			# CHECK-NEXT: [0,2] .D=eeeeER . . . lzcntw 2(%rsp), %cx
	# CHECK-NEXT: [1,0] . D====eeeER . imull %edx, %ecx			# CHECK-NEXT: [1,0] . D====eeeER . . imull %edx, %ecx
	# CHECK-NEXT: [1,1] . DeeeeE--R . lzcntw (%rsp), %cx			# CHECK-NEXT: [1,1] . D===eeeeER . . lzcntw (%rsp), %cx
	# CHECK-NEXT: [1,2] . D=eeeeE--R . lzcntw 2(%rsp), %cx			# CHECK-NEXT: [1,2] . D====eeeeER . . lzcntw 2(%rsp), %cx
	# CHECK-NEXT: [2,0] . D=====eeeER. imull %edx, %ecx			# CHECK-NEXT: [2,0] . D=======eeeER . imull %edx, %ecx
	# CHECK-NEXT: [2,1] . DeeeeE---R. lzcntw (%rsp), %cx			# CHECK-NEXT: [2,1] . D======eeeeER. lzcntw (%rsp), %cx
	# CHECK-NEXT: [2,2] . D=eeeeE---R lzcntw 2(%rsp), %cx			# CHECK-NEXT: [2,2] . D=======eeeeER lzcntw 2(%rsp), %cx

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 3 4.0 0.3 0.0 imull %edx, %ecx			# CHECK-NEXT: 0. 3 4.7 0.3 0.0 imull %edx, %ecx
	# CHECK-NEXT: 1. 3 1.0 1.0 1.7 lzcntw (%rsp), %cx			# CHECK-NEXT: 1. 3 4.0 0.3 0.0 lzcntw (%rsp), %cx
	# CHECK-NEXT: 2. 3 2.0 2.0 1.7 lzcntw 2(%rsp), %cx			# CHECK-NEXT: 2. 3 5.0 0.0 0.0 lzcntw 2(%rsp), %cx
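
The IPC figures in the CHECK lines of these tests are the instruction count divided by the total cycle count, shown with two decimals. The following is a small C++ sketch of that arithmetic only. `computeIPC` and `roundTo2` are hypothetical helper names invented for this illustration; they are not part of llvm-mca, and the sketch does not claim to reproduce the tool's own formatting code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: instructions per cycle for one simulated run.
double computeIPC(unsigned Instructions, unsigned TotalCycles) {
  assert(TotalCycles != 0 && "a run always takes at least one cycle");
  return static_cast<double>(Instructions) / TotalCycles;
}

// Hypothetical helper: round to two decimals, as the summaries display it.
double roundTo2(double Value) { return std::round(Value * 100.0) / 100.0; }
```

With the totals from partial-reg-update-6.s, `roundTo2(computeIPC(4500, 7504))` yields 0.60 (after the patch) and `roundTo2(computeIPC(4500, 4507))` yields 1.00 (before the patch). Only the former agrees with the 0.60 IPC that the test comment attributes to perf stat.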

llvm/trunk/tools/llvm-mca/Instruction.h

[95 unchanged lines not shown; context: class WriteState {]
// For implicit writes, this field always matches the value of		// For implicit writes, this field always matches the value of
// field RegisterID from WD.		// field RegisterID from WD.
unsigned RegisterID;		unsigned RegisterID;

// True if this write implicitly clears the upper portion of RegisterID's		// True if this write implicitly clears the upper portion of RegisterID's
// super-registers.		// super-registers.
bool ClearsSuperRegs;		bool ClearsSuperRegs;

		// This field is set if this is a partial register write, and it has a false
		// dependency on any previous write of the same register (or a portion of it).
		// DependentWrite must be able to complete before this write completes, so
		// that we don't break the WAW, and the two writes can be merged together.
		const WriteState *DependentWrite;

// A list of dependent reads. Users is a set of dependent		// A list of dependent reads. Users is a set of dependent
// reads. A dependent read is added to the set only if CyclesLeft		// reads. A dependent read is added to the set only if CyclesLeft
// is "unknown". As soon as CyclesLeft is 'known', each user in the set		// is "unknown". As soon as CyclesLeft is 'known', each user in the set
// gets notified with the actual CyclesLeft.		// gets notified with the actual CyclesLeft.

// The 'second' element of a pair is a "ReadAdvance" number of cycles.		// The 'second' element of a pair is a "ReadAdvance" number of cycles.
std::set<std::pair<ReadState *, int>> Users;		std::set<std::pair<ReadState *, int>> Users;

public:		public:
WriteState(const WriteDescriptor &Desc, unsigned RegID,		WriteState(const WriteDescriptor &Desc, unsigned RegID,
bool clearsSuperRegs = false)		bool clearsSuperRegs = false)
: WD(Desc), CyclesLeft(UNKNOWN_CYCLES), RegisterID(RegID),		: WD(Desc), CyclesLeft(UNKNOWN_CYCLES), RegisterID(RegID),
ClearsSuperRegs(clearsSuperRegs) {}		ClearsSuperRegs(clearsSuperRegs), DependentWrite(nullptr) {}
WriteState(const WriteState &Other) = delete;		WriteState(const WriteState &Other) = delete;
WriteState &operator=(const WriteState &Other) = delete;		WriteState &operator=(const WriteState &Other) = delete;

int getCyclesLeft() const { return CyclesLeft; }		int getCyclesLeft() const { return CyclesLeft; }
unsigned getWriteResourceID() const { return WD.SClassOrWriteResourceID; }		unsigned getWriteResourceID() const { return WD.SClassOrWriteResourceID; }
unsigned getRegisterID() const { return RegisterID; }		unsigned getRegisterID() const { return RegisterID; }
unsigned getLatency() const { return WD.Latency; }		unsigned getLatency() const { return WD.Latency; }

void addUser(ReadState *Use, int ReadAdvance);		void addUser(ReadState *Use, int ReadAdvance);
unsigned getNumUsers() const { return Users.size(); }		unsigned getNumUsers() const { return Users.size(); }
bool clearsSuperRegisters() const { return ClearsSuperRegs; }		bool clearsSuperRegisters() const { return ClearsSuperRegs; }

		const WriteState *getDependentWrite() const { return DependentWrite; }
		void setDependentWrite(const WriteState *Write) { DependentWrite = Write; }

// On every cycle, update CyclesLeft and notify dependent users.		// On every cycle, update CyclesLeft and notify dependent users.
void cycleEvent();		void cycleEvent();
void onInstructionIssued();		void onInstructionIssued();

#ifndef NDEBUG		#ifndef NDEBUG
void dump() const;		void dump() const;
#endif		#endif
};		};
[173 unchanged lines not shown; context: public:]
Instruction &operator=(const Instruction &Other) = delete;		Instruction &operator=(const Instruction &Other) = delete;

VecDefs &getDefs() { return Defs; }		VecDefs &getDefs() { return Defs; }
const VecDefs &getDefs() const { return Defs; }		const VecDefs &getDefs() const { return Defs; }
VecUses &getUses() { return Uses; }		VecUses &getUses() { return Uses; }
const VecUses &getUses() const { return Uses; }		const VecUses &getUses() const { return Uses; }
const InstrDesc &getDesc() const { return Desc; }		const InstrDesc &getDesc() const { return Desc; }
unsigned getRCUTokenID() const { return RCUTokenID; }		unsigned getRCUTokenID() const { return RCUTokenID; }
		int getCyclesLeft() const { return CyclesLeft; }

unsigned getNumUsers() const {		unsigned getNumUsers() const {
unsigned NumUsers = 0;		unsigned NumUsers = 0;
for (const UniqueDef &Def : Defs)		for (const UniqueDef &Def : Defs)
NumUsers += Def->getNumUsers();		NumUsers += Def->getNumUsers();
return NumUsers;		return NumUsers;
}		}

[92 unchanged lines not shown]

llvm/trunk/tools/llvm-mca/Instruction.cpp

[126 unchanged lines not shown; context: void Instruction::execute() {]

// Transition to the "executed" stage if this is a zero-latency instruction.		// Transition to the "executed" stage if this is a zero-latency instruction.
if (!CyclesLeft)		if (!CyclesLeft)
Stage = IS_EXECUTED;		Stage = IS_EXECUTED;
}		}

void Instruction::update() {		void Instruction::update() {
assert(isDispatched() && "Unexpected instruction stage found!");		assert(isDispatched() && "Unexpected instruction stage found!");
if (llvm::all_of(Uses, [](const UniqueUse &Use) { return Use->isReady(); }))
		if (!llvm::all_of(Uses, [](const UniqueUse &Use) { return Use->isReady(); }))
		return;

		// A partial register write cannot complete before a dependent write.
		auto IsDefReady = [&](const UniqueDef &Def) {
		if (const WriteState *Write = Def->getDependentWrite()) {
		int WriteLatency = Write->getCyclesLeft();
		if (WriteLatency == UNKNOWN_CYCLES)
		return false;
		return static_cast<unsigned>(WriteLatency) < Desc.MaxLatency;
		}
		return true;
		};

		if (llvm::all_of(Defs, IsDefReady))
Stage = IS_READY;		Stage = IS_READY;
}		}

void Instruction::cycleEvent() {		void Instruction::cycleEvent() {
if (isReady())		if (isReady())
return;		return;

if (isDispatched()) {		if (isDispatched()) {
bool IsReady = true;		for (UniqueUse &Use : Uses)
for (UniqueUse &Use : Uses) {
Use->cycleEvent();		Use->cycleEvent();
IsReady &= Use->isReady();
}

if (IsReady)		update();
Stage = IS_READY;
return;		return;
}		}

assert(isExecuting() && "Instruction not in-flight?");		assert(isExecuting() && "Instruction not in-flight?");
assert(CyclesLeft && "Instruction already executed?");		assert(CyclesLeft && "Instruction already executed?");
for (UniqueDef &Def : Defs)		for (UniqueDef &Def : Defs)
Def->cycleEvent();		Def->cycleEvent();
CyclesLeft--;		CyclesLeft--;
if (!CyclesLeft)		if (!CyclesLeft)
Stage = IS_EXECUTED;		Stage = IS_EXECUTED;
}		}

const unsigned WriteRef::INVALID_IID = std::numeric_limits<unsigned>::max();		const unsigned WriteRef::INVALID_IID = std::numeric_limits<unsigned>::max();

} // namespace mca		} // namespace mca
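
The readiness rule added to Instruction::update() can be read in isolation: an instruction whose partial register write depends on an earlier write becomes ready only once that earlier write has fewer cycles left than the instruction's own maximum latency, so the earlier write completes first. Below is a minimal, self-contained sketch of that rule. `ToyWrite` and `isDefReady` are simplified stand-ins invented for this example, not the llvm-mca classes, and the value of `UNKNOWN_CYCLES` is assumed from the -512 literal that this patch replaces in RegisterFile.cpp.

```cpp
#include <cassert>

// Assumed sentinel: the latency of a write is unknown until its
// instruction has been issued.
constexpr int UNKNOWN_CYCLES = -512;

// Simplified stand-in for WriteState (illustration only).
struct ToyWrite {
  int CyclesLeft = UNKNOWN_CYCLES;
  const ToyWrite *DependentWrite = nullptr; // earlier write to merge with
};

// Returns true if `Def` would not complete before the write it depends on.
// MaxLatency plays the role of Desc.MaxLatency of the owning instruction.
bool isDefReady(const ToyWrite &Def, unsigned MaxLatency) {
  const ToyWrite *Dep = Def.DependentWrite;
  if (!Dep)
    return true; // no false dependency to honour
  if (Dep->CyclesLeft == UNKNOWN_CYCLES)
    return false; // the earlier write has not been issued yet
  // This sketch only models writes that are still in flight.
  assert(Dep->CyclesLeft >= 0 && "sketch expects a non-negative count");
  return static_cast<unsigned>(Dep->CyclesLeft) < MaxLatency;
}
```

For example, with a MaxLatency of 3, a dependent write is held back while the earlier write still has 3 cycles left and is released once that count drops to 2.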

llvm/trunk/tools/llvm-mca/RegisterFile.h

[55 unchanged lines not shown; context: class RegisterFile : public HardwareUnit {]
// hardware registers declared by the target (i.e. all the register		// hardware registers declared by the target (i.e. all the register
// definitions in the target specific `XYZRegisterInfo.td` - where `XYZ` is		// definitions in the target specific `XYZRegisterInfo.td` - where `XYZ` is
// the target name).		// the target name).
//		//
// Users can limit the number of physical registers that are available in		// Users can limit the number of physical registers that are available in
// register file #0 specifying command line flag `-register-file-size=<uint>`.		// register file #0 specifying command line flag `-register-file-size=<uint>`.
llvm::SmallVector<RegisterMappingTracker, 4> RegisterFiles;		llvm::SmallVector<RegisterMappingTracker, 4> RegisterFiles;

// This pair is used to identify the owner of a register, as well as		// This type is used to propagate information about the owner of a register,
// the "register cost". Register cost is defined as the number of physical		// and the cost of allocating it in the PRF. Register cost is defined as the
// registers required to allocate a user register.		// number of physical registers consumed by the PRF to allocate a user
		// register.
		//
// For example: on X86 BtVer2, a YMM register consumes 2 128-bit physical		// For example: on X86 BtVer2, a YMM register consumes 2 128-bit physical
// registers. So, the cost of allocating a YMM register in BtVer2 is 2.		// registers. So, the cost of allocating a YMM register in BtVer2 is 2.
using IndexPlusCostPairTy = std::pair<unsigned, unsigned>;		using IndexPlusCostPairTy = std::pair<unsigned, unsigned>;

		// Struct RegisterRenamingInfo maps registers to register files.
		// There is a RegisterRenamingInfo object for every register defined by
		// the target. RegisterRenamingInfo objects are stored into vector
		// RegisterMappings, and register IDs can be used to reference them.
		struct RegisterRenamingInfo {
		IndexPlusCostPairTy IndexPlusCost;
		llvm::MCPhysReg RenameAs;
		};

// RegisterMapping objects are mainly used to track physical register		// RegisterMapping objects are mainly used to track physical register
// definitions. There is a RegisterMapping for every register defined by the		// definitions. There is a RegisterMapping for every register defined by the
// Target. For each register, a RegisterMapping pair contains a descriptor of		// Target. For each register, a RegisterMapping pair contains a descriptor of
// the last register write (in the form of a WriteRef object), as well as a		// the last register write (in the form of a WriteRef object), as well as a
// IndexPlusCostPairTy to quickly identify owning register files.		// RegisterRenamingInfo to quickly identify owning register files.
//		//
// This implementation does not allow overlapping register files. The only		// This implementation does not allow overlapping register files. The only
// register file that is allowed to overlap with other register files is		// register file that is allowed to overlap with other register files is
// register file #0. If we exclude register #0, every register is "owned" by		// register file #0. If we exclude register #0, every register is "owned" by
// at most one register file.		// at most one register file.
using RegisterMapping = std::pair<WriteRef, IndexPlusCostPairTy>;		using RegisterMapping = std::pair<WriteRef, RegisterRenamingInfo>;

// This map contains one entry for each register defined by the target.		// This map contains one entry for each register defined by the target.
std::vector<RegisterMapping> RegisterMappings;		std::vector<RegisterMapping> RegisterMappings;

// This method creates a new register file descriptor.		// This method creates a new register file descriptor.
// The new register file owns all of the registers declared by register		// The new register file owns all of the registers declared by register
// classes in the 'RegisterClasses' set.		// classes in the 'RegisterClasses' set.
//		//
// Processor models allow the definition of RegisterFile(s) via tablegen. For		// Processor models allow the definition of RegisterFile(s) via tablegen. For
// example, this is a tablegen definition for a x86 register file for		// example, this is a tablegen definition for a x86 register file for
// XMM[0-15] and YMM[0-15], that allows up to 60 renames (each rename costs 1		// XMM[0-15] and YMM[0-15], that allows up to 60 renames (each rename costs 1
// physical register).		// physical register).
//		//
// def FPRegisterFile : RegisterFile<60, [VR128RegClass, VR256RegClass]>		// def FPRegisterFile : RegisterFile<60, [VR128RegClass, VR256RegClass]>
//		//
// Here FPRegisterFile contains all the registers defined by register class		// Here FPRegisterFile contains all the registers defined by register class
// VR128RegClass and VR256RegClass. FPRegisterFile implements 60		// VR128RegClass and VR256RegClass. FPRegisterFile implements 60
// registers which can be used for register renaming purpose.		// registers which can be used for register renaming purpose.
void		void
addRegisterFile(llvm::ArrayRef<llvm::MCRegisterCostEntry> RegisterClasses,		addRegisterFile(llvm::ArrayRef<llvm::MCRegisterCostEntry> RegisterClasses,
unsigned NumPhysRegs);		unsigned NumPhysRegs);

// Consumes physical registers in each register file specified by the		// Consumes physical registers in each register file specified by the
// `IndexPlusCostPairTy`. This method is called from `addRegisterMapping()`.		// `IndexPlusCostPairTy`. This method is called from `addRegisterMapping()`.
void allocatePhysRegs(IndexPlusCostPairTy IPC,		void allocatePhysRegs(const RegisterRenamingInfo &Entry,
llvm::MutableArrayRef<unsigned> UsedPhysRegs);		llvm::MutableArrayRef<unsigned> UsedPhysRegs);

// Releases previously allocated physical registers from the register file(s)		// Releases previously allocated physical registers from the register file(s).
// referenced by the IndexPlusCostPairTy object. This method is called from		// This method is called from `invalidateRegisterMapping()`.
// `invalidateRegisterMapping()`.		void freePhysRegs(const RegisterRenamingInfo &Entry,
void freePhysRegs(IndexPlusCostPairTy IPC,
llvm::MutableArrayRef<unsigned> FreedPhysRegs);		llvm::MutableArrayRef<unsigned> FreedPhysRegs);

// Create an instance of RegisterMappingTracker for every register file		// Create an instance of RegisterMappingTracker for every register file
// specified by the processor model.		// specified by the processor model.
// If no register file is specified, then this method creates a default		// If no register file is specified, then this method creates a default
// register file with an unbounded number of physical registers.		// register file with an unbounded number of physical registers.
void initialize(const llvm::MCSchedModel &SM, unsigned NumRegs);		void initialize(const llvm::MCSchedModel &SM, unsigned NumRegs);

[14 unchanged lines not shown; context: public:]
void removeRegisterWrite(const WriteState &WS,		void removeRegisterWrite(const WriteState &WS,
llvm::MutableArrayRef<unsigned> FreedPhysRegs,		llvm::MutableArrayRef<unsigned> FreedPhysRegs,
bool ShouldFreePhysRegs = true);		bool ShouldFreePhysRegs = true);

// Checks if there are enough physical registers in the register files.		// Checks if there are enough physical registers in the register files.
// Returns a "response mask" where each bit represents the response from a		// Returns a "response mask" where each bit represents the response from a
// different register file. A mask of all zeroes means that all register		// different register file. A mask of all zeroes means that all register
// files are available. Otherwise, the mask can be used to identify which		// files are available. Otherwise, the mask can be used to identify which
// register file was busy. This semantic allows us classify dispatch dispatch		// register file was busy. This semantic allows us to classify dispatch
// stalls caused by the lack of register file resources.		// stalls caused by the lack of register file resources.
		//
		// Current implementation can simulate up to 32 register files (including the
		// special register file at index #0).
unsigned isAvailable(llvm::ArrayRef<unsigned> Regs) const;		unsigned isAvailable(llvm::ArrayRef<unsigned> Regs) const;
void collectWrites(llvm::SmallVectorImpl<WriteRef> &Writes,		void collectWrites(llvm::SmallVectorImpl<WriteRef> &Writes,
unsigned RegID) const;		unsigned RegID) const;
void updateOnRead(ReadState &RS, unsigned RegID);		void updateOnRead(ReadState &RS, unsigned RegID);

unsigned getNumRegisterFiles() const { return RegisterFiles.size(); }		unsigned getNumRegisterFiles() const { return RegisterFiles.size(); }

#ifndef NDEBUG		#ifndef NDEBUG
void dump() const;		void dump() const;
#endif		#endif
};		};

} // namespace mca		} // namespace mca

#endif // LLVM_TOOLS_LLVM_MCA_REGISTER_FILE_H		#endif // LLVM_TOOLS_LLVM_MCA_REGISTER_FILE_H
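
The "response mask" described in the comment on isAvailable() can be illustrated with a toy model: one bit per register file, set when that file cannot satisfy the request. This is a sketch under simplifying assumptions. `ToyRegisterFile` and `responseMask` are hypothetical names, and the per-file demand is passed in directly, whereas the real method takes a list of register IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for a register file tracker (illustration only).
struct ToyRegisterFile {
  unsigned NumPhysRegs;     // 0 means "unbounded"
  unsigned NumUsedPhysRegs; // physical registers currently allocated
};

// Needed[I] is the number of physical registers requested from file I.
// Bit I of the result is set if file I cannot satisfy that request.
unsigned responseMask(const std::vector<ToyRegisterFile> &Files,
                      const std::vector<unsigned> &Needed) {
  assert(Files.size() == Needed.size() && "one request per register file");
  assert(Files.size() <= 32 && "the mask has one bit per register file");
  unsigned Mask = 0;
  for (std::size_t I = 0; I < Files.size(); ++I) {
    const ToyRegisterFile &RF = Files[I];
    if (RF.NumPhysRegs == 0)
      continue; // an unbounded file is never busy
    assert(RF.NumUsedPhysRegs <= RF.NumPhysRegs && "over-allocated file");
    unsigned Free = RF.NumPhysRegs - RF.NumUsedPhysRegs;
    if (Needed[I] > Free)
      Mask |= 1u << I;
  }
  return Mask;
}
```

A result of zero means every file can serve the request; a result of 4 (bit 2 set) would point at the register file with index 2 as the cause of a dispatch stall.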

llvm/trunk/tools/llvm-mca/RegisterFile.cpp

[20 unchanged lines not shown]
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "llvm-mca"		#define DEBUG_TYPE "llvm-mca"

namespace mca {		namespace mca {

RegisterFile::RegisterFile(const llvm::MCSchedModel &SM,		RegisterFile::RegisterFile(const llvm::MCSchedModel &SM,
const llvm::MCRegisterInfo &mri, unsigned NumRegs)		const llvm::MCRegisterInfo &mri, unsigned NumRegs)
: MRI(mri), RegisterMappings(mri.getNumRegs(), {WriteRef(), {0, 0}}) {		: MRI(mri), RegisterMappings(mri.getNumRegs(),
		{WriteRef(), {IndexPlusCostPairTy(0, 1), 0}}) {
initialize(SM, NumRegs);		initialize(SM, NumRegs);
}		}

void RegisterFile::initialize(const MCSchedModel &SM, unsigned NumRegs) {		void RegisterFile::initialize(const MCSchedModel &SM, unsigned NumRegs) {
// Create a default register file that "sees" all the machine registers		// Create a default register file that "sees" all the machine registers
// declared by the target. The number of physical registers in the default		// declared by the target. The number of physical registers in the default
// register file is set equal to `NumRegs`. A value of zero for `NumRegs`		// register file is set equal to `NumRegs`. A value of zero for `NumRegs`
// means: this register file has an unbounded number of physical registers.		// means: this register file has an unbounded number of physical registers.
[28 unchanged lines not shown; context: RegisterFile::addRegisterFile]
// registers in register file #0 through the command line flag		// registers in register file #0 through the command line flag
// `-register-file-size`.		// `-register-file-size`.
unsigned RegisterFileIndex = RegisterFiles.size();		unsigned RegisterFileIndex = RegisterFiles.size();
RegisterFiles.emplace_back(NumPhysRegs);		RegisterFiles.emplace_back(NumPhysRegs);

// Special case where there is no register class identifier in the set.		// Special case where there is no register class identifier in the set.
// An empty set of register classes means: this register file contains all		// An empty set of register classes means: this register file contains all
// the physical registers specified by the target.		// the physical registers specified by the target.
if (Entries.empty()) {		// We optimistically assume that a register can be renamed at the cost of a
for (std::pair<WriteRef, IndexPlusCostPairTy> &Mapping : RegisterMappings)		// single physical register. The constructor of RegisterFile ensures that
Mapping.second = std::make_pair(RegisterFileIndex, 1U);		// a RegisterMapping exists for each logical register defined by the Target.
		if (Entries.empty())
return;		return;
}

// Now update the cost of individual registers.		// Now update the cost of individual registers.
for (const MCRegisterCostEntry &RCE : Entries) {		for (const MCRegisterCostEntry &RCE : Entries) {
const MCRegisterClass &RC = MRI.getRegClass(RCE.RegisterClassID);		const MCRegisterClass &RC = MRI.getRegClass(RCE.RegisterClassID);
for (const MCPhysReg Reg : RC) {		for (const MCPhysReg Reg : RC) {
IndexPlusCostPairTy &Entry = RegisterMappings[Reg].second;		RegisterRenamingInfo &Entry = RegisterMappings[Reg].second;
if (Entry.first) {		IndexPlusCostPairTy &IPC = Entry.IndexPlusCost;
		if (IPC.first && IPC.first != RegisterFileIndex) {
// The only register file that is allowed to overlap is the default		// The only register file that is allowed to overlap is the default
// register file at index #0. The analysis is inaccurate if register		// register file at index #0. The analysis is inaccurate if register
// files overlap.		// files overlap.
errs() << "warning: register " << MRI.getName(Reg)		errs() << "warning: register " << MRI.getName(Reg)
<< " defined in multiple register files.";		<< " defined in multiple register files.";
}		}
Entry.first = RegisterFileIndex;		IPC = std::make_pair(RegisterFileIndex, RCE.Cost);
Entry.second = RCE.Cost;		Entry.RenameAs = Reg;

		// Assume the same cost for each sub-register.
		for (MCSubRegIterator I(Reg, &MRI); I.isValid(); ++I) {
		RegisterRenamingInfo &OtherEntry = RegisterMappings[*I].second;
		if (!OtherEntry.IndexPlusCost.first &&
		(!OtherEntry.RenameAs ||
		MRI.isSuperRegister(*I, OtherEntry.RenameAs))) {
		OtherEntry.IndexPlusCost = IPC;
		OtherEntry.RenameAs = Reg;
		}
		}
}		}
}		}
}		}

void RegisterFile::allocatePhysRegs(IndexPlusCostPairTy Entry,		void RegisterFile::allocatePhysRegs(const RegisterRenamingInfo &Entry,
MutableArrayRef<unsigned> UsedPhysRegs) {		MutableArrayRef<unsigned> UsedPhysRegs) {
unsigned RegisterFileIndex = Entry.first;		unsigned RegisterFileIndex = Entry.IndexPlusCost.first;
unsigned Cost = Entry.second;		unsigned Cost = Entry.IndexPlusCost.second;
if (RegisterFileIndex) {		if (RegisterFileIndex) {
RegisterMappingTracker &RMT = RegisterFiles[RegisterFileIndex];		RegisterMappingTracker &RMT = RegisterFiles[RegisterFileIndex];
RMT.NumUsedPhysRegs += Cost;		RMT.NumUsedPhysRegs += Cost;
UsedPhysRegs[RegisterFileIndex] += Cost;		UsedPhysRegs[RegisterFileIndex] += Cost;
}		}

// Now update the default register mapping tracker.		// Now update the default register mapping tracker.
RegisterFiles[0].NumUsedPhysRegs += Cost;		RegisterFiles[0].NumUsedPhysRegs += Cost;
UsedPhysRegs[0] += Cost;		UsedPhysRegs[0] += Cost;
}		}

void RegisterFile::freePhysRegs(IndexPlusCostPairTy Entry,		void RegisterFile::freePhysRegs(const RegisterRenamingInfo &Entry,
MutableArrayRef<unsigned> FreedPhysRegs) {		MutableArrayRef<unsigned> FreedPhysRegs) {
unsigned RegisterFileIndex = Entry.first;		unsigned RegisterFileIndex = Entry.IndexPlusCost.first;
unsigned Cost = Entry.second;		unsigned Cost = Entry.IndexPlusCost.second;
if (RegisterFileIndex) {		if (RegisterFileIndex) {
RegisterMappingTracker &RMT = RegisterFiles[RegisterFileIndex];		RegisterMappingTracker &RMT = RegisterFiles[RegisterFileIndex];
RMT.NumUsedPhysRegs -= Cost;		RMT.NumUsedPhysRegs -= Cost;
FreedPhysRegs[RegisterFileIndex] += Cost;		FreedPhysRegs[RegisterFileIndex] += Cost;
}		}

// Now update the default register mapping tracker.		// Now update the default register mapping tracker.
RegisterFiles[0].NumUsedPhysRegs -= Cost;		RegisterFiles[0].NumUsedPhysRegs -= Cost;
FreedPhysRegs[0] += Cost;		FreedPhysRegs[0] += Cost;
}		}
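
The accounting done by allocatePhysRegs() and freePhysRegs() charges the cost of a register both to its owning register file and to the default register file at index #0. A minimal sketch of that double bookkeeping follows. `ToyTracker`, `allocateToy` and `freeToy` are hypothetical names for this example, and the per-call `UsedPhysRegs`/`FreedPhysRegs` arrays of the real code are left out.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for RegisterMappingTracker (illustration only).
struct ToyTracker {
  unsigned NumUsedPhysRegs;
};

// Charge `Cost` physical registers to file `FileIndex` and to file #0.
void allocateToy(std::vector<ToyTracker> &Files, unsigned FileIndex,
                 unsigned Cost) {
  assert(FileIndex < Files.size() && "unknown register file");
  if (FileIndex)
    Files[FileIndex].NumUsedPhysRegs += Cost;
  Files[0].NumUsedPhysRegs += Cost; // the default file sees every register
}

// Undo a previous allocateToy() call with the same arguments.
void freeToy(std::vector<ToyTracker> &Files, unsigned FileIndex,
             unsigned Cost) {
  assert(FileIndex < Files.size() && "unknown register file");
  assert(Files[0].NumUsedPhysRegs >= Cost && "freeing more than allocated");
  if (FileIndex) {
    assert(Files[FileIndex].NumUsedPhysRegs >= Cost);
    Files[FileIndex].NumUsedPhysRegs -= Cost;
  }
  Files[0].NumUsedPhysRegs -= Cost;
}
```

Taking the YMM example from RegisterFile.h (cost 2 on BtVer2) and assuming, for illustration, that the owning file has index 1, `allocateToy(Files, 1, 2)` raises the usage of both file #1 and file #0 by 2.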

void RegisterFile::addRegisterWrite(WriteRef Write,		void RegisterFile::addRegisterWrite(WriteRef Write,
MutableArrayRef<unsigned> UsedPhysRegs,		MutableArrayRef<unsigned> UsedPhysRegs,
bool ShouldAllocatePhysRegs) {		bool ShouldAllocatePhysRegs) {
const WriteState &WS = *Write.getWriteState();		WriteState &WS = *Write.getWriteState();
unsigned RegID = WS.getRegisterID();		unsigned RegID = WS.getRegisterID();
assert(RegID && "Adding an invalid register definition?");		assert(RegID && "Adding an invalid register definition?");

RegisterMapping &Mapping = RegisterMappings[RegID];		LLVM_DEBUG({
Mapping.first = Write;		dbgs() << "RegisterFile: addRegisterWrite [ " << Write.getSourceIndex()
		<< ", " << MRI.getName(RegID) << "]\n";
		});

		// If RenameAs is equal to RegID, then RegID is subject to register renaming
		// and false dependencies on RegID are all eliminated.

		// If RenameAs references the invalid register, then we optimistically assume
		// that it can be renamed. In the absence of tablegen descriptors for register
		// files, RenameAs is always set to the invalid register ID. In all other
		// cases, RenameAs must be either equal to RegID, or it must reference a
		// super-register of RegID.

		// If RenameAs is a super-register of RegID, then a write to RegID has always
		// a false dependency on RenameAs. The only exception is for when the write
		// implicitly clears the upper portion of the underlying register.
		// If a write clears its super-registers, then it is renamed as `RenameAs`.
		const RegisterRenamingInfo &RRI = RegisterMappings[RegID].second;
		if (RRI.RenameAs && RRI.RenameAs != RegID) {
		RegID = RRI.RenameAs;
		const WriteRef &OtherWrite = RegisterMappings[RegID].first;

		if (!WS.clearsSuperRegisters()) {
		// The processor keeps the definition of `RegID` together with register
		// `RenameAs`. Since this partial write is not renamed, no physical
		// register is allocated.
		ShouldAllocatePhysRegs = false;

		if (OtherWrite.getSourceIndex() != Write.getSourceIndex()) {
		// This partial write has a false dependency on RenameAs.
		WS.setDependentWrite(OtherWrite.getWriteState());
		}
		}
		}

		// Update the mapping for register RegID including its sub-registers.
		RegisterMappings[RegID].first = Write;
for (MCSubRegIterator I(RegID, &MRI); I.isValid(); ++I)		for (MCSubRegIterator I(RegID, &MRI); I.isValid(); ++I)
RegisterMappings[*I].first = Write;		RegisterMappings[*I].first = Write;

// No physical registers are allocated for instructions that are optimized in		// No physical registers are allocated for instructions that are optimized in
// hardware. For example, zero-latency data-dependency breaking instructions		// hardware. For example, zero-latency data-dependency breaking instructions
// don't consume physical registers.		// don't consume physical registers.
if (ShouldAllocatePhysRegs)		if (ShouldAllocatePhysRegs)
allocatePhysRegs(Mapping.second, UsedPhysRegs);		allocatePhysRegs(RegisterMappings[RegID].second, UsedPhysRegs);

// If this is a partial update, then we are done.
if (!WS.clearsSuperRegisters())		if (!WS.clearsSuperRegisters())
return;		return;

for (MCSuperRegIterator I(RegID, &MRI); I.isValid(); ++I)		for (MCSuperRegIterator I(RegID, &MRI); I.isValid(); ++I)
RegisterMappings[*I].first = Write;		RegisterMappings[*I].first = Write;
}		}

void RegisterFile::removeRegisterWrite(const WriteState &WS,		void RegisterFile::removeRegisterWrite(const WriteState &WS,
MutableArrayRef<unsigned> FreedPhysRegs,		MutableArrayRef<unsigned> FreedPhysRegs,
bool ShouldFreePhysRegs) {		bool ShouldFreePhysRegs) {
unsigned RegID = WS.getRegisterID();		unsigned RegID = WS.getRegisterID();
bool ShouldInvalidateSuperRegs = WS.clearsSuperRegisters();

assert(RegID != 0 && "Invalidating an already invalid register?");		assert(RegID != 0 && "Invalidating an already invalid register?");
assert(WS.getCyclesLeft() != -512 &&		assert(WS.getCyclesLeft() != UNKNOWN_CYCLES &&
"Invalidating a write of unknown cycles!");		"Invalidating a write of unknown cycles!");
assert(WS.getCyclesLeft() <= 0 && "Invalid cycles left for this write!");		assert(WS.getCyclesLeft() <= 0 && "Invalid cycles left for this write!");
RegisterMapping &Mapping = RegisterMappings[RegID];
WriteRef &WR = Mapping.first;		unsigned RenameAs = RegisterMappings[RegID].second.RenameAs;
if (!WR.isValid())		if (RenameAs && RenameAs != RegID) {
return;		RegID = RenameAs;

		if (!WS.clearsSuperRegisters()) {
		// Keep the definition of `RegID` together with register `RenameAs`.
		ShouldFreePhysRegs = false;
		}
		}

if (ShouldFreePhysRegs)		if (ShouldFreePhysRegs)
freePhysRegs(Mapping.second, FreedPhysRegs);		freePhysRegs(RegisterMappings[RegID].second, FreedPhysRegs);

		WriteRef &WR = RegisterMappings[RegID].first;
if (WR.getWriteState() == &WS)		if (WR.getWriteState() == &WS)
WR.invalidate();		WR.invalidate();

for (MCSubRegIterator I(RegID, &MRI); I.isValid(); ++I) {		for (MCSubRegIterator I(RegID, &MRI); I.isValid(); ++I) {
WR = RegisterMappings[*I].first;		WriteRef &OtherWR = RegisterMappings[*I].first;
if (WR.getWriteState() == &WS)		if (OtherWR.getWriteState() == &WS)
WR.invalidate();		OtherWR.invalidate();
}		}

if (!ShouldInvalidateSuperRegs)		if (!WS.clearsSuperRegisters())
return;		return;

for (MCSuperRegIterator I(RegID, &MRI); I.isValid(); ++I) {		for (MCSuperRegIterator I(RegID, &MRI); I.isValid(); ++I) {
WR = RegisterMappings[*I].first;		WriteRef &OtherWR = RegisterMappings[*I].first;
if (WR.getWriteState() == &WS)		if (OtherWR.getWriteState() == &WS)
WR.invalidate();		OtherWR.invalidate();
}		}
}		}

void RegisterFile::collectWrites(SmallVectorImpl<WriteRef> &Writes,		void RegisterFile::collectWrites(SmallVectorImpl<WriteRef> &Writes,
unsigned RegID) const {		unsigned RegID) const {
assert(RegID && RegID < RegisterMappings.size());		assert(RegID && RegID < RegisterMappings.size());
		LLVM_DEBUG(dbgs() << "RegisterFile: collecting writes for register "
		<< MRI.getName(RegID) << '\n');
const WriteRef &WR = RegisterMappings[RegID].first;		const WriteRef &WR = RegisterMappings[RegID].first;
if (WR.isValid())		if (WR.isValid())
Writes.push_back(WR);		Writes.push_back(WR);

// Handle potential partial register updates.		// Handle potential partial register updates.
for (MCSubRegIterator I(RegID, &MRI); I.isValid(); ++I) {		for (MCSubRegIterator I(RegID, &MRI); I.isValid(); ++I) {
const WriteRef &WR = RegisterMappings[*I].first;		const WriteRef &WR = RegisterMappings[*I].first;
if (WR.isValid())		if (WR.isValid())
[... 18 unchanged lines of RegisterFile::collectWrites not shown ...]
});		});
}		}

unsigned RegisterFile::isAvailable(ArrayRef<unsigned> Regs) const {		unsigned RegisterFile::isAvailable(ArrayRef<unsigned> Regs) const {
SmallVector<unsigned, 4> NumPhysRegs(getNumRegisterFiles());		SmallVector<unsigned, 4> NumPhysRegs(getNumRegisterFiles());

// Find how many new mappings must be created for each register file.		// Find how many new mappings must be created for each register file.
for (const unsigned RegID : Regs) {		for (const unsigned RegID : Regs) {
const IndexPlusCostPairTy &Entry = RegisterMappings[RegID].second;		const RegisterRenamingInfo &RRI = RegisterMappings[RegID].second;
		const IndexPlusCostPairTy &Entry = RRI.IndexPlusCost;
if (Entry.first)		if (Entry.first)
NumPhysRegs[Entry.first] += Entry.second;		NumPhysRegs[Entry.first] += Entry.second;
NumPhysRegs[0] += Entry.second;		NumPhysRegs[0] += Entry.second;
}		}

unsigned Response = 0;		unsigned Response = 0;
for (unsigned I = 0, E = getNumRegisterFiles(); I < E; ++I) {		for (unsigned I = 0, E = getNumRegisterFiles(); I < E; ++I) {
unsigned NumRegs = NumPhysRegs[I];		unsigned NumRegs = NumPhysRegs[I];
[... 24 unchanged lines not shown ...]
}		}

#ifndef NDEBUG		#ifndef NDEBUG
void RegisterFile::dump() const {		void RegisterFile::dump() const {
for (unsigned I = 0, E = MRI.getNumRegs(); I < E; ++I) {		for (unsigned I = 0, E = MRI.getNumRegs(); I < E; ++I) {
const RegisterMapping &RM = RegisterMappings[I];		const RegisterMapping &RM = RegisterMappings[I];
if (!RM.first.getWriteState())		if (!RM.first.getWriteState())
continue;		continue;
const std::pair<unsigned, unsigned> &IndexPlusCost = RM.second;		const RegisterRenamingInfo &RRI = RM.second;
dbgs() << MRI.getName(I) << ", " << I << ", PRF=" << IndexPlusCost.first		dbgs() << MRI.getName(I) << ", " << I << ", PRF=" << RRI.IndexPlusCost.first
<< ", Cost=" << IndexPlusCost.second		<< ", Cost=" << RRI.IndexPlusCost.second
<< ", ";		<< ", RenameAs=" << RRI.RenameAs << ", ";
RM.first.dump();		RM.first.dump();
dbgs() << '\n';		dbgs() << '\n';
}		}

for (unsigned I = 0, E = getNumRegisterFiles(); I < E; ++I) {		for (unsigned I = 0, E = getNumRegisterFiles(); I < E; ++I) {
dbgs() << "Register File #" << I;		dbgs() << "Register File #" << I;
const RegisterMappingTracker &RMT = RegisterFiles[I];		const RegisterMappingTracker &RMT = RegisterFiles[I];
dbgs() << "\n TotalMappings: " << RMT.NumPhysRegs		dbgs() << "\n TotalMappings: " << RMT.NumPhysRegs
<< "\n NumUsedMappings: " << RMT.NumUsedPhysRegs << '\n';		<< "\n NumUsedMappings: " << RMT.NumUsedPhysRegs << '\n';
}		}
}		}
#endif		#endif

} // namespace mca		} // namespace mca
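
The renaming rules described in the comments of RegisterFile::addRegisterWrite above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the llvm-mca implementation: ToyWrite, ToyRegisterFile and all of their members are hypothetical names, register IDs are plain integers, and the propagation of a write to sub-registers and super-registers is left out. Unlike the patch, the sketch also skips the dependency when the aliased register has no previous writer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model only. None of these types exist in llvm-mca.
struct ToyWrite {
  int SourceIndex = -1;         // index of the instruction performing the write
  unsigned RegID = 0;           // register being written (0 is the invalid register)
  bool ClearsSuperRegs = false; // true if the write zeroes the upper portion
  int DependsOn = -1;           // source index of the write this one falsely depends on
};

struct ToyRegisterFile {
  // RenameAs[R] == 0 : no descriptor, optimistically assume R can be renamed.
  // RenameAs[R] == R : R is renamed, so false dependencies on R are eliminated.
  // otherwise        : R is kept together with its super-register RenameAs[R].
  std::vector<unsigned> RenameAs;
  std::vector<int> LastWriter; // source index of the last write, -1 if none
  unsigned UsedPhysRegs = 0;

  explicit ToyRegisterFile(std::vector<unsigned> Renames)
      : RenameAs(std::move(Renames)), LastWriter(RenameAs.size(), -1) {}

  void addWrite(ToyWrite &W) {
    assert(W.RegID != 0 && W.RegID < RenameAs.size());
    unsigned RegID = W.RegID;
    bool ShouldAllocatePhysReg = true;

    unsigned Alias = RenameAs[RegID];
    if (Alias && Alias != RegID) {
      // The write is tracked under the super-register.
      RegID = Alias;
      if (!W.ClearsSuperRegs) {
        // Partial write: it is not renamed, so no physical register is
        // allocated, and it depends on the previous writer of the alias.
        ShouldAllocatePhysReg = false;
        int Other = LastWriter[RegID];
        if (Other != -1 && Other != W.SourceIndex)
          W.DependsOn = Other;
      }
    }

    LastWriter[RegID] = W.SourceIndex;
    if (ShouldAllocatePhysReg)
      ++UsedPhysRegs;
  }
};
```

With a hypothetical numbering of 1 = RAX, 2 = EAX and 3 = AL, and RenameAs set to {0, 1, 1, 1}, a write to EAX that clears the upper bits is renamed as RAX and consumes one physical register. A later write to AL consumes none and records a false dependency on the EAX write, which matches the BtVer2 behaviour described in the summary.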