This is an archive of the discontinued LLVM Phabricator instance.

Differential D49310

[llvm-mca][BtVer2] Teach how to identify dependency-breaking idioms.
Closed, Public

Authored by andreadb on Jul 13 2018, 11:19 AM.

Details

Reviewers

RKSimon
spatel
craig.topper
lebedev.ri
mattd
courbet

Commits

rGa1852b619450: [llvm-mca][BtVer2] Teach how to identify dependency-breaking idioms.
rL338372: [llvm-mca][BtVer2] Teach how to identify dependency-breaking idioms.

Summary

This patch teaches llvm-mca how to identify dependency breaking instructions on btver2.

An example of a dependency-breaking instruction is the zero-idiom XOR (for example, XOR %eax, %eax), which always produces zero regardless of the actual value of the input register.
Dependency-breaking instructions don't have to wait on their input register operands before executing, because the result of the execution is known in advance (i.e. it does not depend on the actual value of the input register).

Not all dependency-breaking idioms are also zero-latency instructions. For example, PCMPEQB %xmm1, %xmm1 sets %xmm1 to all-ones, but it is still executed by a pipeline.
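To make "the result is known in advance" concrete, the following standalone C++ sketch (not part of the patch, and not LLVM code) restates the scalar semantics of a few opcodes from this revision's list and checks that, when both sources are the same register, the result is identical for every input value. The helper names are hypothetical, and the packed compares are modeled one 32-bit lane at a time.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Plain C++ restatements of a few x86 operations (hypothetical helpers,
// not LLVM APIs). Each takes the values of the two source operands.
inline uint32_t xor32(uint32_t A, uint32_t B) { return A ^ B; }
inline uint32_t sub32(uint32_t A, uint32_t B) { return A - B; }
// ANDN-style operation: (~A) & B.
inline uint32_t andn32(uint32_t A, uint32_t B) { return ~A & B; }
// One 32-bit lane of PCMPEQD and PCMPGTD (the latter is a signed compare).
inline uint32_t pcmpeqdLane(uint32_t A, uint32_t B) {
  return A == B ? 0xFFFFFFFFu : 0u;
}
inline uint32_t pcmpgtdLane(uint32_t A, uint32_t B) {
  return static_cast<int32_t>(A) > static_cast<int32_t>(B) ? 0xFFFFFFFFu : 0u;
}
// SBB computes A - B - CF.
inline uint32_t sbb32(uint32_t A, uint32_t B, bool CF) {
  return A - B - (CF ? 1u : 0u);
}

// True if Op(X, X) == Expected for every sample X, i.e. the idiom's result
// does not depend on the value held in the shared source register.
template <typename OpT>
bool sameResultForAll(OpT Op, uint32_t Expected,
                      const std::vector<uint32_t> &Samples) {
  for (uint32_t X : Samples)
    if (Op(X, X) != Expected)
      return false;
  return true;
}
```

`sbb32` illustrates a subtler case: `SBB %eax, %eax` yields 0 or all-ones depending only on the carry flag, so it breaks the dependency on the register value while still reading EFLAGS.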

This patch adds a new method named isDependencyBreaking() to the MCInstrAnalysis interface. That method takes as input an MCSubtargetInfo and an instruction (i.e. an MCInst).
The default implementation of isDependencyBreaking() conservatively returns false for all instructions. Targets may override it and return a value that better matches the behavior of the subtarget processor.

BtVer2 tests affected by this change now report the expected number of instructions per cycle (IPC).
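The IPC values in the updated tests follow from the summary counters: IPC is the number of simulated instructions divided by the total number of cycles. A minimal sketch using the before/after numbers from dependency-breaking-cmp.s and dependency-breaking-pcmpeq.s in this revision; `ipc` and `roundTo2` are illustrative helpers, and rounding to two decimals is assumed to match how the summary prints the value.

```cpp
#include <cassert>
#include <cmath>

// IPC = instructions / cycles (illustrative helper, not llvm-mca code).
inline double ipc(unsigned Instructions, unsigned TotalCycles) {
  return static_cast<double>(Instructions) / TotalCycles;
}

// Round to two decimal places, as in the printed summary (assumed).
inline double roundTo2(double V) { return std::round(V * 100.0) / 100.0; }
```

With a dispatch width of 2, an IPC of 2.00 is also the ceiling, so after this change the pcmpeq block saturates dispatch, consistent with the 2.00 IPC that perf stat reports in that test's comment.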

Please let me know if okay to commit.

-Andrea

Event Timeline

andreadb created this revision. Jul 13 2018, 11:19 AM

Herald added subscribers: gbedwell, tschuett. Jul 13 2018, 11:19 AM

This change LGTM, but you should let a few others take a peek before landing this.

include/llvm/MC/MCInstrAnalysis.h
90	Suggestion: It might be useful to reference Agner Fog's microarchitecture.pdf 17.9 "Breaking dependence," similar to how you reference another part of that doc in D49196. That said, I think your description is clearer than 17.9; just a thought. A reader might also find other parts of 17.x useful, such as 17.8.

andreadb added inline comments. Jul 14 2018, 5:42 AM

include/llvm/MC/MCInstrAnalysis.h
90	Good point. I will add a reference to Agner Fog's "microarchitecture" document to the definition of `isDependencyBreaking()` in X86MCTargetDesc.cpp. I prefer to keep this comment simple if possible.

Can you add a comment to the end of the btver sched profile, that X86MCInstrAnalysis::isDependencyBreaking()
is also responsible (as in, needs to be modified) in detection of dep-breaking patterns?

include/llvm/MC/MCInstrAnalysis.h
93	s/know/known/?
lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
318	Hmm. Hardcoding. Are there plans on somehow generalizing this, maybe using sched profiles?

In D49310#1162667, @lebedev.ri wrote:

Can you add a comment to the end of the btver sched profile, that X86MCInstrAnalysis::isDependencyBreaking()
is also responsible (as in, needs to be modified) in detection of dep-breaking patterns?

If the goal is to document this behavior, a comment in the class MCInstrAnalysis should be enough.
If the goal is to provide guidelines for people who want to use llvm-mca, then we should have all these features documented in a .rst file.
In the near future (realistically by the end of August), with Matt's help, we plan to improve the llvm-mca documentation. We would describe all the changes that would help improve the analysis in llvm-mca.

include/llvm/MC/MCInstrAnalysis.h
93	Thanks. I will fix it.
lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
318	There is no plan. I have experimented with alternative approaches. The body of 'isDependencyBreaking' could be generated by tablegen for scheduling models. However, it would require a non-trivial change in the tablegen backends. In all honesty, I don't think it is worth doing just for the sake of better supporting dep-breaking idioms. In my opinion, this patch is simpler and less invasive than any other solution involving tablegen. With a tablegen approach, the only advantage is that we would generate nicer checks against CPUID, instead of forcing string compares (CPU names). The rest of the auto-generated code would be pretty much identical (in my experience, a bit less readable). However, we would end up paying a non-negligible design cost for a feature that is not critical, imho. So, I don't think that there are better approaches (at least for now).

In D49310#1162868, @andreadb wrote:

In D49310#1162667, @lebedev.ri wrote:

Can you add a comment to the end of the btver sched profile, that X86MCInstrAnalysis::isDependencyBreaking()
is also responsible (as in, needs to be modified) in detection of dep-breaking patterns?

If the goal is to document this behavior, a comment in the class MCInstrAnalysis should be enough.
If the goal is to provide guidelines for people who want to use llvm-mca, then we should have all these features documented in a .rst file.

The goal is to save time for whoever adds a new scheduling model, by explaining that X86MCInstrAnalysis::isDependencyBreaking() needs to be modified too, so that mca can properly handle these dependency-breaking idioms.

In the near future (realistically by the end of August), with Matt's help, we plan to improve the llvm-mca documentation. We would describe all the changes that would help improve the analysis in llvm-mca.

Patch updated.

Added a comment to X86MCInstrAnalysis::isDependencyBreaking.

@craig.topper do you think this approach is acceptable?

RKSimon added inline comments. Jul 17 2018, 3:38 AM

include/llvm/MC/MCInstrAnalysis.h
97	I don't know if this description is too x86 specific or not - AFAICT the actual 'same register' test is inside X86MCInstrAnalysis::isDependencyBreaking, so it isn't a generic requirement? An alternative approach could be for MCInstrAnalysis::isDependencyBreaking to return true if the instruction is independent of some/all of its input operands, with an APInt& Mask (or similar) used to list the operands that it doesn't depend upon. I'm not sure if we need all that functionality, though, so it might be over-engineering it. Also, do you know if we can make use of this in the machine schedulers or elsewhere? I'm not against it being inside MCInstrAnalysis in general, although the fact that it's so CPU-specific suggests it could get bulky. @craig.topper What do you think?

andreadb added inline comments. Jul 17 2018, 4:04 AM

include/llvm/MC/MCInstrAnalysis.h
97	I am okay with either approach. We can use an APInt to identify register reads that can be bypassed. If we want to use this knowledge in the machine schedulers, then we can add a hook to the InstructionInfo interface. Each target can then describe dependency-breaking instructions using similar logic. However, that new hook would take a `MachineInstr` as input, not an `MCInst`, so we would still have to expose that information to MC. We could use the new scheduling predicates to construct a TII predicate. That predicate is translated by tablegen into a method of the TII interface. We can then expand that same predicate into a method in MCInstrAnalysis. That should be doable. I wanted to avoid this scenario. However, if the plan is to reuse this knowledge elsewhere, then I can revisit this patch.

lebedev.ri added inline comments. Jul 20 2018, 7:13 AM

test/tools/llvm-mca/X86/BtVer2/one-idioms.s
28	You probably wanted to drop this comment.
tools/llvm-mca/DispatchStage.cpp
110	So it will no longer even consider the sched profile?

andreadb added inline comments. Jul 20 2018, 7:43 AM

test/tools/llvm-mca/X86/BtVer2/one-idioms.s
28	I will remove it.
tools/llvm-mca/DispatchStage.cpp
110	Not sure I understand the question. I am definitely using profile information from the scheduling model.

lebedev.ri added inline comments. Jul 20 2018, 7:45 AM

tools/llvm-mca/DispatchStage.cpp
110	What I'm asking is: what happens if the sched model says that instruction N has zero latency, but `isDependencyBreaking()` does not say that?

andreadb added inline comments. Jul 20 2018, 8:23 AM

tools/llvm-mca/DispatchStage.cpp
110	In that case, instruction N will have to wait in the scheduler until input registers are all available. Then it is executed.

andreadb added inline comments. Jul 20 2018, 8:40 AM

tools/llvm-mca/DispatchStage.cpp
110	If it doesn’t work like that, then there is a bug. I cannot test it at the moment as I am not at work. I will check it in the next few days. Thanks.

lebedev.ri added inline comments. Jul 20 2018, 8:41 AM

tools/llvm-mca/DispatchStage.cpp
110	So the scheduler-profile-aware instruction scheduler will schedule it as-if it has zero latency, but the mca-based analysis will not? And how would one go about detecting such inconsistencies? This all makes me uneasy.

andreadb added inline comments. Jul 20 2018, 8:53 AM

tools/llvm-mca/DispatchStage.cpp
110	No. I never said that. It would still execute in zero cycles. It simply has to wait for the operands. Since I didn’t specifically test that bogus scenario, I will check that it really behaves like that. I cannot do that test now because I am not at work. I will be back in a few days.

Patch rebased and updated.

To answer Roman's questions:
I verified that the patch works as intended for zero-latency instructions that are not marked as "dependency breaking".
If a zero-latency instruction reads a register, then it has to wait on that input. If it is marked as dep-breaking, then it can issue immediately.
If a zero-latency instruction is not marked as dep-breaking, and its inputs are all ready, then it can be issued immediately.
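The rule above (a zero-latency instruction that is not marked dependency-breaking still waits for its inputs, while a dependency-breaking one does not) can be illustrated with a toy issue model. This is a hypothetical sketch, not llvm-mca's DispatchStage or scheduler: it assumes in-order dispatch of `DispatchWidth` instructions per cycle, at most one register input per instruction, and unlimited execution resources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction with at most one register input.
struct ToyInst {
  int Producer;     // index of the earlier instruction whose result is read, or -1
  unsigned Latency; // cycles from issue until the result is available
  bool DepBreaking; // if true, the read of Producer's result is ignored
};

// Returns the issue cycle of each instruction, assuming in-order dispatch of
// DispatchWidth instructions per cycle and unlimited execution resources.
std::vector<unsigned> issueCycles(const std::vector<ToyInst> &Insts,
                                  unsigned DispatchWidth) {
  std::vector<unsigned> Issue(Insts.size());
  for (std::size_t I = 0; I < Insts.size(); ++I) {
    // Earliest cycle at which the instruction can be dispatched.
    unsigned Cycle = static_cast<unsigned>(I / DispatchWidth);
    const ToyInst &In = Insts[I];
    if (In.Producer >= 0 && !In.DepBreaking) {
      // Wait until the producer's result is available. This applies even
      // when the consumer itself has zero latency.
      std::size_t P = static_cast<std::size_t>(In.Producer);
      Cycle = std::max(Cycle, Issue[P] + Insts[P].Latency);
    }
    Issue[I] = Cycle;
  }
  return Issue;
}
```

For a chain where each instruction reads the previous result (as in dependency-breaking-pcmpeq.s, with an assumed latency of 1), the model issues one instruction per cycle unless the reads are marked dependency-breaking, in which case it issues two per cycle.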

As I wrote in a previous comment, we can have a follow-up patch that teaches tablegen how to generate the body of the isDependencyBreaking predicate for us (and potentially exposes that same logic to TargetInstrInfo).

Couple of final minor comments

include/llvm/MC/MCInstrAnalysis.h
97	OK - we don't have a use for this yet, but can you please add a TODO suggesting that per-operand dependency breaking might be useful in the future? Also, please update the comment, as "if input register operands are all the same register." isn't mandatory for an instruction to be dependency breaking (move that part of the comment into the X86 XOR example). For instance, XOP VPCOM instructions ignore the input registers if the compare mode (immediate value) is TRUE or FALSE.
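The per-operand idea from this thread, together with the VPCOM example, could look roughly like the sketch below. It is purely hypothetical: `MiniInst`, `Opc` and the mask-returning `isDependencyBreaking` overload are illustrative and not an LLVM API (the discussion mentions an `APInt` mask; `std::bitset` keeps the example self-contained). The VPCOM case assumes the XOP immediate encoding in which predicate values 6 and 7 mean FALSE and TRUE.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified instruction model.
enum class Opc { XOR32rr, VPCOMB };

struct MiniInst {
  Opc Op;
  std::vector<unsigned> Regs; // Regs[0] is the def; Regs[1], Regs[2] are sources
  int64_t Imm = 0;            // VPCOM-style comparison predicate
};

// Hypothetical per-operand variant: bit N of Independent is set when the
// result does not depend on operand N. Returns true if at least one input
// dependency can be ignored.
bool isDependencyBreaking(const MiniInst &I, std::bitset<3> &Independent) {
  Independent.reset();
  switch (I.Op) {
  case Opc::XOR32rr:
    // Only the same-register form is an idiom: XOR r, r is always zero.
    if (I.Regs[1] == I.Regs[2]) {
      Independent.set(1);
      Independent.set(2);
    }
    break;
  case Opc::VPCOMB:
    // Assumed encoding: predicate 6 = FALSE, 7 = TRUE. The result is then
    // all-zeros or all-ones whatever the source registers are.
    if ((I.Imm & 7) == 6 || (I.Imm & 7) == 7) {
      Independent.set(1);
      Independent.set(2);
    }
    break;
  }
  return Independent.any();
}
```

A mask generalizes the boolean hook: a caller could skip waiting only on the operands whose bits are set, which matters for instructions that ignore some, but not all, of their inputs.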

Patch updated.

Addressed review comments.

LGTM - thanks

This revision is now accepted and ready to land. Jul 31 2018, 4:08 AM

Closed by commit rL338372: [llvm-mca][BtVer2] Teach how to identify dependency-breaking idioms. (authored by adibiagio). Jul 31 2018, 6:22 AM

This revision was automatically updated to reflect the committed changes.

andreadb mentioned this in D52174: [TableGen][SubtargetEmitter] Add the ability for processor models to describe dependency breaking instructions. Sep 17 2018, 7:39 AM

Diffusion mentioned this in rL342555: [TableGen][SubtargetEmitter] Add the ability for processor models to describe…. Sep 19 2018, 8:59 AM

Revision Contents

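Path                                                           Size
include/llvm/MC/MCInstrAnalysis.h                              11 lines
lib/MC/MCInstrAnalysis.cpp                                     5 lines
lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp                74 lines
test/tools/llvm-mca/X86/BtVer2/dependency-breaking-cmp.s       22 lines
test/tools/llvm-mca/X86/BtVer2/dependency-breaking-pcmpeq.s    39 lines
test/tools/llvm-mca/X86/BtVer2/dependency-breaking-sbb-2.s     36 lines
test/tools/llvm-mca/X86/BtVer2/one-idioms.s                    112 lines
tools/llvm-mca/DispatchStage.cpp                               12 lines
tools/llvm-mca/InstrBuilder.cpp                                4 lines
tools/llvm-mca/Instruction.h                                   13 lines
tools/llvm-mca/RetireStage.cpp                                 8 lines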

Diff 157958

include/llvm/MC/MCInstrAnalysis.h

@@ 81 unchanged lines not shown @@ public:
   ///
   /// The assumption is that the bit-width of the APInt is correctly set by
   /// the caller. The default implementation conservatively assumes that none of
   /// the writes clears the upper portion of a super-register.
   virtual bool clearsSuperRegisters(const MCRegisterInfo &MRI,
                                     const MCInst &Inst,
                                     APInt &Writes) const;
 
+  /// Returns true if \param Inst is a dependency breaking instruction for the
+  /// given subtarget.
+  ///
+  /// A dependency breaking instruction is know to always compute the same value
+  /// if input register operands are all the same register. An example of
+  /// dependency breaking instruction on X86 is the `XOR %eax, %eax`. A
+  /// dependency breaking instruction doesn't wait on the input register, since
+  /// the result is not dependent on its value.
+  virtual bool isDependencyBreaking(const MCSubtargetInfo &STI,
+                                    const MCInst &Inst) const;
+
   /// Given a branch instruction try to get the address the branch
   /// targets. Return true on success, and the address in Target.
   virtual bool
   evaluateBranch(const MCInst &Inst, uint64_t Addr, uint64_t Size,
                  uint64_t &Target) const;
 };
 
 } // end namespace llvm
 
 #endif // LLVM_MC_MCINSTRANALYSIS_H

lib/MC/MCInstrAnalysis.cpp

@@ 18 unchanged lines not shown @@
 bool MCInstrAnalysis::clearsSuperRegisters(const MCRegisterInfo &MRI,
                                            const MCInst &Inst,
                                            APInt &Writes) const {
   Writes.clearAllBits();
   return false;
 }
 
+bool MCInstrAnalysis::isDependencyBreaking(const MCSubtargetInfo &STI,
+                                           const MCInst &Inst) const {
+  return false;
+}
+
 bool MCInstrAnalysis::evaluateBranch(const MCInst &Inst, uint64_t Addr,
                                      uint64_t Size, uint64_t &Target) const {
   if (Inst.getNumOperands() == 0 ||
       Info->get(Inst.getOpcode()).OpInfo[0].OperandType != MCOI::OPERAND_PCREL)
     return false;
 
   int64_t Imm = Inst.getOperand(0).getImm();
   Target = Addr+Size+Imm;
   return true;
 }

lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp

@@ 301 unchanged lines not shown @@
 class X86MCInstrAnalysis : public MCInstrAnalysis {
   X86MCInstrAnalysis(const X86MCInstrAnalysis &) = delete;
   X86MCInstrAnalysis &operator=(const X86MCInstrAnalysis &) = delete;
   virtual ~X86MCInstrAnalysis() = default;
 
 public:
   X86MCInstrAnalysis(const MCInstrInfo *MCII) : MCInstrAnalysis(MCII) {}
 
+  bool isDependencyBreaking(const MCSubtargetInfo &STI,
+                            const MCInst &Inst) const override;
   bool clearsSuperRegisters(const MCRegisterInfo &MRI, const MCInst &Inst,
                             APInt &Mask) const override;
 };
 
+bool X86MCInstrAnalysis::isDependencyBreaking(const MCSubtargetInfo &STI,
+                                              const MCInst &Inst) const {
+  if (STI.getCPU() == "btver2") {
+    // Reference: Agner Fog's microarchitecture.pdf - Section 20 "AMD Bobcat and
+    // Jaguar pipeline", subsection 8 "Dependency-breaking instructions".
+    switch (Inst.getOpcode()) {
+    default:
+      return false;
+    case X86::SUB32rr:
+    case X86::SUB64rr:
+    case X86::SBB32rr:
+    case X86::SBB64rr:
+    case X86::XOR32rr:
+    case X86::XOR64rr:
+    case X86::XORPSrr:
+    case X86::XORPDrr:
+    case X86::VXORPSrr:
+    case X86::VXORPDrr:
+    case X86::ANDNPSrr:
+    case X86::VANDNPSrr:
+    case X86::ANDNPDrr:
+    case X86::VANDNPDrr:
+    case X86::PXORrr:
+    case X86::VPXORrr:
+    case X86::PANDNrr:
+    case X86::VPANDNrr:
+    case X86::PSUBBrr:
+    case X86::PSUBWrr:
+    case X86::PSUBDrr:
+    case X86::PSUBQrr:
+    case X86::VPSUBBrr:
+    case X86::VPSUBWrr:
+    case X86::VPSUBDrr:
+    case X86::VPSUBQrr:
+    case X86::PCMPEQBrr:
+    case X86::PCMPEQWrr:
+    case X86::PCMPEQDrr:
+    case X86::PCMPEQQrr:
+    case X86::VPCMPEQBrr:
+    case X86::VPCMPEQWrr:
+    case X86::VPCMPEQDrr:
+    case X86::VPCMPEQQrr:
+    case X86::PCMPGTBrr:
+    case X86::PCMPGTWrr:
+    case X86::PCMPGTDrr:
+    case X86::PCMPGTQrr:
+    case X86::VPCMPGTBrr:
+    case X86::VPCMPGTWrr:
+    case X86::VPCMPGTDrr:
+    case X86::VPCMPGTQrr:
+    case X86::MMX_PXORirr:
+    case X86::MMX_PANDNirr:
+    case X86::MMX_PSUBBirr:
+    case X86::MMX_PSUBDirr:
+    case X86::MMX_PSUBQirr:
+    case X86::MMX_PSUBWirr:
+    case X86::MMX_PCMPGTBirr:
+    case X86::MMX_PCMPGTDirr:
+    case X86::MMX_PCMPGTWirr:
+    case X86::MMX_PCMPEQBirr:
+    case X86::MMX_PCMPEQDirr:
+    case X86::MMX_PCMPEQWirr:
+      return Inst.getOperand(1).getReg() == Inst.getOperand(2).getReg();
+    case X86::CMP32rr:
+    case X86::CMP64rr:
+      return Inst.getOperand(0).getReg() == Inst.getOperand(1).getReg();
+    }
+  }
+
+  return false;
+}
+
 bool X86MCInstrAnalysis::clearsSuperRegisters(const MCRegisterInfo &MRI,
                                               const MCInst &Inst,
                                               APInt &Mask) const {
   const MCInstrDesc &Desc = Info->get(Inst.getOpcode());
   unsigned NumDefs = Desc.getNumDefs();
   unsigned NumImplicitDefs = Desc.getNumImplicitDefs();
   assert(Mask.getBitWidth() == NumDefs + NumImplicitDefs &&
          "Unexpected number of bits in the mask!");
@@ 277 unchanged lines not shown @@

test/tools/llvm-mca/X86/BtVer2/dependency-breaking-cmp.s

 # NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
 # RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -timeline -timeline-max-iterations=3 -iterations=1500 < %s | FileCheck %s
 
 # Perf stat reports an IPC of 1.97 for this block of code.
 
 # The CMP instruction doesn't depend on the value of EAX. It can set the flags
 # without having to read the inputs.
 
 cmp %eax, %eax
 cmovae %ebx, %eax
 
 # CHECK: Iterations: 1500
 # CHECK-NEXT: Instructions: 3000
-# CHECK-NEXT: Total Cycles: 3003
+# CHECK-NEXT: Total Cycles: 1504
 # CHECK-NEXT: Dispatch Width: 2
-# CHECK-NEXT: IPC: 1.00
+# CHECK-NEXT: IPC: 1.99
 # CHECK-NEXT: Block RThroughput: 1.0
 
 # CHECK: Instruction Info:
 # CHECK-NEXT: [1]: #uOps
 # CHECK-NEXT: [2]: Latency
 # CHECK-NEXT: [3]: RThroughput
 # CHECK-NEXT: [4]: MayLoad
 # CHECK-NEXT: [5]: MayStore
@@ 24 unchanged lines not shown @@
 # CHECK-NEXT: 1.00 1.00 - - - - - - - - - - - -
 
 # CHECK: Resource pressure by instruction:
 # CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
 # CHECK-NEXT: - 1.00 - - - - - - - - - - - - cmpl %eax, %eax
 # CHECK-NEXT: 1.00 - - - - - - - - - - - - - cmovael %ebx, %eax
 
 # CHECK: Timeline view:
-# CHECK-NEXT: Index 012345678
+# CHECK-NEXT: Index 0123456
 
-# CHECK: [0,0] DeER . . cmpl %eax, %eax
-# CHECK-NEXT: [0,1] D=eER. . cmovael %ebx, %eax
-# CHECK-NEXT: [1,0] .D=eER . cmpl %eax, %eax
-# CHECK-NEXT: [1,1] .D==eER . cmovael %ebx, %eax
-# CHECK-NEXT: [2,0] . D==eER. cmpl %eax, %eax
-# CHECK-NEXT: [2,1] . D===eER cmovael %ebx, %eax
+# CHECK: [0,0] DeER .. cmpl %eax, %eax
+# CHECK-NEXT: [0,1] D=eER.. cmovael %ebx, %eax
+# CHECK-NEXT: [1,0] .DeER.. cmpl %eax, %eax
+# CHECK-NEXT: [1,1] .D=eER. cmovael %ebx, %eax
+# CHECK-NEXT: [2,0] . DeER. cmpl %eax, %eax
+# CHECK-NEXT: [2,1] . D=eER cmovael %ebx, %eax
 
 # CHECK: Average Wait times (based on the timeline view):
 # CHECK-NEXT: [0]: Executions
 # CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
 # CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
 # CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
 
 # CHECK: [0] [1] [2] [3]
-# CHECK-NEXT: 0. 3 2.0 0.3 0.0 cmpl %eax, %eax
-# CHECK-NEXT: 1. 3 3.0 0.0 0.0 cmovael %ebx, %eax
+# CHECK-NEXT: 0. 3 1.0 1.0 0.0 cmpl %eax, %eax
+# CHECK-NEXT: 1. 3 2.0 0.0 0.0 cmovael %ebx, %eax

test/tools/llvm-mca/X86/BtVer2/dependency-breaking-pcmpeq.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -timeline -timeline-max-iterations=3 -iterations=1500 < %s \| FileCheck %s			# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -timeline -timeline-max-iterations=3 -iterations=1500 < %s \| FileCheck %s

	# perf stat reports an IPC of 2.00 for this block of code.			# perf stat reports an IPC of 2.00 for this block of code.

	# All of the vector packed compares from this test are dependency breaking			# All of the vector packed compares from this test are dependency breaking
	# instructions. That means, there is no RAW dependency between any of the			# instructions. That means, there is no RAW dependency between any of the
	# instructions, and the code can be fully parallelized in hardware.			# instructions, and the code can be fully parallelized in hardware.

	vpcmpeqb %xmm0, %xmm0, %xmm1			vpcmpeqb %xmm0, %xmm0, %xmm1
	vpcmpeqw %xmm1, %xmm1, %xmm2			vpcmpeqw %xmm1, %xmm1, %xmm2
	vpcmpeqd %xmm2, %xmm2, %xmm3			vpcmpeqd %xmm2, %xmm2, %xmm3
	vpcmpeqq %xmm3, %xmm3, %xmm0			vpcmpeqq %xmm3, %xmm3, %xmm0

	# CHECK: Iterations: 1500			# CHECK: Iterations: 1500
	# CHECK-NEXT: Instructions: 6000			# CHECK-NEXT: Instructions: 6000
	# CHECK-NEXT: Total Cycles: 6003			# CHECK-NEXT: Total Cycles: 3003
	# CHECK-NEXT: Dispatch Width: 2			# CHECK-NEXT: Dispatch Width: 2
	# CHECK-NEXT: IPC: 1.00			# CHECK-NEXT: IPC: 2.00
	# CHECK-NEXT: Block RThroughput: 2.0			# CHECK-NEXT: Block RThroughput: 2.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	Show All 28 Lines
	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
	# CHECK-NEXT: - - - - - - 1.00 - - - - - 1.00 - vpcmpeqb %xmm0, %xmm0, %xmm1			# CHECK-NEXT: - - - - - - 1.00 - - - - - 1.00 - vpcmpeqb %xmm0, %xmm0, %xmm1
	# CHECK-NEXT: - - - - - 1.00 - - - - - 1.00 - - vpcmpeqw %xmm1, %xmm1, %xmm2			# CHECK-NEXT: - - - - - 1.00 - - - - - 1.00 - - vpcmpeqw %xmm1, %xmm1, %xmm2
	# CHECK-NEXT: - - - - - - 1.00 - - - - - 1.00 - vpcmpeqd %xmm2, %xmm2, %xmm3			# CHECK-NEXT: - - - - - - 1.00 - - - - - 1.00 - vpcmpeqd %xmm2, %xmm2, %xmm3
	# CHECK-NEXT: - - - - - 1.00 - - - - - 1.00 - - vpcmpeqq %xmm3, %xmm3, %xmm0			# CHECK-NEXT: - - - - - 1.00 - - - - - 1.00 - - vpcmpeqq %xmm3, %xmm3, %xmm0

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: 01234			# CHECK-NEXT: Index 012345678
	# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DeER . . . vpcmpeqb %xmm0, %xmm0, %xmm1			# CHECK: [0,0] DeER . . vpcmpeqb %xmm0, %xmm0, %xmm1
	# CHECK-NEXT: [0,1] D=eER. . . vpcmpeqw %xmm1, %xmm1, %xmm2			# CHECK-NEXT: [0,1] DeER . . vpcmpeqw %xmm1, %xmm1, %xmm2
	# CHECK-NEXT: [0,2] .D=eER . . vpcmpeqd %xmm2, %xmm2, %xmm3			# CHECK-NEXT: [0,2] .DeER. . vpcmpeqd %xmm2, %xmm2, %xmm3
	# CHECK-NEXT: [0,3] .D==eER . . vpcmpeqq %xmm3, %xmm3, %xmm0			# CHECK-NEXT: [0,3] .DeER. . vpcmpeqq %xmm3, %xmm3, %xmm0
	# CHECK-NEXT: [1,0] . D==eER . . vpcmpeqb %xmm0, %xmm0, %xmm1			# CHECK-NEXT: [1,0] . DeER . vpcmpeqb %xmm0, %xmm0, %xmm1
	# CHECK-NEXT: [1,1] . D===eER . . vpcmpeqw %xmm1, %xmm1, %xmm2			# CHECK-NEXT: [1,1] . DeER . vpcmpeqw %xmm1, %xmm1, %xmm2
	# CHECK-NEXT: [1,2] . D===eER. . vpcmpeqd %xmm2, %xmm2, %xmm3			# CHECK-NEXT: [1,2] . DeER . vpcmpeqd %xmm2, %xmm2, %xmm3
	# CHECK-NEXT: [1,3] . D====eER . vpcmpeqq %xmm3, %xmm3, %xmm0			# CHECK-NEXT: [1,3] . DeER . vpcmpeqq %xmm3, %xmm3, %xmm0
	# CHECK-NEXT: [2,0] . D====eER . vpcmpeqb %xmm0, %xmm0, %xmm1			# CHECK-NEXT: [2,0] . DeER. vpcmpeqb %xmm0, %xmm0, %xmm1
	# CHECK-NEXT: [2,1] . D=====eER . vpcmpeqw %xmm1, %xmm1, %xmm2			# CHECK-NEXT: [2,1] . DeER. vpcmpeqw %xmm1, %xmm1, %xmm2
	# CHECK-NEXT: [2,2] . D=====eER. vpcmpeqd %xmm2, %xmm2, %xmm3			# CHECK-NEXT: [2,2] . DeER vpcmpeqd %xmm2, %xmm2, %xmm3
	# CHECK-NEXT: [2,3] . D======eER vpcmpeqq %xmm3, %xmm3, %xmm0			# CHECK-NEXT: [2,3] . DeER vpcmpeqq %xmm3, %xmm3, %xmm0

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 3 3.0 0.3 0.0 vpcmpeqb %xmm0, %xmm0, %xmm1			# CHECK-NEXT: 0. 3 1.0 1.0 0.0 vpcmpeqb %xmm0, %xmm0, %xmm1
	# CHECK-NEXT: 1. 3 4.0 0.0 0.0 vpcmpeqw %xmm1, %xmm1, %xmm2			# CHECK-NEXT: 1. 3 1.0 1.0 0.0 vpcmpeqw %xmm1, %xmm1, %xmm2
	# CHECK-NEXT: 2. 3 4.0 0.0 0.0 vpcmpeqd %xmm2, %xmm2, %xmm3			# CHECK-NEXT: 2. 3 1.0 1.0 0.0 vpcmpeqd %xmm2, %xmm2, %xmm3
	# CHECK-NEXT: 3. 3 5.0 0.0 0.0 vpcmpeqq %xmm3, %xmm3, %xmm0			# CHECK-NEXT: 3. 3 1.0 1.0 0.0 vpcmpeqq %xmm3, %xmm3, %xmm0
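In the old timeline, each vpcmpeq waits (the `=` marks) on the previous write to its source register, and that wait grows every iteration. In the new timeline every instruction goes straight from `D` to `e`. The toy model below shows why. It is hypothetical, not llvm-mca code, and makes several simplifying assumptions: a 1-cycle latency, each instruction reading its predecessor's result, and no modeling of dispatch width or pipe contention. Under those assumptions, a RAW-dependent chain serializes linearly, while dependency-breaking idioms can all start immediately.

```cpp
#include <cassert>
#include <vector>

// Toy model (hypothetical, not llvm-mca code): instruction I reads the register
// written by instruction I-1, like the vpcmpeq chain in the test above.
// Every instruction has a 1-cycle latency; dispatch width, pipeline contention
// and retirement are deliberately ignored.
std::vector<unsigned> earliestIssueCycles(unsigned NumInsts,
                                          bool DependencyBreaking) {
  assert(NumInsts > 0 && "expected at least one instruction");
  const unsigned Latency = 1;
  std::vector<unsigned> Issue(NumInsts, 0);
  for (unsigned I = 1; I < NumInsts; ++I) {
    // A RAW-dependent instruction waits until its producer has written back.
    // A dependency-breaking idiom ignores the value of its source operands,
    // so in this model it can issue at cycle 0.
    if (!DependencyBreaking)
      Issue[I] = Issue[I - 1] + Latency;
  }
  return Issue;
}
```

For the 12 instructions shown in the timeline (3 iterations of 4), the last one can start at cycle 11 in the serialized model and at cycle 0 when the idioms are recognized.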

test/tools/llvm-mca/X86/BtVer2/dependency-breaking-sbb-2.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -timeline -timeline-max-iterations=3 -iterations=1500 < %s | FileCheck %s			# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -timeline -timeline-max-iterations=3 -iterations=1500 < %s | FileCheck %s

	# perf stat reports a throughput of 1.51 IPC for this block of code.			# perf stat reports a throughput of 1.51 IPC for this block of code.

	# The SBB does not depend on the value of register EAX. That means, it doesn't			# The SBB does not depend on the value of register EAX. That means, it doesn't
	# have to wait for the IMUL to write-back on EAX. However, it still depends on			# have to wait for the IMUL to write-back on EAX. However, it still depends on
	# the ADD for EFLAGS.			# the ADD for EFLAGS.

	imul %edx, %eax			imul %edx, %eax
	add %edx, %edx			add %edx, %edx
	sbb %eax, %eax			sbb %eax, %eax

	# CHECK: Iterations: 1500			# CHECK: Iterations: 1500
	# CHECK-NEXT: Instructions: 4500			# CHECK-NEXT: Instructions: 4500
	# CHECK-NEXT: Total Cycles: 6745			# CHECK-NEXT: Total Cycles: 3007
	# CHECK-NEXT: Dispatch Width: 2			# CHECK-NEXT: Dispatch Width: 2
	# CHECK-NEXT: IPC: 0.67			# CHECK-NEXT: IPC: 1.50
	# CHECK-NEXT: Block RThroughput: 2.0			# CHECK-NEXT: Block RThroughput: 2.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	[17 unchanged lines not shown]
	# CHECK-NEXT: [9] - JSAGU			# CHECK-NEXT: [9] - JSAGU
	# CHECK-NEXT: [10] - JSTC			# CHECK-NEXT: [10] - JSTC
	# CHECK-NEXT: [11] - JVALU0			# CHECK-NEXT: [11] - JVALU0
	# CHECK-NEXT: [12] - JVALU1			# CHECK-NEXT: [12] - JVALU1
	# CHECK-NEXT: [13] - JVIMUL			# CHECK-NEXT: [13] - JVIMUL

	# CHECK: Resource pressure per iteration:			# CHECK: Resource pressure per iteration:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
	# CHECK-NEXT: 2.01 1.99 - - - - - - 1.00 - - - - -			# CHECK-NEXT: 2.00 2.00 - - - - - - 1.00 - - - - -

	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
	# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imull %edx, %eax			# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imull %edx, %eax
	# CHECK-NEXT: 0.99 0.01 - - - - - - - - - - - - addl %edx, %edx			# CHECK-NEXT: - 1.00 - - - - - - - - - - - - addl %edx, %edx
	# CHECK-NEXT: 1.01 0.99 - - - - - - - - - - - - sbbl %eax, %eax			# CHECK-NEXT: 2.00 - - - - - - - - - - - - - sbbl %eax, %eax

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: 012345			# CHECK-NEXT: 01
	# CHECK-NEXT: Index 0123456789			# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DeeeER . . imull %edx, %eax			# CHECK: [0,0] DeeeER .. imull %edx, %eax
	# CHECK-NEXT: [0,1] .DeE-R . . addl %edx, %edx			# CHECK-NEXT: [0,1] .DeE-R .. addl %edx, %edx
	# CHECK-NEXT: [0,2] .D==eER . . sbbl %eax, %eax			# CHECK-NEXT: [0,2] .D=eE-R .. sbbl %eax, %eax
	# CHECK-NEXT: [1,0] . D===eeeER . imull %edx, %eax			# CHECK-NEXT: [1,0] . D==eeeER.. imull %edx, %eax
	# CHECK-NEXT: [1,1] . DeE----R . addl %edx, %edx			# CHECK-NEXT: [1,1] . DeE---R.. addl %edx, %edx
	# CHECK-NEXT: [1,2] . D=====eER . sbbl %eax, %eax			# CHECK-NEXT: [1,2] . D=eE---R. sbbl %eax, %eax
	# CHECK-NEXT: [2,0] . D=====eeeER. imull %edx, %eax			# CHECK-NEXT: [2,0] . D=eeeER. imull %edx, %eax
	# CHECK-NEXT: [2,1] . DeE------R. addl %edx, %edx			# CHECK-NEXT: [2,1] . D=eE--R addl %edx, %edx
	# CHECK-NEXT: [2,2] . D=======eER sbbl %eax, %eax			# CHECK-NEXT: [2,2] . D==eE-R sbbl %eax, %eax

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 3 3.7 0.7 0.0 imull %edx, %eax			# CHECK-NEXT: 0. 3 2.0 0.7 0.0 imull %edx, %eax
	# CHECK-NEXT: 1. 3 1.0 1.0 3.7 addl %edx, %edx			# CHECK-NEXT: 1. 3 1.3 1.3 2.0 addl %edx, %edx
	# CHECK-NEXT: 2. 3 5.7 0.0 0.0 sbbl %eax, %eax			# CHECK-NEXT: 2. 3 2.3 0.0 1.7 sbbl %eax, %eax
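The headline numbers in this test follow from IPC = instructions / total cycles, with 4500 = 1500 iterations × 3 instructions. The helper below is a minimal way to check the reported values; its name is illustrative.

```cpp
#include <cassert>

// Instructions per cycle, as shown in the llvm-mca summary view.
double ipc(unsigned long Instructions, unsigned long Cycles) {
  assert(Cycles != 0 && "IPC is undefined for zero cycles");
  return static_cast<double>(Instructions) / static_cast<double>(Cycles);
}
```

The helper gives the following values:

- `ipc(4500, 6745)` is about 0.667, reported as 0.67 before the patch.
- `ipc(4500, 3007)` is about 1.497, reported as 1.50 after the patch.

The second figure is much closer to the 1.51 IPC that perf stat measured for this block.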

test/tools/llvm-mca/X86/BtVer2/one-idioms.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -timeline -register-file-stats -iterations=1 < %s | FileCheck %s			# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -timeline -timeline-max-iterations=1 -register-file-stats < %s | FileCheck %s

	# These are dependency-breaking one-idioms.			# These are dependency-breaking one-idioms.
	# Much like zero-idioms, but they produce ones, and do consume resources.			# Much like zero-idioms, but they produce ones, and do consume resources.

				# perf stat reports a throughput of 2.00 IPC.

	pcmpeqb %mm2, %mm2			pcmpeqb %mm2, %mm2
	pcmpeqd %mm2, %mm2			pcmpeqd %mm2, %mm2
	pcmpeqw %mm2, %mm2			pcmpeqw %mm2, %mm2

	pcmpeqb %xmm2, %xmm2			pcmpeqb %xmm2, %xmm2
	pcmpeqd %xmm2, %xmm2			pcmpeqd %xmm2, %xmm2
	pcmpeqq %xmm2, %xmm2			pcmpeqq %xmm2, %xmm2
	pcmpeqw %xmm2, %xmm2			pcmpeqw %xmm2, %xmm2

	vpcmpeqb %xmm3, %xmm3, %xmm3			vpcmpeqb %xmm3, %xmm3, %xmm3
	vpcmpeqd %xmm3, %xmm3, %xmm3			vpcmpeqd %xmm3, %xmm3, %xmm3
	vpcmpeqq %xmm3, %xmm3, %xmm3			vpcmpeqq %xmm3, %xmm3, %xmm3
	vpcmpeqw %xmm3, %xmm3, %xmm3			vpcmpeqw %xmm3, %xmm3, %xmm3

	vpcmpeqb %xmm3, %xmm3, %xmm5			vpcmpeqb %xmm3, %xmm3, %xmm5
	vpcmpeqd %xmm3, %xmm3, %xmm5			vpcmpeqd %xmm3, %xmm3, %xmm5
	vpcmpeqq %xmm3, %xmm3, %xmm5			vpcmpeqq %xmm3, %xmm3, %xmm5
	vpcmpeqw %xmm3, %xmm3, %xmm5			vpcmpeqw %xmm3, %xmm3, %xmm5

	# FIXME: their handling is broken in llvm-mca.			# FIXME: their handling is broken in llvm-mca.
				lebedev.ri: You probably wanted to drop this comment.
				andreadb (Author): I will remove it.

	# CHECK: Iterations: 1			# CHECK: Iterations: 100
	# CHECK-NEXT: Instructions: 15			# CHECK-NEXT: Instructions: 1500
	# CHECK-NEXT: Total Cycles: 12			# CHECK-NEXT: Total Cycles: 753
	# CHECK-NEXT: Dispatch Width: 2			# CHECK-NEXT: Dispatch Width: 2
	# CHECK-NEXT: IPC: 1.25			# CHECK-NEXT: IPC: 1.99
	# CHECK-NEXT: Block RThroughput: 7.5			# CHECK-NEXT: Block RThroughput: 7.5

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	[12 unchanged lines not shown]
	# CHECK-NEXT: 1 1 0.50 vpcmpeqq %xmm3, %xmm3, %xmm3			# CHECK-NEXT: 1 1 0.50 vpcmpeqq %xmm3, %xmm3, %xmm3
	# CHECK-NEXT: 1 1 0.50 vpcmpeqw %xmm3, %xmm3, %xmm3			# CHECK-NEXT: 1 1 0.50 vpcmpeqw %xmm3, %xmm3, %xmm3
	# CHECK-NEXT: 1 1 0.50 vpcmpeqb %xmm3, %xmm3, %xmm5			# CHECK-NEXT: 1 1 0.50 vpcmpeqb %xmm3, %xmm3, %xmm5
	# CHECK-NEXT: 1 1 0.50 vpcmpeqd %xmm3, %xmm3, %xmm5			# CHECK-NEXT: 1 1 0.50 vpcmpeqd %xmm3, %xmm3, %xmm5
	# CHECK-NEXT: 1 1 0.50 vpcmpeqq %xmm3, %xmm3, %xmm5			# CHECK-NEXT: 1 1 0.50 vpcmpeqq %xmm3, %xmm3, %xmm5
	# CHECK-NEXT: 1 1 0.50 vpcmpeqw %xmm3, %xmm3, %xmm5			# CHECK-NEXT: 1 1 0.50 vpcmpeqw %xmm3, %xmm3, %xmm5

	# CHECK: Register File statistics:			# CHECK: Register File statistics:
	# CHECK-NEXT: Total number of mappings created: 15			# CHECK-NEXT: Total number of mappings created: 1500
	# CHECK-NEXT: Max number of mappings used: 8			# CHECK-NEXT: Max number of mappings used: 6

	# CHECK: * Register File #1 -- JFpuPRF:			# CHECK: * Register File #1 -- JFpuPRF:
	# CHECK-NEXT: Number of physical registers: 72			# CHECK-NEXT: Number of physical registers: 72
	# CHECK-NEXT: Total number of mappings created: 15			# CHECK-NEXT: Total number of mappings created: 1500
	# CHECK-NEXT: Max number of mappings used: 8			# CHECK-NEXT: Max number of mappings used: 6

	# CHECK: * Register File #2 -- JIntegerPRF:			# CHECK: * Register File #2 -- JIntegerPRF:
	# CHECK-NEXT: Number of physical registers: 64			# CHECK-NEXT: Number of physical registers: 64
	# CHECK-NEXT: Total number of mappings created: 0			# CHECK-NEXT: Total number of mappings created: 0
	# CHECK-NEXT: Max number of mappings used: 0			# CHECK-NEXT: Max number of mappings used: 0

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0] - JALU0			# CHECK-NEXT: [0] - JALU0
	# CHECK-NEXT: [1] - JALU1			# CHECK-NEXT: [1] - JALU1
	# CHECK-NEXT: [2] - JDiv			# CHECK-NEXT: [2] - JDiv
	# CHECK-NEXT: [3] - JFPA			# CHECK-NEXT: [3] - JFPA
	# CHECK-NEXT: [4] - JFPM			# CHECK-NEXT: [4] - JFPM
	# CHECK-NEXT: [5] - JFPU0			# CHECK-NEXT: [5] - JFPU0
	# CHECK-NEXT: [6] - JFPU1			# CHECK-NEXT: [6] - JFPU1
	# CHECK-NEXT: [7] - JLAGU			# CHECK-NEXT: [7] - JLAGU
	# CHECK-NEXT: [8] - JMul			# CHECK-NEXT: [8] - JMul
	# CHECK-NEXT: [9] - JSAGU			# CHECK-NEXT: [9] - JSAGU
	# CHECK-NEXT: [10] - JSTC			# CHECK-NEXT: [10] - JSTC
	# CHECK-NEXT: [11] - JVALU0			# CHECK-NEXT: [11] - JVALU0
	# CHECK-NEXT: [12] - JVALU1			# CHECK-NEXT: [12] - JVALU1
	# CHECK-NEXT: [13] - JVIMUL			# CHECK-NEXT: [13] - JVIMUL

	# CHECK: Resource pressure per iteration:			# CHECK: Resource pressure per iteration:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
	# CHECK-NEXT: - - - - - 7.00 8.00 - - - - 7.00 8.00 -			# CHECK-NEXT: - - - - - 7.50 7.50 - - - - 7.50 7.50 -

	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
	# CHECK-NEXT: - - - - - - 1.00 - - - - - 1.00 - pcmpeqb %mm2, %mm2			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - pcmpeqb %mm2, %mm2
	# CHECK-NEXT: - - - - - 1.00 - - - - - 1.00 - - pcmpeqd %mm2, %mm2			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - pcmpeqd %mm2, %mm2
	# CHECK-NEXT: - - - - - 1.00 - - - - - 1.00 - - pcmpeqw %mm2, %mm2			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - pcmpeqw %mm2, %mm2
	# CHECK-NEXT: - - - - - - 1.00 - - - - - 1.00 - pcmpeqb %xmm2, %xmm2			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - pcmpeqb %xmm2, %xmm2
	# CHECK-NEXT: - - - - - - 1.00 - - - - - 1.00 - pcmpeqd %xmm2, %xmm2			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - pcmpeqd %xmm2, %xmm2
	# CHECK-NEXT: - - - - - 1.00 - - - - - 1.00 - - pcmpeqq %xmm2, %xmm2			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - pcmpeqq %xmm2, %xmm2
	# CHECK-NEXT: - - - - - 1.00 - - - - - 1.00 - - pcmpeqw %xmm2, %xmm2			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - pcmpeqw %xmm2, %xmm2
	# CHECK-NEXT: - - - - - - 1.00 - - - - - 1.00 - vpcmpeqb %xmm3, %xmm3, %xmm3			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - vpcmpeqb %xmm3, %xmm3, %xmm3
	# CHECK-NEXT: - - - - - - 1.00 - - - - - 1.00 - vpcmpeqd %xmm3, %xmm3, %xmm3			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - vpcmpeqd %xmm3, %xmm3, %xmm3
	# CHECK-NEXT: - - - - - 1.00 - - - - - 1.00 - - vpcmpeqq %xmm3, %xmm3, %xmm3			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - vpcmpeqq %xmm3, %xmm3, %xmm3
	# CHECK-NEXT: - - - - - - 1.00 - - - - - 1.00 - vpcmpeqw %xmm3, %xmm3, %xmm3			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - vpcmpeqw %xmm3, %xmm3, %xmm3
	# CHECK-NEXT: - - - - - 1.00 - - - - - 1.00 - - vpcmpeqb %xmm3, %xmm3, %xmm5			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - vpcmpeqb %xmm3, %xmm3, %xmm5
	# CHECK-NEXT: - - - - - - 1.00 - - - - - 1.00 - vpcmpeqd %xmm3, %xmm3, %xmm5			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - vpcmpeqd %xmm3, %xmm3, %xmm5
	# CHECK-NEXT: - - - - - 1.00 - - - - - 1.00 - - vpcmpeqq %xmm3, %xmm3, %xmm5			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - vpcmpeqq %xmm3, %xmm3, %xmm5
	# CHECK-NEXT: - - - - - - 1.00 - - - - - 1.00 - vpcmpeqw %xmm3, %xmm3, %xmm5			# CHECK-NEXT: - - - - - 0.50 0.50 - - - - 0.50 0.50 - vpcmpeqw %xmm3, %xmm3, %xmm5

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: 01			# CHECK-NEXT: 0
	# CHECK-NEXT: Index 0123456789			# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DeER . .. pcmpeqb %mm2, %mm2			# CHECK: [0,0] DeER . . pcmpeqb %mm2, %mm2
	# CHECK-NEXT: [0,1] D=eER. .. pcmpeqd %mm2, %mm2			# CHECK-NEXT: [0,1] DeER . . pcmpeqd %mm2, %mm2
	# CHECK-NEXT: [0,2] .D=eER .. pcmpeqw %mm2, %mm2			# CHECK-NEXT: [0,2] .DeER. . pcmpeqw %mm2, %mm2
	# CHECK-NEXT: [0,3] .DeE-R .. pcmpeqb %xmm2, %xmm2			# CHECK-NEXT: [0,3] .DeER. . pcmpeqb %xmm2, %xmm2
	# CHECK-NEXT: [0,4] . DeE-R .. pcmpeqd %xmm2, %xmm2			# CHECK-NEXT: [0,4] . DeER . pcmpeqd %xmm2, %xmm2
	# CHECK-NEXT: [0,5] . D=eER .. pcmpeqq %xmm2, %xmm2			# CHECK-NEXT: [0,5] . DeER . pcmpeqq %xmm2, %xmm2
	# CHECK-NEXT: [0,6] . D=eER .. pcmpeqw %xmm2, %xmm2			# CHECK-NEXT: [0,6] . DeER . pcmpeqw %xmm2, %xmm2
	# CHECK-NEXT: [0,7] . DeE-R .. vpcmpeqb %xmm3, %xmm3, %xmm3			# CHECK-NEXT: [0,7] . DeER . vpcmpeqb %xmm3, %xmm3, %xmm3
	# CHECK-NEXT: [0,8] . DeE-R .. vpcmpeqd %xmm3, %xmm3, %xmm3			# CHECK-NEXT: [0,8] . DeER . vpcmpeqd %xmm3, %xmm3, %xmm3
	# CHECK-NEXT: [0,9] . D=eER .. vpcmpeqq %xmm3, %xmm3, %xmm3			# CHECK-NEXT: [0,9] . DeER . vpcmpeqq %xmm3, %xmm3, %xmm3
	# CHECK-NEXT: [0,10] . D=eER.. vpcmpeqw %xmm3, %xmm3, %xmm3			# CHECK-NEXT: [0,10] . DeER . vpcmpeqw %xmm3, %xmm3, %xmm3
	# CHECK-NEXT: [0,11] . D==eER. vpcmpeqb %xmm3, %xmm3, %xmm5			# CHECK-NEXT: [0,11] . DeER . vpcmpeqb %xmm3, %xmm3, %xmm5
	# CHECK-NEXT: [0,12] . .D=eER. vpcmpeqd %xmm3, %xmm3, %xmm5			# CHECK-NEXT: [0,12] . .DeER. vpcmpeqd %xmm3, %xmm3, %xmm5
	# CHECK-NEXT: [0,13] . .D==eER vpcmpeqq %xmm3, %xmm3, %xmm5			# CHECK-NEXT: [0,13] . .DeER. vpcmpeqq %xmm3, %xmm3, %xmm5
	# CHECK-NEXT: [0,14] . . D=eER vpcmpeqw %xmm3, %xmm3, %xmm5			# CHECK-NEXT: [0,14] . . DeER vpcmpeqw %xmm3, %xmm3, %xmm5

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 1 1.0 1.0 0.0 pcmpeqb %mm2, %mm2			# CHECK-NEXT: 0. 1 1.0 1.0 0.0 pcmpeqb %mm2, %mm2
	# CHECK-NEXT: 1. 1 2.0 0.0 0.0 pcmpeqd %mm2, %mm2			# CHECK-NEXT: 1. 1 1.0 1.0 0.0 pcmpeqd %mm2, %mm2
	# CHECK-NEXT: 2. 1 2.0 0.0 0.0 pcmpeqw %mm2, %mm2			# CHECK-NEXT: 2. 1 1.0 1.0 0.0 pcmpeqw %mm2, %mm2
	# CHECK-NEXT: 3. 1 1.0 1.0 1.0 pcmpeqb %xmm2, %xmm2			# CHECK-NEXT: 3. 1 1.0 1.0 0.0 pcmpeqb %xmm2, %xmm2
	# CHECK-NEXT: 4. 1 1.0 0.0 1.0 pcmpeqd %xmm2, %xmm2			# CHECK-NEXT: 4. 1 1.0 1.0 0.0 pcmpeqd %xmm2, %xmm2
	# CHECK-NEXT: 5. 1 2.0 0.0 0.0 pcmpeqq %xmm2, %xmm2			# CHECK-NEXT: 5. 1 1.0 1.0 0.0 pcmpeqq %xmm2, %xmm2
	# CHECK-NEXT: 6. 1 2.0 0.0 0.0 pcmpeqw %xmm2, %xmm2			# CHECK-NEXT: 6. 1 1.0 1.0 0.0 pcmpeqw %xmm2, %xmm2
	# CHECK-NEXT: 7. 1 1.0 1.0 1.0 vpcmpeqb %xmm3, %xmm3, %xmm3			# CHECK-NEXT: 7. 1 1.0 1.0 0.0 vpcmpeqb %xmm3, %xmm3, %xmm3
	# CHECK-NEXT: 8. 1 1.0 0.0 1.0 vpcmpeqd %xmm3, %xmm3, %xmm3			# CHECK-NEXT: 8. 1 1.0 1.0 0.0 vpcmpeqd %xmm3, %xmm3, %xmm3
	# CHECK-NEXT: 9. 1 2.0 0.0 0.0 vpcmpeqq %xmm3, %xmm3, %xmm3			# CHECK-NEXT: 9. 1 1.0 1.0 0.0 vpcmpeqq %xmm3, %xmm3, %xmm3
	# CHECK-NEXT: 10. 1 2.0 0.0 0.0 vpcmpeqw %xmm3, %xmm3, %xmm3			# CHECK-NEXT: 10. 1 1.0 1.0 0.0 vpcmpeqw %xmm3, %xmm3, %xmm3
	# CHECK-NEXT: 11. 1 3.0 0.0 0.0 vpcmpeqb %xmm3, %xmm3, %xmm5			# CHECK-NEXT: 11. 1 1.0 1.0 0.0 vpcmpeqb %xmm3, %xmm3, %xmm5
	# CHECK-NEXT: 12. 1 2.0 0.0 0.0 vpcmpeqd %xmm3, %xmm3, %xmm5			# CHECK-NEXT: 12. 1 1.0 1.0 0.0 vpcmpeqd %xmm3, %xmm3, %xmm5
	# CHECK-NEXT: 13. 1 3.0 1.0 0.0 vpcmpeqq %xmm3, %xmm3, %xmm5			# CHECK-NEXT: 13. 1 1.0 1.0 0.0 vpcmpeqq %xmm3, %xmm3, %xmm5
	# CHECK-NEXT: 14. 1 2.0 1.0 0.0 vpcmpeqw %xmm3, %xmm3, %xmm5			# CHECK-NEXT: 14. 1 1.0 1.0 0.0 vpcmpeqw %xmm3, %xmm3, %xmm5
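With a dispatch width of 2, dispatch alone caps IPC at 2. The new report (1500 instructions in 753 cycles, IPC 1.99) therefore sits almost on that limit, matching the 2.00 IPC from perf. The sketch below models only that dispatch-width lower bound, and its function name is illustrative.

```cpp
#include <cassert>

// Lower bound on cycles imposed by the dispatch width alone: a core that
// dispatches at most DispatchWidth instructions per cycle cannot process
// NumInsts instructions in fewer than ceil(NumInsts / DispatchWidth) cycles.
unsigned long minDispatchCycles(unsigned long NumInsts, unsigned DispatchWidth) {
  assert(DispatchWidth > 0 && "dispatch width must be non-zero");
  return (NumInsts + DispatchWidth - 1) / DispatchWidth;
}
```

`minDispatchCycles(1500, 2)` is 750, so the reported 753 cycles are within 3 cycles of the bound. The old single-iteration run needed at least `minDispatchCycles(15, 2)` = 8 cycles and took 12.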

tools/llvm-mca/DispatchStage.cpp

[101 unchanged lines not shown]	void DispatchStage::dispatch(InstRef IR) {
}		}

// A dependency-breaking instruction doesn't have to wait on the register		// A dependency-breaking instruction doesn't have to wait on the register
// input operands, and it is often optimized at register renaming stage.		// input operands, and it is often optimized at register renaming stage.
// Update RAW dependencies if this instruction is not a dependency-breaking		// Update RAW dependencies if this instruction is not a dependency-breaking
// instruction. A dependency-breaking instruction is a zero-latency		// instruction. A dependency-breaking instruction is a zero-latency
// instruction that doesn't consume hardware resources.		// instruction that doesn't consume hardware resources.
// An example of dependency-breaking instruction on X86 is a zero-idiom XOR.		// An example of dependency-breaking instruction on X86 is a zero-idiom XOR.
if (!Desc.isZeroLatency())		bool IsDependencyBreaking = IS.isDependencyBreaking();
lebedev.ri: So it will no longer even consider the sched profile?
andreadb (Author): Not sure I understand the question. I am definitely using profile information from the scheduling model.
lebedev.ri: What i'm asking is - what happens if sched model says that instruction N has zero latency, but `isDependencyBreaking()` does not say that?
andreadb (Author): In that case, instruction N will have to wait in the scheduler until input registers are all available. Then it is executed.
andreadb (Author): If it doesn’t work like that, then there is a bug. I cannot test it at the moment as I am not at work. I will check it on next days. Thanks.
lebedev.ri: So the scheduler-profile-aware instruction scheduler will schedule it as-if it has zero latency, but the mca-based analysis will not? And how would one go about detecting such inconsistencies? This all makes me uneasy.
andreadb (Author): No. I never said that. It would still execute zero cycles. It simply has to wait for the operands. Since I didn’t specifically test that bogus scenario, I will check that it really behaves like that. I cannot do that test now because I am not at work. I will be back in a few days.
for (std::unique_ptr<ReadState> &RS : IS.getUses())		for (std::unique_ptr<ReadState> &RS : IS.getUses())
		if (RS->isImplicitRead() || !IsDependencyBreaking)
updateRAWDependencies(*RS, STI);		updateRAWDependencies(*RS, STI);

// By default, a dependency-breaking zero-latency instruction is expected to		// By default, a dependency-breaking zero-latency instruction is expected to
// be optimized at register renaming stage. That means, no physical register		// be optimized at register renaming stage. That means, no physical register
// is allocated to the instruction.		// is allocated to the instruction.
		bool ShouldAllocateRegisters =
		!(Desc.isZeroLatency() && IsDependencyBreaking);
SmallVector<unsigned, 4> RegisterFiles(PRF.getNumRegisterFiles());		SmallVector<unsigned, 4> RegisterFiles(PRF.getNumRegisterFiles());
for (std::unique_ptr<WriteState> &WS : IS.getDefs())		for (std::unique_ptr<WriteState> &WS : IS.getDefs()) {
PRF.addRegisterWrite(WriteRef(IR.first, WS.get()), RegisterFiles,		PRF.addRegisterWrite(WriteRef(IR.first, WS.get()), RegisterFiles,
!Desc.isZeroLatency());		ShouldAllocateRegisters);
		}

// Reserve slots in the RCU, and notify the instruction that it has been		// Reserve slots in the RCU, and notify the instruction that it has been
// dispatched to the schedulers for execution.		// dispatched to the schedulers for execution.
IS.dispatch(RCU.reserveSlot(IR, NumMicroOps));		IS.dispatch(RCU.reserveSlot(IR, NumMicroOps));

// Notify listeners of the "instruction dispatched" event.		// Notify listeners of the "instruction dispatched" event.
notifyInstructionDispatched(IR, RegisterFiles);		notifyInstructionDispatched(IR, RegisterFiles);
}		}
[21 unchanged lines not shown]
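After this patch, `DispatchStage::dispatch()` makes two decisions, and the functions below restate them as standalone predicates. `ReadKind` and the function names are hypothetical, and this is a simplified model rather than the llvm-mca code itself.

- **RAW tracking.** Reads are skipped only for explicit operands of a dependency-breaking instruction. Implicit reads, such as the EFLAGS read of SBB in the sbb-2 test, are still tracked.
- **Register allocation.** Physical registers are skipped only when the instruction is both zero-latency and dependency-breaking.

```cpp
#include <cassert>

// Simplified model of the checks added to DispatchStage::dispatch().
// ReadKind and both function names are hypothetical.
enum class ReadKind { Explicit, Implicit };

// RAW dependencies are tracked for every read of a normal instruction, but only
// for implicit reads (e.g. EFLAGS for SBB) of a dependency-breaking one.
bool tracksRAWDependency(ReadKind Kind, bool IsDependencyBreaking) {
  assert((Kind == ReadKind::Explicit || Kind == ReadKind::Implicit) &&
         "unexpected read kind");
  return Kind == ReadKind::Implicit || !IsDependencyBreaking;
}

// No physical register is allocated only when the instruction is both
// zero-latency and dependency-breaking (e.g. a zero-idiom XOR).
bool shouldAllocateRegisters(bool IsZeroLatency, bool IsDependencyBreaking) {
  return !(IsZeroLatency && IsDependencyBreaking);
}
```

Under this model, the case raised in review (zero latency in the scheduling model, but `isDependencyBreaking()` returning false) still tracks every read and still allocates registers. That matches the author's reply that such an instruction waits for its inputs. One-idioms such as `pcmpeqb %xmm2, %xmm2` are dependency-breaking but not zero-latency, so they skip RAW waits yet still get a register.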

tools/llvm-mca/InstrBuilder.cpp

[437 unchanged lines not shown]	InstrBuilder::createInstruction(const MCInst &MCI) {
// Track register writes that implicitly clear the upper portion of the		// Track register writes that implicitly clear the upper portion of the
// underlying super-registers using an APInt.		// underlying super-registers using an APInt.
APInt WriteMask(D.Writes.size(), 0);		APInt WriteMask(D.Writes.size(), 0);

// Now query the MCInstrAnalysis object to obtain information about which		// Now query the MCInstrAnalysis object to obtain information about which
// register writes implicitly clear the upper portion of a super-register.		// register writes implicitly clear the upper portion of a super-register.
MCIA.clearsSuperRegisters(MRI, MCI, WriteMask);		MCIA.clearsSuperRegisters(MRI, MCI, WriteMask);

		// Check if this is a dependency breaking instruction.
		if (MCIA.isDependencyBreaking(STI, MCI))
		NewIS->setDependencyBreaking();

// Initialize writes.		// Initialize writes.
unsigned WriteIndex = 0;		unsigned WriteIndex = 0;
for (const WriteDescriptor &WD : D.Writes) {		for (const WriteDescriptor &WD : D.Writes) {
unsigned RegID = WD.isImplicitWrite() ? WD.RegisterID		unsigned RegID = WD.isImplicitWrite() ? WD.RegisterID
: MCI.getOperand(WD.OpIndex).getReg();		: MCI.getOperand(WD.OpIndex).getReg();
// Check if this is an optional definition that references NoReg.		// Check if this is an optional definition that references NoReg.
if (WD.IsOptionalDef && !RegID) {		if (WD.IsOptionalDef && !RegID) {
++WriteIndex;		++WriteIndex;
[12 unchanged lines not shown]
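`InstrBuilder` now calls `MCIA.isDependencyBreaking(STI, MCI)` and marks the instruction when the hook returns true. The default hook conservatively returns false, and a target override decides which idioms qualify. The x86 override itself is not shown here, so the following is only a hypothetical, self-contained stand-in: it does not use the real `MCInst` API, and its idiom set is illustrative rather than the list used by the patch. The key check is that both source operands name the same register.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical toy instruction: a mnemonic and its two source register names.
// This is NOT the MCInst/MCSubtargetInfo interface used by the real hook.
struct ToyInst {
  std::string Mnemonic;
  std::string Src0;
  std::string Src1;
};

// Illustrative (not exhaustive) idioms whose result does not depend on the
// value of their register sources when both sources are the same register:
// xor/sub produce zero, sbb produces 0 or -1 depending only on CF, and
// pcmpeq* produces all-ones.
bool isDependencyBreakingToy(const ToyInst &I) {
  assert(!I.Mnemonic.empty() && "missing mnemonic");
  static const std::set<std::string> Idioms = {
      "xorl",    "subl",    "sbbl",     "pcmpeqb",  "pcmpeqw",  "pcmpeqd",
      "pcmpeqq", "vpcmpeqb", "vpcmpeqw", "vpcmpeqd", "vpcmpeqq"};
  return Idioms.count(I.Mnemonic) != 0 && I.Src0 == I.Src1;
}
```

Here `xorl %eax, %eax` and `vpcmpeqb %xmm3, %xmm3, %xmm5` are recognized, while `xorl %ebx, %eax` and `imull %edx, %eax` are not. As the sbb-2 test shows, a recognized `sbbl %eax, %eax` still has to wait on EFLAGS.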

tools/llvm-mca/Instruction.h

[164 unchanged lines not shown]	class ReadState {
// dependent writes (i.e. field DependentWrite) is zero, this value is		// dependent writes (i.e. field DependentWrite) is zero, this value is
// propagated to field CyclesLeft.		// propagated to field CyclesLeft.
unsigned TotalCycles;		unsigned TotalCycles;
// This field is set to true only if there are no dependent writes, and		// This field is set to true only if there are no dependent writes, and
// there are no `CyclesLeft' to wait.		// there are no `CyclesLeft' to wait.
bool IsReady;		bool IsReady;

public:		public:
bool isReady() const { return IsReady; }

ReadState(const ReadDescriptor &Desc, unsigned RegID)		ReadState(const ReadDescriptor &Desc, unsigned RegID)
: RD(Desc), RegisterID(RegID), DependentWrites(0),		: RD(Desc), RegisterID(RegID), DependentWrites(0),
CyclesLeft(UNKNOWN_CYCLES), TotalCycles(0), IsReady(true) {}		CyclesLeft(UNKNOWN_CYCLES), TotalCycles(0), IsReady(true) {}
ReadState(const ReadState &Other) = delete;		ReadState(const ReadState &Other) = delete;
ReadState &operator=(const ReadState &Other) = delete;		ReadState &operator=(const ReadState &Other) = delete;

const ReadDescriptor &getDescriptor() const { return RD; }		const ReadDescriptor &getDescriptor() const { return RD; }
unsigned getSchedClass() const { return RD.SchedClassID; }		unsigned getSchedClass() const { return RD.SchedClassID; }
unsigned getRegisterID() const { return RegisterID; }		unsigned getRegisterID() const { return RegisterID; }

		bool isReady() const { return IsReady; }
		bool isImplicitRead() const { return RD.isImplicitRead(); }

void cycleEvent();		void cycleEvent();
void writeStartEvent(unsigned Cycles);		void writeStartEvent(unsigned Cycles);
void setDependentWrites(unsigned Writes) {		void setDependentWrites(unsigned Writes) {
DependentWrites = Writes;		DependentWrites = Writes;
IsReady = !Writes;		IsReady = !Writes;
}		}
};		};

[101 unchanged lines not shown]	class Instruction {

// This value defaults to the instruction latency. This instruction is		// This value defaults to the instruction latency. This instruction is
// considered executed when field CyclesLeft goes to zero.		// considered executed when field CyclesLeft goes to zero.
int CyclesLeft;		int CyclesLeft;

// Retire Unit token ID for this instruction.		// Retire Unit token ID for this instruction.
unsigned RCUTokenID;		unsigned RCUTokenID;

		bool IsDepBreaking;

using UniqueDef = std::unique_ptr<WriteState>;		using UniqueDef = std::unique_ptr<WriteState>;
using UniqueUse = std::unique_ptr<ReadState>;		using UniqueUse = std::unique_ptr<ReadState>;
using VecDefs = std::vector<UniqueDef>;		using VecDefs = std::vector<UniqueDef>;
using VecUses = std::vector<UniqueUse>;		using VecUses = std::vector<UniqueUse>;

// Output dependencies.		// Output dependencies.
// One entry per each implicit and explicit register definition.		// One entry per each implicit and explicit register definition.
VecDefs Defs;		VecDefs Defs;

// Input dependencies.		// Input dependencies.
// One entry per each implicit and explicit register use.		// One entry per each implicit and explicit register use.
VecUses Uses;		VecUses Uses;

public:		public:
Instruction(const InstrDesc &D)		Instruction(const InstrDesc &D)
: Desc(D), Stage(IS_INVALID), CyclesLeft(UNKNOWN_CYCLES) {}		: Desc(D), Stage(IS_INVALID), CyclesLeft(UNKNOWN_CYCLES), RCUTokenID(0),
		IsDepBreaking(false) {}
Instruction(const Instruction &Other) = delete;		Instruction(const Instruction &Other) = delete;
Instruction &operator=(const Instruction &Other) = delete;		Instruction &operator=(const Instruction &Other) = delete;

VecDefs &getDefs() { return Defs; }		VecDefs &getDefs() { return Defs; }
const VecDefs &getDefs() const { return Defs; }		const VecDefs &getDefs() const { return Defs; }
VecUses &getUses() { return Uses; }		VecUses &getUses() { return Uses; }
const VecUses &getUses() const { return Uses; }		const VecUses &getUses() const { return Uses; }
const InstrDesc &getDesc() const { return Desc; }		const InstrDesc &getDesc() const { return Desc; }
unsigned getRCUTokenID() const { return RCUTokenID; }		unsigned getRCUTokenID() const { return RCUTokenID; }
int getCyclesLeft() const { return CyclesLeft; }		int getCyclesLeft() const { return CyclesLeft; }

		bool isDependencyBreaking() const { return IsDepBreaking; }
		void setDependencyBreaking() { IsDepBreaking = true; }

unsigned getNumUsers() const {		unsigned getNumUsers() const {
unsigned NumUsers = 0;		unsigned NumUsers = 0;
for (const UniqueDef &Def : Defs)		for (const UniqueDef &Def : Defs)
NumUsers += Def->getNumUsers();		NumUsers += Def->getNumUsers();
return NumUsers;		return NumUsers;
}		}

// Transition to the dispatch stage, and assign a RCUToken to this		// Transition to the dispatch stage, and assign a RCUToken to this
[91 unchanged lines not shown]

tools/llvm-mca/RetireStage.cpp

[39 unchanged lines not shown]	while (!RCU.isEmpty()) {
notifyInstructionRetired(Current.IR);		notifyInstructionRetired(Current.IR);
NumRetired++;		NumRetired++;
}		}
}		}

void RetireStage::notifyInstructionRetired(const InstRef &IR) {		void RetireStage::notifyInstructionRetired(const InstRef &IR) {
LLVM_DEBUG(dbgs() << "[E] Instruction Retired: #" << IR << '\n');		LLVM_DEBUG(dbgs() << "[E] Instruction Retired: #" << IR << '\n');
SmallVector<unsigned, 4> FreedRegs(PRF.getNumRegisterFiles());		SmallVector<unsigned, 4> FreedRegs(PRF.getNumRegisterFiles());
const InstrDesc &Desc = IR.getInstruction()->getDesc();		const Instruction &Inst = *IR.getInstruction();
		const InstrDesc &Desc = Inst.getDesc();

for (const std::unique_ptr<WriteState> &WS : IR.getInstruction()->getDefs())		bool ShouldFreeRegs = !(Desc.isZeroLatency() && Inst.isDependencyBreaking());
PRF.removeRegisterWrite(*WS.get(), FreedRegs, !Desc.isZeroLatency());		for (const std::unique_ptr<WriteState> &WS : Inst.getDefs())
		PRF.removeRegisterWrite(*WS.get(), FreedRegs, ShouldFreeRegs);
notifyEvent<HWInstructionEvent>(HWInstructionRetiredEvent(IR, FreedRegs));		notifyEvent<HWInstructionEvent>(HWInstructionRetiredEvent(IR, FreedRegs));
}		}

} // namespace mca		} // namespace mca
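The retire-side change mirrors `DispatchStage`: `ShouldFreeRegs` uses the same condition as `ShouldAllocateRegisters`, where previously both sides used `!Desc.isZeroLatency()` alone. The two predicates must stay in sync, or register-file mappings would leak or be released twice. The bookkeeping below is a hypothetical toy, not the llvm-mca `RegisterFile` class, and it shows that a shared predicate keeps the mapping count balanced.

```cpp
#include <cassert>

// Toy register-file bookkeeping (hypothetical, not llvm-mca's RegisterFile).
struct ToyRegisterFile {
  int MappingsInUse = 0;

  void allocate(bool ShouldAllocate) {
    if (ShouldAllocate)
      ++MappingsInUse;
  }

  void release(bool ShouldFree) {
    if (!ShouldFree)
      return;
    assert(MappingsInUse > 0 && "freeing a mapping that was never allocated");
    --MappingsInUse;
  }
};

// Shared predicate used at both dispatch and retire in this sketch.
bool needsPhysReg(bool IsZeroLatency, bool IsDependencyBreaking) {
  return !(IsZeroLatency && IsDependencyBreaking);
}
```

Dispatching and then retiring one instruction for each of the four flag combinations allocates three mappings and frees all three, leaving the toy register file empty.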