This is an archive of the discontinued LLVM Phabricator instance.

Differential D116779

[llvm-mca] [LSUnit] Proposal for declaring memory-barrier instructions explicitly rather than making conservative assumptions.
Closed · Public

Authored by holland11 on Jan 6 2022, 5:33 PM.

Details

Reviewers

andreadb
qcolombet
lebedev.ri

Commits

rG85e6e748d426: [MCA] Switching from conservatively guessing which instructions are

Summary

TLDR: Currently, llvm-mca makes a very conservative assumption about which instructions are and aren't memory-barrier instructions. This leads to quite a few false positives which can result in inaccurate simulations. With this patch, I am proposing that we get rid of the assumptions and instead give the targets and developers a convenient way to specify exactly which instructions should be treated as LoadBarriers and/or StoreBarriers.

First off, I'd really like to apologize for how long this patch has taken me to post. Andrea and I discussed this in late August and I had almost all of it completed by early September. Due to a mixture of personal and technical issues, I was unable to finalize it until now. As far as I can tell, the patch is still relevant, but let me know if not.

As mentioned in the TLDR, this patch aims to be more explicit about which instructions are memory-barriers. Currently, the LSUnit uses the MayLoad, MayStore, and HasUnmodeledSideEffects flags to decide which instructions should behave as barriers. This is a conservative assumption and leads to some instructions causing stalls in the pipeline that shouldn't exist.
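To make the false-positive case concrete, here is a minimal sketch of the two classification schemes side by side. `MockDesc`, `MockInst`, and the four helper functions are hypothetical stand-ins written for this illustration; they are not the real `mca::InstrDesc` or `mca::Instruction` types, and the "old" helpers only approximate the previous LSUnit logic as described above.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the scheduling-model flags that the
// LSUnit used to consult (not the real mca::InstrDesc).
struct MockDesc {
  bool MayLoad = false;
  bool MayStore = false;
  bool HasSideEffects = false;
};

// Hypothetical stand-in for an mca instruction carrying the two new flags.
struct MockInst {
  MockDesc Desc;
  bool IsALoadBarrier = false;
  bool IsAStoreBarrier = false;
};

// Old scheme (approximation): a memory instruction with unmodeled side
// effects is treated as a barrier for each queue it touches.
inline bool oldIsLoadBarrier(const MockInst &I) {
  assert((I.Desc.MayLoad || I.Desc.MayStore) && "Not a memory operation!");
  return I.Desc.MayLoad && I.Desc.HasSideEffects;
}

inline bool oldIsStoreBarrier(const MockInst &I) {
  assert((I.Desc.MayLoad || I.Desc.MayStore) && "Not a memory operation!");
  return I.Desc.MayStore && I.Desc.HasSideEffects;
}

// New scheme: only the explicit per-instruction flags matter, and they
// default to false.
inline bool newIsLoadBarrier(const MockInst &I) { return I.IsALoadBarrier; }
inline bool newIsStoreBarrier(const MockInst &I) { return I.IsAStoreBarrier; }
```

Under these assumptions, an instruction that stores and has unmodeled side effects is a store barrier in the old scheme but not in the new one unless a target sets `IsAStoreBarrier` explicitly.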

What I've done is add two flags to the InstructionBase class (IsALoadBarrier and IsAStoreBarrier). These flags can be modified and evaluated with the isAStoreBarrier(), isALoadBarrier(), setStoreBarrier(), and setLoadBarrier() methods from the same class. In my mind, it is most natural to set these flags within a target's InstrPostProcess::postProcessInstruction() method. An example of this is included in the patch within the newly added X86InstrPostProcess class (within the X86CustomBehaviour.cpp file).
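As a rough, self-contained sketch of that pattern (not the code in the patch), the following uses mock classes to show a target hook setting the flags by opcode. `MockInstruction`, `MockInstrPostProcess`, `MockTargetPostProcess`, `lowerAndPostProcess`, and the `MOCK_*` opcodes are all hypothetical names; the real hook also receives the `MCInst`, which is omitted here.

```cpp
#include <cassert>
#include <memory>

// Hypothetical opcodes; a real target would use its generated opcode enum.
enum MockOpcode : unsigned { MOCK_LOAD, MOCK_LFENCE, MOCK_SFENCE, MOCK_MFENCE };

// Hypothetical stand-in for the instruction class, with the same accessor
// names as the ones added by this patch.
class MockInstruction {
  unsigned Opcode;
  bool IsALoadBarrier = false;
  bool IsAStoreBarrier = false;

public:
  explicit MockInstruction(unsigned Opcode) : Opcode(Opcode) {}
  unsigned getOpcode() const { return Opcode; }
  bool isALoadBarrier() const { return IsALoadBarrier; }
  bool isAStoreBarrier() const { return IsAStoreBarrier; }
  void setLoadBarrier(bool IsBarrier) { IsALoadBarrier = IsBarrier; }
  void setStoreBarrier(bool IsBarrier) { IsAStoreBarrier = IsBarrier; }
};

// Hypothetical base hook: by default nothing is marked as a barrier.
struct MockInstrPostProcess {
  virtual ~MockInstrPostProcess() = default;
  virtual void postProcessInstruction(std::unique_ptr<MockInstruction> &) {}
};

// Hypothetical target hook: marks only the fence-like opcodes.
struct MockTargetPostProcess : MockInstrPostProcess {
  void postProcessInstruction(std::unique_ptr<MockInstruction> &Inst) override {
    switch (Inst->getOpcode()) {
    case MOCK_MFENCE:
      Inst->setLoadBarrier(true);
      Inst->setStoreBarrier(true);
      break;
    case MOCK_LFENCE:
      Inst->setLoadBarrier(true);
      break;
    case MOCK_SFENCE:
      Inst->setStoreBarrier(true);
      break;
    default:
      break; // Every other opcode keeps both flags at false.
    }
  }
};

// Creates a mock instruction and runs the target hook on it through the
// base-class interface, roughly as a driver would.
inline std::unique_ptr<MockInstruction> lowerAndPostProcess(unsigned Opcode) {
  auto Inst = std::make_unique<MockInstruction>(Opcode);
  assert(Inst && "allocation failed");
  MockTargetPostProcess Target;
  MockInstrPostProcess &PP = Target;
  PP.postProcessInstruction(Inst);
  return Inst;
}
```

With this sketch, `lowerAndPostProcess(MOCK_MFENCE)` yields an instruction with both flags set, while `lowerAndPostProcess(MOCK_LOAD)` leaves both flags at their default of false.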

I also added a new command-line argument to llvm-mca (--show-barriers) which adds two additional columns to the InstructionInfoView to show which instructions are load and/or store barriers. This argument defaults to false so the default behaviour of this view is unchanged.

I expected this patch to be quite disruptive to the mca test cases, but I only had to update 10 of the test cases (and then I added a new test case that shows the --show-barriers argument in action). I would really appreciate it if some people could look through how those test cases were altered to make sure that they haven't been made inaccurate (or at least not more inaccurate). Feel free to add more reviewers if you'd like target-specific developers to take a look at the updated tests.

Each of those 10 test cases has one or more instructions that the LSUnit used to conservatively assume were load and/or store barriers. I've listed those instructions below. For some of these test cases, this patch affected their resource pressure statistics. This is not something that I am very familiar with, so I'd also really appreciate it if someone could make sure that these changes to the resource pressure stats aren't moving in the wrong direction.

AArch64/Cortex/A55-load-store-noalias.s:
nop

AMDGPU/gfx9-retireooo.s:
flat_load_dword (This is the test case that motivated this patch so I'm fairly confident this one shouldn't be treated as a barrier.)

X86/Barcelona/store-throughput.s:
movd

X86/BdVer2/load-store-throughput.s:
movd

X86/BdVer2/pr37790.s:
int3 and stmxcsr

X86/BdVer2/store-throughput.s:
movd

X86/BtVer2/pr37790.s:
int3 and stmxcsr

X86/BtVer2/stmxcsr-ldmxcsr.s:
stmxcsr and ldmxcsr

X86/Haswell/reserved-resources.s:
fxrstor

X86/Haswell/stmxcsr-ldmxcsr.s:
stmxcsr and ldmxcsr

If any of these instructions should be load and/or store barriers, let me know and I can update this patch to set the appropriate flag(s) for those instructions.

Thank you for your time and thanks in advance for any of your input, suggestions, criticisms, or questions. And sorry again for having this patch take so long.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

holland11 created this revision. · Jan 6 2022, 5:33 PM

Herald added a reviewer: lebedev.ri. · Jan 6 2022, 5:33 PM

Herald added subscribers: kerbowa, pengfei, lebedev.ri and 7 others.

holland11 requested review of this revision. · Jan 6 2022, 5:33 PM

Herald added a project: Restricted Project. · Jan 6 2022, 5:33 PM

holland11 edited the summary of this revision. · Jan 6 2022, 5:33 PM

Harbormaster completed remote builds in B141991: Diff 398017. · Jan 6 2022, 6:19 PM

Interesting!

> TLDR: Currently, llvm-mca makes a very conservative assumption about which instructions are and aren't memory-barrier instructions. This leads to quite a few false positives which can result in inaccurate simulations. With this patch, I am proposing that we get rid of the assumptions and instead give the targets and developers a convenient way to specify exactly which instructions should be treated as LoadBarriers and/or StoreBarriers.

Thanks for working on this!

> First off, I'd really like to apologize for how long this patch has taken me to post. Andrea and I discussed this in late August and I had almost all of it completed by early September. Due to a mixture of personal and technical issues, I was unable to finalize it until now. As far as I can tell, the patch is still relevant, but let me know if not.

Don't need to apologise! I wish there were more contributors like you :)
This patch is definitely still relevant. Thanks for contributing it!

> As mentioned in the TLDR, this patch aims to be more explicit about which instructions are memory-barriers. Currently, the LSUnit uses the MayLoad, MayStore, and HasUnmodeledSideEffects flags to decide which instructions should behave as barriers. This is a conservative assumption and leads to some instructions causing stalls in the pipeline that shouldn't exist.
>
> What I've done is add two flags to the InstructionBase class (IsALoadBarrier and IsAStoreBarrier). These flags can be modified and evaluated with the isAStoreBarrier(), isALoadBarrier(), setStoreBarrier(), and setLoadBarrier() methods from the same class. In my mind, it is most natural to set these flags within a target's InstrPostProcess::postProcessInstruction() method. An example of this is included in the patch within the newly added X86InstrPostProcess class (within the X86CustomBehaviour.cpp file).

Sounds good to me!

> I also added a new command-line argument to llvm-mca (--show-barriers) which adds two additional columns to the InstructionInfoView to show which instructions are load and/or store barriers. This argument defaults to false so the default behaviour of this view is unchanged.

Thanks for adding that flag!

I had a look at the patch and it looks good modulo a couple of nits.

The changes to the x86 tests make sense. I only had a couple of nits (see my other comments below).
I trust you that the CMake changes work fine. :)

llvm/test/tools/llvm-mca/X86/barrier_output.s
Lines 2–17: You also need to test SFENCE here. You can also get rid of most instructions here. Personally, I would only leave SFENCE, MFENCE, LFENCE, and any other instructions with "unmodeled side effects". So CLFLUSH, LFENCE, MFENCE, and SFENCE are fine. All other SSE instructions can be removed, in my opinion.
llvm/tools/llvm-mca/Views/InstructionInfoView.h
Line 58: Can this be something like this?
  using UniqueInst = std::unique_ptr<mca::Instruction>;
  ArrayRef<UniqueInst> LoweredInsts;
llvm/tools/llvm-mca/llvm-mca.cpp
Lines 555–558: I wonder if we should use SmallVector for the LoweredSequence (instead of a std::vector). The average code snippet size tends to be very small, so a SmallVector might perform a bit better. The constructor of InstructionInfoView should probably use an ArrayRef for the LoweredSequence. This would be similar to what we already do for the SourceMgr.

Forgot to mention that this work is related to issue: https://github.com/llvm/llvm-project/issues/36015

We need to remember to resolve that issue once this patch goes in.

@andreadb Thanks for the feedback / suggestions. They all made sense and I implemented each of them. Let me know if they look how you expected them to.

Harbormaster completed remote builds in B142203: Diff 398316. · Jan 8 2022, 12:25 AM

In D116779#3229119, @holland11 wrote:

> @andreadb Thanks for the feedback / suggestions. They all made sense and I implemented each of them. Let me know if they look how you expected them to.

It looks good to me.

Thanks Patrick!

This revision is now accepted and ready to land. · Jan 8 2022, 4:08 AM

Closed by commit rG85e6e748d426: [MCA] Switching from conservatively guessing which instructions are (authored by holland11). · Jan 11 2022, 1:51 PM

This revision was automatically updated to reflect the committed changes.

holland11 added a commit: rG85e6e748d426: [MCA] Switching from conservatively guessing which instructions are.

Revision Contents

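Path	Size
llvm/docs/CommandGuide/llvm-mca.rst	24 lines
llvm/include/llvm/MCA/CustomBehaviour.h	4 lines
llvm/include/llvm/MCA/Instruction.h	11 lines
llvm/lib/MCA/HardwareUnits/LSUnit.cpp	13 lines
llvm/lib/Target/X86/CMakeLists.txt	1 line
llvm/lib/Target/X86/MCA/CMakeLists.txt	14 lines
llvm/lib/Target/X86/MCA/X86CustomBehaviour.h	47 lines
llvm/lib/Target/X86/MCA/X86CustomBehaviour.cpp	64 lines
llvm/test/tools/llvm-mca/AArch64/Cortex/A55-load-store-noalias.s	44 lines
llvm/test/tools/llvm-mca/AMDGPU/gfx9-retireooo.s	82 lines
llvm/test/tools/llvm-mca/X86/Barcelona/store-throughput.s	8 lines
llvm/test/tools/llvm-mca/X86/BdVer2/load-store-throughput.s	41 lines
llvm/test/tools/llvm-mca/X86/BdVer2/pr37790.s	20 lines
llvm/test/tools/llvm-mca/X86/BdVer2/store-throughput.s	43 lines
llvm/test/tools/llvm-mca/X86/BtVer2/pr37790.s	16 lines
llvm/test/tools/llvm-mca/X86/BtVer2/stmxcsr-ldmxcsr.s	60 lines
llvm/test/tools/llvm-mca/X86/Haswell/reserved-resources.s	4 lines
llvm/test/tools/llvm-mca/X86/Haswell/stmxcsr-ldmxcsr.s	74 lines
llvm/test/tools/llvm-mca/X86/barrier_output.s	25 lines
llvm/tools/llvm-mca/Views/InstructionInfoView.h	10 lines
llvm/tools/llvm-mca/Views/InstructionInfoView.cpp	32 lines
llvm/tools/llvm-mca/llvm-mca.cpp	13 lines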

Diff 399070

llvm/docs/CommandGuide/llvm-mca.rst

	[...]
	.. option:: -instruction-info			.. option:: -instruction-info

	Enable the instruction info view. This is enabled by default.			Enable the instruction info view. This is enabled by default.

	.. option:: -show-encoding			.. option:: -show-encoding

	Enable the printing of instruction encodings within the instruction info view.			Enable the printing of instruction encodings within the instruction info view.

				.. option:: -show-barriers

				Enable the printing of LoadBarrier and StoreBarrier flags within the
				instruction info view.

	.. option:: -all-stats			.. option:: -all-stats

	Print all hardware statistics. This enables extra statistics related to the			Print all hardware statistics. This enables extra statistics related to the
	dispatch logic, the hardware schedulers, the register file(s), and the retire			dispatch logic, the hardware schedulers, the register file(s), and the retire
	control unit. This option is disabled by default.			control unit. This option is disabled by default.

	.. option:: -all-views			.. option:: -all-views

	[...]
	* The LSUnit does not know how to identify serializing operations and memory			* The LSUnit does not know how to identify serializing operations and memory
	fences.			fences.

	The LSUnit does not attempt to predict if a load or store hits or misses the L1			The LSUnit does not attempt to predict if a load or store hits or misses the L1
	cache. It only knows if an instruction "MayLoad" and/or "MayStore." For			cache. It only knows if an instruction "MayLoad" and/or "MayStore." For
	loads, the scheduling model provides an "optimistic" load-to-use latency (which			loads, the scheduling model provides an "optimistic" load-to-use latency (which
	usually matches the load-to-use latency for when there is a hit in the L1D).			usually matches the load-to-use latency for when there is a hit in the L1D).

	:program:`llvm-mca` does not know about serializing operations or memory-barrier			:program:`llvm-mca` does not (on its own) know about serializing operations or
	like instructions. The LSUnit conservatively assumes that an instruction which			memory-barrier like instructions. The LSUnit used to conservatively use an
	has both "MayLoad" and unmodeled side effects behaves like a "soft"			instruction's "MayLoad", "MayStore", and unmodeled side effects flags to
	load-barrier. That means, it serializes loads without forcing a flush of the			determine whether an instruction should be treated as a memory-barrier. This was
	load queue. Similarly, instructions that "MayStore" and have unmodeled side			inaccurate in general and was changed so that now each instruction has an
	effects are treated like store barriers. A full memory barrier is a "MayLoad"			IsAStoreBarrier and IsALoadBarrier flag. These flags are mca specific and
	and "MayStore" instruction with unmodeled side effects. This is inaccurate, but			default to false for every instruction. If any instruction should have either of
	it is the best that we can do at the moment with the current information			these flags set, it should be done within the target's InstrPostProcess class.
	available in LLVM.			For an example, look at the `X86InstrPostProcess::postProcessInstruction` method
				within `llvm/lib/Target/X86/MCA/X86CustomBehaviour.cpp`.

	A load/store barrier consumes one entry of the load/store queue. A load/store			A load/store barrier consumes one entry of the load/store queue. A load/store
	barrier enforces ordering of loads/stores. A younger load cannot pass a load			barrier enforces ordering of loads/stores. A younger load cannot pass a load
	barrier. Also, a younger store cannot pass a store barrier. A younger load			barrier. Also, a younger store cannot pass a store barrier. A younger load
	has to wait for the memory/load barrier to execute. A load/store barrier is			has to wait for the memory/load barrier to execute. A load/store barrier is
	"executed" when it becomes the oldest entry in the load/store queue(s). That			"executed" when it becomes the oldest entry in the load/store queue(s). That
	also means, by construction, all of the older loads/stores have been executed.			also means, by construction, all of the older loads/stores have been executed.

	[...]

llvm/include/llvm/MCA/CustomBehaviour.h

[...]	protected:
const MCInstrInfo &MCII;		const MCInstrInfo &MCII;

public:		public:
InstrPostProcess(const MCSubtargetInfo &STI, const MCInstrInfo &MCII)		InstrPostProcess(const MCSubtargetInfo &STI, const MCInstrInfo &MCII)
: STI(STI), MCII(MCII) {}		: STI(STI), MCII(MCII) {}

virtual ~InstrPostProcess() {}		virtual ~InstrPostProcess() {}

		/// This method can be overriden by targets to modify the mca::Instruction
		/// object after it has been lowered from the MCInst.
		/// This is generally a less disruptive alternative to modifying the
		/// scheduling model.
virtual void postProcessInstruction(std::unique_ptr<Instruction> &Inst,		virtual void postProcessInstruction(std::unique_ptr<Instruction> &Inst,
const MCInst &MCI) {}		const MCInst &MCI) {}
};		};

/// Class which can be overriden by targets to enforce instruction		/// Class which can be overriden by targets to enforce instruction
/// dependencies and behaviours that aren't expressed well enough		/// dependencies and behaviours that aren't expressed well enough
/// within the scheduling model for mca to automatically simulate		/// within the scheduling model for mca to automatically simulate
/// them properly.		/// them properly.
[...]

llvm/include/llvm/MCA/Instruction.h

[...]	class InstructionBase {
SmallVector<ReadState, 4> Uses;		SmallVector<ReadState, 4> Uses;

// List of operands which can be used by mca::CustomBehaviour		// List of operands which can be used by mca::CustomBehaviour
std::vector<MCAOperand> Operands;		std::vector<MCAOperand> Operands;

// Instruction opcode which can be used by mca::CustomBehaviour		// Instruction opcode which can be used by mca::CustomBehaviour
unsigned Opcode;		unsigned Opcode;

		// Flags used by the LSUnit.
		bool IsALoadBarrier;
		bool IsAStoreBarrier;

public:		public:
InstructionBase(const InstrDesc &D, const unsigned Opcode)		InstructionBase(const InstrDesc &D, const unsigned Opcode)
: Desc(D), IsOptimizableMove(false), Operands(0), Opcode(Opcode) {}		: Desc(D), IsOptimizableMove(false), Operands(0), Opcode(Opcode),
		IsALoadBarrier(false), IsAStoreBarrier(false) {}

SmallVectorImpl<WriteState> &getDefs() { return Defs; }		SmallVectorImpl<WriteState> &getDefs() { return Defs; }
ArrayRef<WriteState> getDefs() const { return Defs; }		ArrayRef<WriteState> getDefs() const { return Defs; }
SmallVectorImpl<ReadState> &getUses() { return Uses; }		SmallVectorImpl<ReadState> &getUses() { return Uses; }
ArrayRef<ReadState> getUses() const { return Uses; }		ArrayRef<ReadState> getUses() const { return Uses; }
const InstrDesc &getDesc() const { return Desc; }		const InstrDesc &getDesc() const { return Desc; }

unsigned getLatency() const { return Desc.MaxLatency; }		unsigned getLatency() const { return Desc.MaxLatency; }
unsigned getNumMicroOps() const { return Desc.NumMicroOps; }		unsigned getNumMicroOps() const { return Desc.NumMicroOps; }
unsigned getOpcode() const { return Opcode; }		unsigned getOpcode() const { return Opcode; }
		bool isALoadBarrier() const { return IsALoadBarrier; }
		bool isAStoreBarrier() const { return IsAStoreBarrier; }
		void setLoadBarrier(bool IsBarrier) { IsALoadBarrier = IsBarrier; }
		void setStoreBarrier(bool IsBarrier) { IsAStoreBarrier = IsBarrier; }

/// Return the MCAOperand which corresponds to index Idx within the original		/// Return the MCAOperand which corresponds to index Idx within the original
/// MCInst.		/// MCInst.
const MCAOperand *getOperand(const unsigned Idx) const {		const MCAOperand *getOperand(const unsigned Idx) const {
auto It = std::find_if(		auto It = std::find_if(
Operands.begin(), Operands.end(),		Operands.begin(), Operands.end(),
[&Idx](const MCAOperand &Op) { return Op.getIndex() == Idx; });		[&Idx](const MCAOperand &Op) { return Op.getIndex() == Idx; });
if (It == Operands.end())		if (It == Operands.end())
[...]

llvm/lib/MCA/HardwareUnits/LSUnit.cpp

[...]	dbgs() << "[LSUnit] Group (" << GroupIt.first << "): "
<< ", #IIssued = " << Group.getNumExecuting()		<< ", #IIssued = " << Group.getNumExecuting()
<< ", #IExecuted = " << Group.getNumExecuted() << '\n';		<< ", #IExecuted = " << Group.getNumExecuted() << '\n';
}		}
}		}
#endif		#endif

unsigned LSUnit::dispatch(const InstRef &IR) {		unsigned LSUnit::dispatch(const InstRef &IR) {
const InstrDesc &Desc = IR.getInstruction()->getDesc();		const InstrDesc &Desc = IR.getInstruction()->getDesc();
unsigned IsMemBarrier = Desc.HasSideEffects;		bool IsStoreBarrier = IR.getInstruction()->isAStoreBarrier();
		bool IsLoadBarrier = IR.getInstruction()->isALoadBarrier();
assert((Desc.MayLoad || Desc.MayStore) && "Not a memory operation!");		assert((Desc.MayLoad || Desc.MayStore) && "Not a memory operation!");

if (Desc.MayLoad)		if (Desc.MayLoad)
acquireLQSlot();		acquireLQSlot();
if (Desc.MayStore)		if (Desc.MayStore)
acquireSQSlot();		acquireSQSlot();

if (Desc.MayStore) {		if (Desc.MayStore) {
[...]	if (CurrentStoreGroupID &&
MemoryGroup &StoreGroup = getGroup(CurrentStoreGroupID);		MemoryGroup &StoreGroup = getGroup(CurrentStoreGroupID);
LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: (" << CurrentStoreGroupID		LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: (" << CurrentStoreGroupID
<< ") --> (" << NewGID << ")\n");		<< ") --> (" << NewGID << ")\n");
StoreGroup.addSuccessor(&NewGroup, !assumeNoAlias());		StoreGroup.addSuccessor(&NewGroup, !assumeNoAlias());
}		}


CurrentStoreGroupID = NewGID;		CurrentStoreGroupID = NewGID;
if (IsMemBarrier)		if (IsStoreBarrier)
CurrentStoreBarrierGroupID = NewGID;		CurrentStoreBarrierGroupID = NewGID;

if (Desc.MayLoad) {		if (Desc.MayLoad) {
CurrentLoadGroupID = NewGID;		CurrentLoadGroupID = NewGID;
if (IsMemBarrier)		if (IsLoadBarrier)
CurrentLoadBarrierGroupID = NewGID;		CurrentLoadBarrierGroupID = NewGID;
}		}

return NewGID;		return NewGID;
}		}

assert(Desc.MayLoad && "Expected a load!");		assert(Desc.MayLoad && "Expected a load!");

unsigned ImmediateLoadDominator =		unsigned ImmediateLoadDominator =
std::max(CurrentLoadGroupID, CurrentLoadBarrierGroupID);		std::max(CurrentLoadGroupID, CurrentLoadBarrierGroupID);

// A new load group is created if we are in one of the following situations:		// A new load group is created if we are in one of the following situations:
// 1) This is a load barrier (by construction, a load barrier is always		// 1) This is a load barrier (by construction, a load barrier is always
// assigned to a different memory group).		// assigned to a different memory group).
// 2) There is no load in flight (by construction we always keep loads and		// 2) There is no load in flight (by construction we always keep loads and
// stores into separate memory groups).		// stores into separate memory groups).
// 3) There is a load barrier in flight. This load depends on it.		// 3) There is a load barrier in flight. This load depends on it.
// 4) There is an intervening store between the last load dispatched to the		// 4) There is an intervening store between the last load dispatched to the
// LSU and this load. We always create a new group even if this load		// LSU and this load. We always create a new group even if this load
// does not alias the last dispatched store.		// does not alias the last dispatched store.
// 5) There is no intervening store and there is an active load group.		// 5) There is no intervening store and there is an active load group.
// However that group has already started execution, so we cannot add		// However that group has already started execution, so we cannot add
// this load to it.		// this load to it.
bool ShouldCreateANewGroup =		bool ShouldCreateANewGroup =
IsMemBarrier || !ImmediateLoadDominator ||		IsLoadBarrier || !ImmediateLoadDominator ||
CurrentLoadBarrierGroupID == ImmediateLoadDominator ||		CurrentLoadBarrierGroupID == ImmediateLoadDominator ||
ImmediateLoadDominator <= CurrentStoreGroupID ||		ImmediateLoadDominator <= CurrentStoreGroupID ||
getGroup(ImmediateLoadDominator).isExecuting();		getGroup(ImmediateLoadDominator).isExecuting();

if (ShouldCreateANewGroup) {		if (ShouldCreateANewGroup) {
unsigned NewGID = createMemoryGroup();		unsigned NewGID = createMemoryGroup();
MemoryGroup &NewGroup = getGroup(NewGID);		MemoryGroup &NewGroup = getGroup(NewGID);
NewGroup.addInstruction();		NewGroup.addInstruction();

// A load may not pass a previous store or store barrier		// A load may not pass a previous store or store barrier
// unless flag 'NoAlias' is set.		// unless flag 'NoAlias' is set.
if (!assumeNoAlias() && CurrentStoreGroupID) {		if (!assumeNoAlias() && CurrentStoreGroupID) {
MemoryGroup &StoreGroup = getGroup(CurrentStoreGroupID);		MemoryGroup &StoreGroup = getGroup(CurrentStoreGroupID);
LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: (" << CurrentStoreGroupID		LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: (" << CurrentStoreGroupID
<< ") --> (" << NewGID << ")\n");		<< ") --> (" << NewGID << ")\n");
StoreGroup.addSuccessor(&NewGroup, true);		StoreGroup.addSuccessor(&NewGroup, true);
}		}

// A load barrier may not pass a previous load or load barrier.		// A load barrier may not pass a previous load or load barrier.
if (IsMemBarrier) {		if (IsLoadBarrier) {
if (ImmediateLoadDominator) {		if (ImmediateLoadDominator) {
MemoryGroup &LoadGroup = getGroup(ImmediateLoadDominator);		MemoryGroup &LoadGroup = getGroup(ImmediateLoadDominator);
LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: ("		LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: ("
<< ImmediateLoadDominator		<< ImmediateLoadDominator
<< ") --> (" << NewGID << ")\n");		<< ") --> (" << NewGID << ")\n");
LoadGroup.addSuccessor(&NewGroup, true);		LoadGroup.addSuccessor(&NewGroup, true);
}		}
} else {		} else {
// A younger load cannot pass a older load barrier.		// A younger load cannot pass a older load barrier.
if (CurrentLoadBarrierGroupID) {		if (CurrentLoadBarrierGroupID) {
MemoryGroup &LoadGroup = getGroup(CurrentLoadBarrierGroupID);		MemoryGroup &LoadGroup = getGroup(CurrentLoadBarrierGroupID);
LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: ("		LLVM_DEBUG(dbgs() << "[LSUnit]: GROUP DEP: ("
<< CurrentLoadBarrierGroupID		<< CurrentLoadBarrierGroupID
<< ") --> (" << NewGID << ")\n");		<< ") --> (" << NewGID << ")\n");
LoadGroup.addSuccessor(&NewGroup, true);		LoadGroup.addSuccessor(&NewGroup, true);
}		}
}		}

CurrentLoadGroupID = NewGID;		CurrentLoadGroupID = NewGID;
if (IsMemBarrier)		if (IsLoadBarrier)
CurrentLoadBarrierGroupID = NewGID;		CurrentLoadBarrierGroupID = NewGID;
return NewGID;		return NewGID;
}		}

// A load may pass a previous load.		// A load may pass a previous load.
MemoryGroup &Group = getGroup(CurrentLoadGroupID);		MemoryGroup &Group = getGroup(CurrentLoadGroupID);
Group.addInstruction();		Group.addInstruction();
return CurrentLoadGroupID;		return CurrentLoadGroupID;
[...]

llvm/lib/Target/X86/CMakeLists.txt

[...]	add_llvm_target(X86CodeGen ${sources}
CFGuard		CFGuard

ADD_TO_COMPONENT		ADD_TO_COMPONENT
X86		X86
)		)

add_subdirectory(AsmParser)		add_subdirectory(AsmParser)
add_subdirectory(Disassembler)		add_subdirectory(Disassembler)
		add_subdirectory(MCA)
add_subdirectory(MCTargetDesc)		add_subdirectory(MCTargetDesc)
add_subdirectory(TargetInfo)		add_subdirectory(TargetInfo)

llvm/lib/Target/X86/MCA/CMakeLists.txt

This file was added.

				add_llvm_component_library(LLVMX86TargetMCA
				X86CustomBehaviour.cpp

				LINK_COMPONENTS
				MC
				MCParser
				X86Desc
				X86Info
				Support
				MCA

				ADD_TO_COMPONENT
				X86
				)

llvm/lib/Target/X86/MCA/X86CustomBehaviour.h

This file was added.

				//===-------------------- X86CustomBehaviour.h ------------------*-C++ -* -===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				///
				/// This file defines the X86CustomBehaviour class which inherits from
				/// CustomBehaviour. This class is used by the tool llvm-mca to enforce
				/// target specific behaviour that is not expressed well enough in the
				/// scheduling model for mca to enforce it automatically.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_X86_MCA_X86CUSTOMBEHAVIOUR_H
				#define LLVM_LIB_TARGET_X86_MCA_X86CUSTOMBEHAVIOUR_H

				#include "llvm/ADT/SmallVector.h"
				#include "llvm/MCA/CustomBehaviour.h"
				#include "llvm/Support/TargetParser.h"

				namespace llvm {
				namespace mca {

				class X86InstrPostProcess : public InstrPostProcess {
				void processWaitCnt(std::unique_ptr<Instruction> &Inst, const MCInst &MCI);

				/// Called within X86InstrPostProcess to specify certain instructions
				/// as load and store barriers.
				void setMemBarriers(std::unique_ptr<Instruction> &Inst, const MCInst &MCI);

				public:
				X86InstrPostProcess(const MCSubtargetInfo &STI, const MCInstrInfo &MCII)
				: InstrPostProcess(STI, MCII) {}

				~X86InstrPostProcess() {}

				void postProcessInstruction(std::unique_ptr<Instruction> &Inst,
				const MCInst &MCI) override;
				};

				} // namespace mca
				} // namespace llvm

				#endif

llvm/lib/Target/X86/MCA/X86CustomBehaviour.cpp

This file was added.

				//===------------------- X86CustomBehaviour.cpp -----------------*-C++ -* -===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				///
				/// This file implements methods from the X86CustomBehaviour class.
				///
				//===----------------------------------------------------------------------===//

				#include "X86CustomBehaviour.h"
				#include "TargetInfo/X86TargetInfo.h"
				#include "X86InstrInfo.h"
				#include "llvm/MC/TargetRegistry.h"
				#include "llvm/Support/WithColor.h"

				namespace llvm {
				namespace mca {

				void X86InstrPostProcess::setMemBarriers(std::unique_ptr<Instruction> &Inst,
				const MCInst &MCI) {
				switch (MCI.getOpcode()) {
				case X86::MFENCE:
				Inst->setLoadBarrier(true);
				Inst->setStoreBarrier(true);
				break;
				case X86::LFENCE:
				Inst->setLoadBarrier(true);
				break;
				case X86::SFENCE:
				Inst->setStoreBarrier(true);
				break;
				}
				}

				void X86InstrPostProcess::postProcessInstruction(
				std::unique_ptr<Instruction> &Inst, const MCInst &MCI) {
				// Currently, we only modify certain instructions' IsALoadBarrier and
				// IsAStoreBarrier flags.
				setMemBarriers(Inst, MCI);
				}

				} // namespace mca
				} // namespace llvm

				using namespace llvm;
				using namespace mca;

				static InstrPostProcess *createX86InstrPostProcess(const MCSubtargetInfo &STI,
				const MCInstrInfo &MCII) {
				return new X86InstrPostProcess(STI, MCII);
				}

				/// Extern function to initialize the targets for the X86 backend

				extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeX86TargetMCA() {
				TargetRegistry::RegisterInstrPostProcess(getTheX86_32Target(),
				createX86InstrPostProcess);
				TargetRegistry::RegisterInstrPostProcess(getTheX86_64Target(),
				createX86InstrPostProcess);
				}

llvm/test/tools/llvm-mca/AArch64/Cortex/A55-load-store-noalias.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 -timeline --iterations=3 --noalias=true < %s | FileCheck %s			# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 -timeline --iterations=3 --noalias=true < %s | FileCheck %s

	str x1, [x10]			str x1, [x10]
	str x1, [x10]			str x1, [x10]
	ldr x2, [x10]			ldr x2, [x10]
	nop			nop
	ldr x2, [x10]			ldr x2, [x10]
	ldr x3, [x10]			ldr x3, [x10]

	# CHECK: Iterations: 3			# CHECK: Iterations: 3
	# CHECK-NEXT: Instructions: 18			# CHECK-NEXT: Instructions: 18
	# CHECK-NEXT: Total Cycles: 19			# CHECK-NEXT: Total Cycles: 16
	# CHECK-NEXT: Total uOps: 18			# CHECK-NEXT: Total uOps: 18

	# CHECK: Dispatch Width: 2			# CHECK: Dispatch Width: 2
	# CHECK-NEXT: uOps Per Cycle: 0.95			# CHECK-NEXT: uOps Per Cycle: 1.13
	# CHECK-NEXT: IPC: 0.95			# CHECK-NEXT: IPC: 1.13
	# CHECK-NEXT: Block RThroughput: 3.0			# CHECK-NEXT: Block RThroughput: 3.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	# CHECK-NEXT: - - - - - - - - - - - 1.00 str x1, [x10]			# CHECK-NEXT: - - - - - - - - - - - 1.00 str x1, [x10]
	# CHECK-NEXT: - - - - - - - - - - - 1.00 str x1, [x10]			# CHECK-NEXT: - - - - - - - - - - - 1.00 str x1, [x10]
	# CHECK-NEXT: - - - - - - - - - 1.00 - - ldr x2, [x10]			# CHECK-NEXT: - - - - - - - - - 1.00 - - ldr x2, [x10]
	# CHECK-NEXT: - - 1.00 - - - - - - - - - nop			# CHECK-NEXT: - - 1.00 - - - - - - - - - nop
	# CHECK-NEXT: - - - - - - - - - 1.00 - - ldr x2, [x10]			# CHECK-NEXT: - - - - - - - - - 1.00 - - ldr x2, [x10]
	# CHECK-NEXT: - - - - - - - - - 1.00 - - ldr x3, [x10]			# CHECK-NEXT: - - - - - - - - - 1.00 - - ldr x3, [x10]

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: 012345678			# CHECK-NEXT: 012345
	# CHECK-NEXT: Index 0123456789			# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DE . . . . str x1, [x10]			# CHECK: [0,0] DE . . . str x1, [x10]
	# CHECK-NEXT: [0,1] .DE . . . . str x1, [x10]			# CHECK-NEXT: [0,1] .DE . . . str x1, [x10]
	# CHECK-NEXT: [0,2] .DeeE. . . . ldr x2, [x10]			# CHECK-NEXT: [0,2] .DeeE. . . ldr x2, [x10]
	# CHECK-NEXT: [0,3] . DE. . . . nop			# CHECK-NEXT: [0,3] . DE. . . nop
	# CHECK-NEXT: [0,4] . DeeE . . . ldr x2, [x10]			# CHECK-NEXT: [0,4] . DeeE . . ldr x2, [x10]
	# CHECK-NEXT: [0,5] . DeeE . . . ldr x3, [x10]			# CHECK-NEXT: [0,5] . DeeE . . ldr x3, [x10]
	# CHECK-NEXT: [1,0] . DE . . . str x1, [x10]			# CHECK-NEXT: [1,0] . DE . . str x1, [x10]
	# CHECK-NEXT: [1,1] . .DE . . . str x1, [x10]			# CHECK-NEXT: [1,1] . DE . . str x1, [x10]
	# CHECK-NEXT: [1,2] . .DeeE. . . ldr x2, [x10]			# CHECK-NEXT: [1,2] . DeeE . . ldr x2, [x10]
	# CHECK-NEXT: [1,3] . . DE. . . nop			# CHECK-NEXT: [1,3] . . DE . . nop
	# CHECK-NEXT: [1,4] . . DeeE . . ldr x2, [x10]			# CHECK-NEXT: [1,4] . . DeeE . ldr x2, [x10]
	# CHECK-NEXT: [1,5] . . DeeE . . ldr x3, [x10]			# CHECK-NEXT: [1,5] . . DeeE . ldr x3, [x10]
	# CHECK-NEXT: [2,0] . . DE . . str x1, [x10]			# CHECK-NEXT: [2,0] . . DE. . str x1, [x10]
	# CHECK-NEXT: [2,1] . . .DE . . str x1, [x10]			# CHECK-NEXT: [2,1] . . DE . str x1, [x10]
	# CHECK-NEXT: [2,2] . . .DeeE. . ldr x2, [x10]			# CHECK-NEXT: [2,2] . . DeeE . ldr x2, [x10]
	# CHECK-NEXT: [2,3] . . . DE. . nop			# CHECK-NEXT: [2,3] . . .DE . nop
	# CHECK-NEXT: [2,4] . . . DeeE. ldr x2, [x10]			# CHECK-NEXT: [2,4] . . .DeeE. ldr x2, [x10]
	# CHECK-NEXT: [2,5] . . . DeeE ldr x3, [x10]			# CHECK-NEXT: [2,5] . . . DeeE ldr x3, [x10]

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 3 0.0 0.0 0.0 str x1, [x10]			# CHECK-NEXT: 0. 3 0.0 0.0 0.0 str x1, [x10]
	# CHECK-NEXT: 1. 3 0.0 0.0 0.0 str x1, [x10]			# CHECK-NEXT: 1. 3 0.0 0.0 0.0 str x1, [x10]
	# CHECK-NEXT: 2. 3 0.0 0.0 0.0 ldr x2, [x10]			# CHECK-NEXT: 2. 3 0.0 0.0 0.0 ldr x2, [x10]
	# CHECK-NEXT: 3. 3 0.0 0.0 0.0 nop			# CHECK-NEXT: 3. 3 0.0 0.0 0.0 nop
	# CHECK-NEXT: 4. 3 0.0 0.0 0.0 ldr x2, [x10]			# CHECK-NEXT: 4. 3 0.0 0.0 0.0 ldr x2, [x10]
	# CHECK-NEXT: 5. 3 0.0 0.0 0.0 ldr x3, [x10]			# CHECK-NEXT: 5. 3 0.0 0.0 0.0 ldr x3, [x10]
	# CHECK-NEXT: 3 0.0 0.0 0.0 <total>			# CHECK-NEXT: 3 0.0 0.0 0.0 <total>
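
The summary figures in the test above are related by simple arithmetic: IPC is Instructions divided by Total Cycles, and uOps Per Cycle is Total uOps divided by Total Cycles. The sketch below reproduces the before and after values (18/19 and 18/16). The helper name `ratioInHundredths` is hypothetical, and rounding half up to two decimals is an assumption that happens to match the numbers shown; it is not presented as llvm-mca's own code.

```cpp
#include <cassert>
#include <cmath>

// Returns Num / Den expressed in hundredths, rounded half up.
// Example: 18 instructions over 16 cycles gives 113, i.e. an IPC of 1.13.
int ratioInHundredths(unsigned Num, unsigned Den) {
  assert(Den != 0 && "cycle count must be non-zero");
  return static_cast<int>(std::floor(100.0 * Num / Den + 0.5));
}
```

With the old expectations, `ratioInHundredths(18, 19)` gives 95 (IPC 0.95); with the new ones, `ratioInHundredths(18, 16)` gives 113 (IPC 1.13). The same check applies to the other summaries in this diff.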

llvm/test/tools/llvm-mca/AMDGPU/gfx9-retireooo.s

	v_mov_b32_e32 v26, s26			v_mov_b32_e32 v26, s26
	v_mov_b32_e32 v27, s27			v_mov_b32_e32 v27, s27
	v_mov_b32_e32 v28, s28			v_mov_b32_e32 v28, s28
	v_mov_b32_e32 v29, s29			v_mov_b32_e32 v29, s29
	s_waitcnt vmcnt(0) lgkmcnt(0)			s_waitcnt vmcnt(0) lgkmcnt(0)

	# CHECK: Iterations: 1			# CHECK: Iterations: 1
	# CHECK-NEXT: Instructions: 36			# CHECK-NEXT: Instructions: 36
	# CHECK-NEXT: Total Cycles: 331			# CHECK-NEXT: Total Cycles: 94
	# CHECK-NEXT: Total uOps: 36			# CHECK-NEXT: Total uOps: 36

	# CHECK: Dispatch Width: 1			# CHECK: Dispatch Width: 1
	# CHECK-NEXT: uOps Per Cycle: 0.11			# CHECK-NEXT: uOps Per Cycle: 0.38
	# CHECK-NEXT: IPC: 0.11			# CHECK-NEXT: IPC: 0.38
	# CHECK-NEXT: Block RThroughput: 36.0			# CHECK-NEXT: Block RThroughput: 36.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	# CHECK-NEXT: - - - - 1.00 - - v_mov_b32_e32 v25, s25			# CHECK-NEXT: - - - - 1.00 - - v_mov_b32_e32 v25, s25
	# CHECK-NEXT: - - - - 1.00 - - v_mov_b32_e32 v26, s26			# CHECK-NEXT: - - - - 1.00 - - v_mov_b32_e32 v26, s26
	# CHECK-NEXT: - - - - 1.00 - - v_mov_b32_e32 v27, s27			# CHECK-NEXT: - - - - 1.00 - - v_mov_b32_e32 v27, s27
	# CHECK-NEXT: - - - - 1.00 - - v_mov_b32_e32 v28, s28			# CHECK-NEXT: - - - - 1.00 - - v_mov_b32_e32 v28, s28
	# CHECK-NEXT: - - - - 1.00 - - v_mov_b32_e32 v29, s29			# CHECK-NEXT: - - - - 1.00 - - v_mov_b32_e32 v29, s29
	# CHECK-NEXT: - - - 1.00 - - - s_waitcnt vmcnt(0) lgkmcnt(0)			# CHECK-NEXT: - - - 1.00 - - - s_waitcnt vmcnt(0) lgkmcnt(0)

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0			# CHECK-NEXT: 0123456789 0123456789 0123456789 0123456789 0123
	# CHECK-NEXT: Index 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 0123456789			# CHECK-NEXT: Index 0123456789 0123456789 0123456789 0123456789 0123456789

	# CHECK: [0,0] DeeeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . s_load_dwordx2 s[2:3], s[0:1], 0x24			# CHECK: [0,0] DeeeeE . . . . . . . . . . . . . . . . . . s_load_dwordx2 s[2:3], s[0:1], 0x24
	# CHECK-NEXT: [0,1] .DeeeeE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . s_load_dwordx2 s[0:1], s[0:1], 0x2c			# CHECK-NEXT: [0,1] .DeeeeE . . . . . . . . . . . . . . . . . . s_load_dwordx2 s[0:1], s[0:1], 0x2c
	# CHECK-NEXT: [0,2] . .DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . s_waitcnt lgkmcnt(0)			# CHECK-NEXT: [0,2] . .DE . . . . . . . . . . . . . . . . . . s_waitcnt lgkmcnt(0)
	# CHECK-NEXT: [0,3] . . DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v_mov_b32_e32 v0, s2			# CHECK-NEXT: [0,3] . . DE . . . . . . . . . . . . . . . . . . v_mov_b32_e32 v0, s2
	# CHECK-NEXT: [0,4] . . DE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v_mov_b32_e32 v1, s3			# CHECK-NEXT: [0,4] . . DE. . . . . . . . . . . . . . . . . . v_mov_b32_e32 v1, s3
	# CHECK-NEXT: [0,5] . . DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . flat_load_dword v2, v[0:1]			# CHECK-NEXT: [0,5] . . DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeE. . flat_load_dword v2, v[0:1]
	# CHECK-NEXT: [0,6] . . . . . . . . . . . . . . . . . . DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . flat_load_dword v3, v[0:1] offset:8			# CHECK-NEXT: [0,6] . . DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeE . flat_load_dword v3, v[0:1] offset:8
	# CHECK-NEXT: [0,7] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeE. . . . . . . . . . . . . . . . . flat_load_dword v4, v[0:1] offset:16			# CHECK-NEXT: [0,7] . . .DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeE . flat_load_dword v4, v[0:1] offset:16
	# CHECK-NEXT: [0,8] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeE. flat_load_dword v5, v[0:1] offset:24			# CHECK-NEXT: [0,8] . . . DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeE. flat_load_dword v5, v[0:1] offset:24
	# CHECK-NEXT: [0,9] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . v_mov_b32_e32 v0, s0			# CHECK-NEXT: [0,9] . . . DE. . . . . . . . . . . . . . . . . v_mov_b32_e32 v0, s0
	# CHECK-NEXT: [0,10] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . . v_mov_b32_e32 v1, s1			# CHECK-NEXT: [0,10] . . . DE . . . . . . . . . . . . . . . . v_mov_b32_e32 v1, s1
	# CHECK-NEXT: [0,11] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . . v_mov_b32_e32 v6, s6			# CHECK-NEXT: [0,11] . . . DE . . . . . . . . . . . . . . . . v_mov_b32_e32 v6, s6
	# CHECK-NEXT: [0,12] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . . v_mov_b32_e32 v7, s7			# CHECK-NEXT: [0,12] . . . .DE . . . . . . . . . . . . . . . . v_mov_b32_e32 v7, s7
	# CHECK-NEXT: [0,13] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . v_mov_b32_e32 v8, s8			# CHECK-NEXT: [0,13] . . . . DE . . . . . . . . . . . . . . . . v_mov_b32_e32 v8, s8
	# CHECK-NEXT: [0,14] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . v_mov_b32_e32 v9, s9			# CHECK-NEXT: [0,14] . . . . DE. . . . . . . . . . . . . . . . v_mov_b32_e32 v9, s9
	# CHECK-NEXT: [0,15] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . . v_mov_b32_e32 v10, s10			# CHECK-NEXT: [0,15] . . . . DE . . . . . . . . . . . . . . . v_mov_b32_e32 v10, s10
	# CHECK-NEXT: [0,16] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . . v_mov_b32_e32 v11, s11			# CHECK-NEXT: [0,16] . . . . DE . . . . . . . . . . . . . . . v_mov_b32_e32 v11, s11
	# CHECK-NEXT: [0,17] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . . v_mov_b32_e32 v12, s12			# CHECK-NEXT: [0,17] . . . . .DE . . . . . . . . . . . . . . . v_mov_b32_e32 v12, s12
	# CHECK-NEXT: [0,18] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . v_mov_b32_e32 v13, s13			# CHECK-NEXT: [0,18] . . . . . DE . . . . . . . . . . . . . . . v_mov_b32_e32 v13, s13
	# CHECK-NEXT: [0,19] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . v_mov_b32_e32 v14, s14			# CHECK-NEXT: [0,19] . . . . . DE. . . . . . . . . . . . . . . v_mov_b32_e32 v14, s14
	# CHECK-NEXT: [0,20] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . . v_mov_b32_e32 v15, s15			# CHECK-NEXT: [0,20] . . . . . DE . . . . . . . . . . . . . . v_mov_b32_e32 v15, s15
	# CHECK-NEXT: [0,21] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . . v_mov_b32_e32 v16, s16			# CHECK-NEXT: [0,21] . . . . . DE . . . . . . . . . . . . . . v_mov_b32_e32 v16, s16
	# CHECK-NEXT: [0,22] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . . v_mov_b32_e32 v17, s17			# CHECK-NEXT: [0,22] . . . . . .DE . . . . . . . . . . . . . . v_mov_b32_e32 v17, s17
	# CHECK-NEXT: [0,23] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . v_mov_b32_e32 v18, s18			# CHECK-NEXT: [0,23] . . . . . . DE . . . . . . . . . . . . . . v_mov_b32_e32 v18, s18
	# CHECK-NEXT: [0,24] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . v_mov_b32_e32 v19, s19			# CHECK-NEXT: [0,24] . . . . . . DE. . . . . . . . . . . . . . v_mov_b32_e32 v19, s19
	# CHECK-NEXT: [0,25] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . . v_mov_b32_e32 v20, s20			# CHECK-NEXT: [0,25] . . . . . . DE . . . . . . . . . . . . . v_mov_b32_e32 v20, s20
	# CHECK-NEXT: [0,26] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . . v_mov_b32_e32 v21, s21			# CHECK-NEXT: [0,26] . . . . . . DE . . . . . . . . . . . . . v_mov_b32_e32 v21, s21
	# CHECK-NEXT: [0,27] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . . v_mov_b32_e32 v22, s22			# CHECK-NEXT: [0,27] . . . . . . .DE . . . . . . . . . . . . . v_mov_b32_e32 v22, s22
	# CHECK-NEXT: [0,28] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . v_mov_b32_e32 v23, s23			# CHECK-NEXT: [0,28] . . . . . . . DE . . . . . . . . . . . . . v_mov_b32_e32 v23, s23
	# CHECK-NEXT: [0,29] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . v_mov_b32_e32 v24, s24			# CHECK-NEXT: [0,29] . . . . . . . DE. . . . . . . . . . . . . v_mov_b32_e32 v24, s24
	# CHECK-NEXT: [0,30] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .DE . . . . . . . . . . . . v_mov_b32_e32 v25, s25			# CHECK-NEXT: [0,30] . . . . . . . DE . . . . . . . . . . . . v_mov_b32_e32 v25, s25
	# CHECK-NEXT: [0,31] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . . v_mov_b32_e32 v26, s26			# CHECK-NEXT: [0,31] . . . . . . . DE . . . . . . . . . . . . v_mov_b32_e32 v26, s26
	# CHECK-NEXT: [0,32] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE. . . . . . . . . . . . v_mov_b32_e32 v27, s27			# CHECK-NEXT: [0,32] . . . . . . . .DE . . . . . . . . . . . . v_mov_b32_e32 v27, s27
	# CHECK-NEXT: [0,33] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . v_mov_b32_e32 v28, s28			# CHECK-NEXT: [0,33] . . . . . . . . DE . . . . . . . . . . . . v_mov_b32_e32 v28, s28
	# CHECK-NEXT: [0,34] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE . . . . . . . . . . . v_mov_b32_e32 v29, s29			# CHECK-NEXT: [0,34] . . . . . . . . DE. . . . . . . . . . . . v_mov_b32_e32 v29, s29
	# CHECK-NEXT: [0,35] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DE s_waitcnt vmcnt(0) lgkmcnt(0)			# CHECK-NEXT: [0,35] . . . . . . . . . . . . . . . . . . . DE s_waitcnt vmcnt(0) lgkmcnt(0)

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]

llvm/test/tools/llvm-mca/X86/Barcelona/store-throughput.s

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movd %mm0, (%rax)			# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movd %mm0, (%rax)
	# CHECK-NEXT: 1. 1 2.0 0.0 0.0 movd %mm1, (%rcx)			# CHECK-NEXT: 1. 1 2.0 1.0 0.0 movd %mm1, (%rcx)
	# CHECK-NEXT: 2. 1 3.0 0.0 0.0 movd %mm2, (%rdx)			# CHECK-NEXT: 2. 1 3.0 1.0 0.0 movd %mm2, (%rdx)
	# CHECK-NEXT: 3. 1 4.0 0.0 0.0 movd %mm3, (%rbx)			# CHECK-NEXT: 3. 1 4.0 1.0 0.0 movd %mm3, (%rbx)
	# CHECK-NEXT: 1 2.5 0.3 0.0 <total>			# CHECK-NEXT: 1 2.5 1.0 0.0 <total>

	# CHECK: [5] Code Region			# CHECK: [5] Code Region

	# CHECK: Iterations: 100			# CHECK: Iterations: 100
	# CHECK-NEXT: Instructions: 400			# CHECK-NEXT: Instructions: 400
	# CHECK-NEXT: Total Cycles: 403			# CHECK-NEXT: Total Cycles: 403
	# CHECK-NEXT: Total uOps: 400			# CHECK-NEXT: Total uOps: 400


llvm/test/tools/llvm-mca/X86/BdVer2/load-store-throughput.s

	# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movq (%rdx), %rsi			# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movq (%rdx), %rsi
	# CHECK-NEXT: 3. 1 3.0 1.0 3.0 movq %rdi, (%rbx)			# CHECK-NEXT: 3. 1 3.0 1.0 3.0 movq %rdi, (%rbx)
	# CHECK-NEXT: 1 1.8 1.3 0.8 <total>			# CHECK-NEXT: 1 1.8 1.3 0.8 <total>

	# CHECK: [4] Code Region			# CHECK: [4] Code Region

	# CHECK: Iterations: 100			# CHECK: Iterations: 100
	# CHECK-NEXT: Instructions: 400			# CHECK-NEXT: Instructions: 400
	# CHECK-NEXT: Total Cycles: 553			# CHECK-NEXT: Total Cycles: 405
	# CHECK-NEXT: Total uOps: 400			# CHECK-NEXT: Total uOps: 400

	# CHECK: Dispatch Width: 4			# CHECK: Dispatch Width: 4
	# CHECK-NEXT: uOps Per Cycle: 0.72			# CHECK-NEXT: uOps Per Cycle: 0.99
	# CHECK-NEXT: IPC: 0.72			# CHECK-NEXT: IPC: 0.99
	# CHECK-NEXT: Block RThroughput: 4.0			# CHECK-NEXT: Block RThroughput: 4.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	# CHECK-NEXT: [6]: HasSideEffects (U)			# CHECK-NEXT: [6]: HasSideEffects (U)

	# CHECK: [1] [2] [3] [4] [5] [6] Instructions:			# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
	# CHECK-NEXT: 1 2 1.50 * U movd %mm0, (%rax)			# CHECK-NEXT: 1 2 1.50 * U movd %mm0, (%rax)
	# CHECK-NEXT: 1 5 1.50 * movd (%rcx), %mm1			# CHECK-NEXT: 1 5 1.50 * movd (%rcx), %mm1
	# CHECK-NEXT: 1 5 1.50 * movd (%rdx), %mm2			# CHECK-NEXT: 1 5 1.50 * movd (%rdx), %mm2
	# CHECK-NEXT: 1 2 1.50 * U movd %mm3, (%rbx)			# CHECK-NEXT: 1 2 1.50 * U movd %mm3, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 57 (10.3%)			# CHECK-NEXT: SCHEDQ - Scheduler full: 347 (85.7%)
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 432 (78.1%)			# CHECK-NEXT: SQ - Store queue full: 0
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
	# CHECK-NEXT: USH - Uncategorised Structural Hazard: 0			# CHECK-NEXT: USH - Uncategorised Structural Hazard: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 364 (65.8%)			# CHECK-NEXT: 0, 131 (32.3%)
	# CHECK-NEXT: 1, 88 (15.9%)			# CHECK-NEXT: 1, 174 (43.0%)
	# CHECK-NEXT: 2, 4 (0.7%)			# CHECK-NEXT: 2, 87 (21.5%)
	# CHECK-NEXT: 3, 84 (15.2%)			# CHECK-NEXT: 4, 13 (3.2%)
	# CHECK-NEXT: 4, 13 (2.4%)

	# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:			# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 253 (45.8%)			# CHECK-NEXT: 0, 105 (25.9%)
	# CHECK-NEXT: 1, 200 (36.2%)			# CHECK-NEXT: 1, 200 (49.4%)
	# CHECK-NEXT: 2, 100 (18.1%)			# CHECK-NEXT: 2, 100 (24.7%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 23 40 40			# CHECK-NEXT: PdEX 36 40 40
	# CHECK-NEXT: PdFPU 23 40 64			# CHECK-NEXT: PdFPU 36 40 64
	# CHECK-NEXT: PdLoad 3 22 40			# CHECK-NEXT: PdLoad 20 23 40
	# CHECK-NEXT: PdStore 22 24 24			# CHECK-NEXT: PdStore 19 22 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	# CHECK-NEXT: [3] - PdDiv			# CHECK-NEXT: [3] - PdDiv
	# CHECK-NEXT: [4] - PdEX0			# CHECK-NEXT: [4] - PdEX0

	# CHECK: Resource pressure per iteration:			# CHECK: Resource pressure per iteration:
	# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18]			# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18]
	# CHECK-NEXT: 4.00 4.00 - - - - - - - - 3.00 3.00 - 2.00 1.00 1.00 3.00 3.00 - 3.00 3.00 - 2.00			# CHECK-NEXT: 4.00 4.00 - - - - - - - - 3.00 3.00 - 2.00 1.00 1.00 3.00 3.00 - 3.00 3.00 - 2.00

	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18] Instructions:			# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18] Instructions:
	# CHECK-NEXT: - 1.00 - - - - - - - - - - - 1.00 - - - 3.00 - - - - 1.00 movd %mm0, (%rax)			# CHECK-NEXT: - 1.00 - - - - - - - - - - - 1.00 - - - 3.00 - - - - 1.00 movd %mm0, (%rax)
	# CHECK-NEXT: 1.50 1.50 - - - - - - - - - 3.00 - - - 1.00 - - - - 3.00 - - movd (%rcx), %mm1			# CHECK-NEXT: 3.00 - - - - - - - - - - 3.00 - - - 1.00 - - - - 3.00 - - movd (%rcx), %mm1
	# CHECK-NEXT: 1.50 1.50 - - - - - - - - 3.00 - - - 1.00 - - - - 3.00 - - - movd (%rdx), %mm2			# CHECK-NEXT: - 3.00 - - - - - - - - 3.00 - - - 1.00 - - - - 3.00 - - - movd (%rdx), %mm2
	# CHECK-NEXT: 1.00 - - - - - - - - - - - - 1.00 - - 3.00 - - - - - 1.00 movd %mm3, (%rbx)			# CHECK-NEXT: 1.00 - - - - - - - - - - - - 1.00 - - 3.00 - - - - - 1.00 movd %mm3, (%rbx)

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: Index 012345678			# CHECK-NEXT: Index 012345678

	# CHECK: [0,0] DeeER. . movd %mm0, (%rax)			# CHECK: [0,0] DeeER. . movd %mm0, (%rax)
	# CHECK-NEXT: [0,1] DeeeeeER. movd (%rcx), %mm1			# CHECK-NEXT: [0,1] DeeeeeER. movd (%rcx), %mm1
	# CHECK-NEXT: [0,2] D=eeeeeER movd (%rdx), %mm2			# CHECK-NEXT: [0,2] D=eeeeeER movd (%rdx), %mm2
	# CHECK-NEXT: [0,3] D===eeE-R movd %mm3, (%rbx)			# CHECK-NEXT: [0,3] D===eeE-R movd %mm3, (%rbx)

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movd %mm0, (%rax)			# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movd %mm0, (%rax)
	# CHECK-NEXT: 1. 1 1.0 1.0 0.0 movd (%rcx), %mm1			# CHECK-NEXT: 1. 1 1.0 1.0 0.0 movd (%rcx), %mm1
	# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movd (%rdx), %mm2			# CHECK-NEXT: 2. 1 2.0 2.0 0.0 movd (%rdx), %mm2
	# CHECK-NEXT: 3. 1 4.0 1.0 1.0 movd %mm3, (%rbx)			# CHECK-NEXT: 3. 1 4.0 2.0 1.0 movd %mm3, (%rbx)
	# CHECK-NEXT: 1 2.0 1.3 0.3 <total>			# CHECK-NEXT: 1 2.0 1.5 0.3 <total>

	# CHECK: [5] Code Region			# CHECK: [5] Code Region

	# CHECK: Iterations: 100			# CHECK: Iterations: 100
	# CHECK-NEXT: Instructions: 400			# CHECK-NEXT: Instructions: 400
	# CHECK-NEXT: Total Cycles: 405			# CHECK-NEXT: Total Cycles: 405
	# CHECK-NEXT: Total uOps: 400			# CHECK-NEXT: Total uOps: 400


llvm/test/tools/llvm-mca/X86/BdVer2/pr37790.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=bdver2 -lqueue=2 -iterations=2 -resource-pressure=false -timeline -timeline-max-cycles=104 < %s | FileCheck %s			# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=bdver2 -lqueue=2 -iterations=2 -resource-pressure=false -timeline -timeline-max-cycles=104 < %s | FileCheck %s

	int3			int3
	stmxcsr (%rsp)			stmxcsr (%rsp)

	# CHECK: Iterations: 2			# CHECK: Iterations: 2
	# CHECK-NEXT: Instructions: 4			# CHECK-NEXT: Instructions: 4
	# CHECK-NEXT: Total Cycles: 205			# CHECK-NEXT: Total Cycles: 103
	# CHECK-NEXT: Total uOps: 6			# CHECK-NEXT: Total uOps: 6

	# CHECK: Dispatch Width: 4			# CHECK: Dispatch Width: 4
	# CHECK-NEXT: uOps Per Cycle: 0.03			# CHECK-NEXT: uOps Per Cycle: 0.06
	# CHECK-NEXT: IPC: 0.02			# CHECK-NEXT: IPC: 0.04
	# CHECK-NEXT: Block RThroughput: 18.0			# CHECK-NEXT: Block RThroughput: 18.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	# CHECK-NEXT: [6]: HasSideEffects (U)			# CHECK-NEXT: [6]: HasSideEffects (U)

	# CHECK: [1] [2] [3] [4] [5] [6] Instructions:			# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
	# CHECK-NEXT: 1 100 0.50 * * U int3			# CHECK-NEXT: 1 100 0.50 * * U int3
	# CHECK-NEXT: 2 1 18.00 * U stmxcsr (%rsp)			# CHECK-NEXT: 2 1 18.00 * U stmxcsr (%rsp)

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: 0123456789 0123456789 0123456789 0123456789 0123456789			# CHECK-NEXT: 0123456789 0123456789 0123456789 0123456789 0123456789
	# CHECK-NEXT: Index 0123456789 0123456789 0123456789 0123456789 0123456789 0123			# CHECK-NEXT: Index 0123456789 0123456789 0123456789 0123456789 0123456789 012

	# CHECK: [0,0] DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeER. int3			# CHECK: [0,0] DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeER int3
	# CHECK-NEXT: [0,1] D====================================================================================================eER stmxcsr (%rsp)			# CHECK-NEXT: [0,1] DeE---------------------------------------------------------------------------------------------------R stmxcsr (%rsp)
				# CHECK-NEXT: [1,0] DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeER int3
				# CHECK-NEXT: [1,1] .D=================eE---------------------------------------------------------------------------------R stmxcsr (%rsp)

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 2 51.5 0.5 0.0 int3			# CHECK-NEXT: 0. 2 1.0 0.5 0.0 int3
	# CHECK-NEXT: 1. 2 151.0 0.0 0.0 stmxcsr (%rsp)			# CHECK-NEXT: 1. 2 9.5 9.0 90.0 stmxcsr (%rsp)
	# CHECK-NEXT: 2 101.3 0.3 0.0 <total>			# CHECK-NEXT: 2 5.3 4.8 45.0 <total>

llvm/test/tools/llvm-mca/X86/BdVer2/store-throughput.s

	# CHECK-NEXT: 2. 1 3.0 1.0 0.0 movq %rsi, (%rdx)			# CHECK-NEXT: 2. 1 3.0 1.0 0.0 movq %rsi, (%rdx)
	# CHECK-NEXT: 3. 1 4.0 1.0 0.0 movq %rdi, (%rbx)			# CHECK-NEXT: 3. 1 4.0 1.0 0.0 movq %rdi, (%rbx)
	# CHECK-NEXT: 1 2.5 1.0 0.0 <total>			# CHECK-NEXT: 1 2.5 1.0 0.0 <total>

	# CHECK: [4] Code Region			# CHECK: [4] Code Region

	# CHECK: Iterations: 100			# CHECK: Iterations: 100
	# CHECK-NEXT: Instructions: 400			# CHECK-NEXT: Instructions: 400
	# CHECK-NEXT: Total Cycles: 803			# CHECK-NEXT: Total Cycles: 603
	# CHECK-NEXT: Total uOps: 400			# CHECK-NEXT: Total uOps: 400

	# CHECK: Dispatch Width: 4			# CHECK: Dispatch Width: 4
	# CHECK-NEXT: uOps Per Cycle: 0.50			# CHECK-NEXT: uOps Per Cycle: 0.66
	# CHECK-NEXT: IPC: 0.50			# CHECK-NEXT: IPC: 0.66
	# CHECK-NEXT: Block RThroughput: 6.0			# CHECK-NEXT: Block RThroughput: 6.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	# CHECK-NEXT: [6]: HasSideEffects (U)			# CHECK-NEXT: [6]: HasSideEffects (U)

	# CHECK: [1] [2] [3] [4] [5] [6] Instructions:			# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
	# CHECK-NEXT: 1 2 1.50 * U movd %mm0, (%rax)			# CHECK-NEXT: 1 2 1.50 * U movd %mm0, (%rax)
	# CHECK-NEXT: 1 2 1.50 * U movd %mm1, (%rcx)			# CHECK-NEXT: 1 2 1.50 * U movd %mm1, (%rcx)
	# CHECK-NEXT: 1 2 1.50 * U movd %mm2, (%rdx)			# CHECK-NEXT: 1 2 1.50 * U movd %mm2, (%rdx)
	# CHECK-NEXT: 1 2 1.50 * U movd %mm3, (%rbx)			# CHECK-NEXT: 1 2 1.50 * U movd %mm3, (%rbx)

	# CHECK: Dynamic Dispatch Stall Cycles:			# CHECK: Dynamic Dispatch Stall Cycles:
	# CHECK-NEXT: RAT - Register unavailable: 0			# CHECK-NEXT: RAT - Register unavailable: 0
	# CHECK-NEXT: RCU - Retire tokens unavailable: 0			# CHECK-NEXT: RCU - Retire tokens unavailable: 0
	# CHECK-NEXT: SCHEDQ - Scheduler full: 0			# CHECK-NEXT: SCHEDQ - Scheduler full: 0
	# CHECK-NEXT: LQ - Load queue full: 0			# CHECK-NEXT: LQ - Load queue full: 0
	# CHECK-NEXT: SQ - Store queue full: 748 (93.2%)			# CHECK-NEXT: SQ - Store queue full: 560 (92.9%)
	# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0			# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
	# CHECK-NEXT: USH - Uncategorised Structural Hazard: 0			# CHECK-NEXT: USH - Uncategorised Structural Hazard: 0

	# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:			# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
	# CHECK-NEXT: [# dispatched], [# cycles]			# CHECK-NEXT: [# dispatched], [# cycles]
	# CHECK-NEXT: 0, 422 (52.6%)			# CHECK-NEXT: 0, 222 (36.8%)
	# CHECK-NEXT: 1, 374 (46.6%)			# CHECK-NEXT: 1, 374 (62.0%)
	# CHECK-NEXT: 2, 1 (0.1%)			# CHECK-NEXT: 2, 1 (0.2%)
	# CHECK-NEXT: 4, 6 (0.7%)			# CHECK-NEXT: 4, 6 (1.0%)

	# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:			# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
	# CHECK-NEXT: [# issued], [# cycles]			# CHECK-NEXT: [# issued], [# cycles]
	# CHECK-NEXT: 0, 403 (50.2%)			# CHECK-NEXT: 0, 203 (33.7%)
	# CHECK-NEXT: 1, 400 (49.8%)			# CHECK-NEXT: 1, 400 (66.3%)

	# CHECK: Scheduler's queue usage:			# CHECK: Scheduler's queue usage:
	# CHECK-NEXT: [1] Resource name.			# CHECK-NEXT: [1] Resource name.
	# CHECK-NEXT: [2] Average number of used buffer entries.			# CHECK-NEXT: [2] Average number of used buffer entries.
	# CHECK-NEXT: [3] Maximum number of used buffer entries.			# CHECK-NEXT: [3] Maximum number of used buffer entries.
	# CHECK-NEXT: [4] Total number of buffer entries.			# CHECK-NEXT: [4] Total number of buffer entries.

	# CHECK: [1] [2] [3] [4]			# CHECK: [1] [2] [3] [4]
	# CHECK-NEXT: PdEX 21 23 40			# CHECK-NEXT: PdEX 21 22 40
	# CHECK-NEXT: PdFPU 21 23 64			# CHECK-NEXT: PdFPU 21 22 64
	# CHECK-NEXT: PdLoad 0 0 40			# CHECK-NEXT: PdLoad 0 0 40
	# CHECK-NEXT: PdStore 22 24 24			# CHECK-NEXT: PdStore 22 24 24

	# CHECK: Resources:			# CHECK: Resources:
	# CHECK-NEXT: [0.0] - PdAGLU01			# CHECK-NEXT: [0.0] - PdAGLU01
	# CHECK-NEXT: [0.1] - PdAGLU01			# CHECK-NEXT: [0.1] - PdAGLU01
	# CHECK-NEXT: [1] - PdBranch			# CHECK-NEXT: [1] - PdBranch
	# CHECK-NEXT: [2] - PdCount			# CHECK-NEXT: [2] - PdCount
	[... 24 lines not shown ...]
	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18] Instructions:			# CHECK-NEXT: [0.0] [0.1] [1] [2] [3] [4] [5] [6] [7.0] [7.1] [8.0] [8.1] [9] [10] [11] [12] [13] [14] [15] [16.0] [16.1] [17] [18] Instructions:
	# CHECK-NEXT: - 1.00 - - - - - - - - - - - 1.00 - - - 3.00 - - - - 1.00 movd %mm0, (%rax)			# CHECK-NEXT: - 1.00 - - - - - - - - - - - 1.00 - - - 3.00 - - - - 1.00 movd %mm0, (%rax)
	# CHECK-NEXT: 1.00 - - - - - - - - - - - - 1.00 - - 3.00 - - - - - 1.00 movd %mm1, (%rcx)			# CHECK-NEXT: 1.00 - - - - - - - - - - - - 1.00 - - 3.00 - - - - - 1.00 movd %mm1, (%rcx)
	# CHECK-NEXT: - 1.00 - - - - - - - - - - - 1.00 - - - 3.00 - - - - 1.00 movd %mm2, (%rdx)			# CHECK-NEXT: - 1.00 - - - - - - - - - - - 1.00 - - - 3.00 - - - - 1.00 movd %mm2, (%rdx)
	# CHECK-NEXT: 1.00 - - - - - - - - - - - - 1.00 - - 3.00 - - - - - 1.00 movd %mm3, (%rbx)			# CHECK-NEXT: 1.00 - - - - - - - - - - - - 1.00 - - 3.00 - - - - - 1.00 movd %mm3, (%rbx)

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: 0			# CHECK-NEXT: Index 012345678
	# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DeeER. . movd %mm0, (%rax)			# CHECK: [0,0] DeeER. . movd %mm0, (%rax)
	# CHECK-NEXT: [0,1] D==eeER . movd %mm1, (%rcx)			# CHECK-NEXT: [0,1] D=eeER . movd %mm1, (%rcx)
	# CHECK-NEXT: [0,2] D====eeER . movd %mm2, (%rdx)			# CHECK-NEXT: [0,2] D===eeER. movd %mm2, (%rdx)
	# CHECK-NEXT: [0,3] D======eeER movd %mm3, (%rbx)			# CHECK-NEXT: [0,3] D====eeER movd %mm3, (%rbx)

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movd %mm0, (%rax)			# CHECK-NEXT: 0. 1 1.0 1.0 0.0 movd %mm0, (%rax)
	# CHECK-NEXT: 1. 1 3.0 0.0 0.0 movd %mm1, (%rcx)			# CHECK-NEXT: 1. 1 2.0 1.0 0.0 movd %mm1, (%rcx)
	# CHECK-NEXT: 2. 1 5.0 0.0 0.0 movd %mm2, (%rdx)			# CHECK-NEXT: 2. 1 4.0 2.0 0.0 movd %mm2, (%rdx)
	# CHECK-NEXT: 3. 1 7.0 0.0 0.0 movd %mm3, (%rbx)			# CHECK-NEXT: 3. 1 5.0 1.0 0.0 movd %mm3, (%rbx)
	# CHECK-NEXT: 1 4.0 0.3 0.0 <total>			# CHECK-NEXT: 1 3.0 1.3 0.0 <total>

	# CHECK: [5] Code Region			# CHECK: [5] Code Region

	# CHECK: Iterations: 100			# CHECK: Iterations: 100
	# CHECK-NEXT: Instructions: 400			# CHECK-NEXT: Instructions: 400
	# CHECK-NEXT: Total Cycles: 602			# CHECK-NEXT: Total Cycles: 602
	# CHECK-NEXT: Total uOps: 400			# CHECK-NEXT: Total uOps: 400

	[... 223 lines not shown ...]

llvm/test/tools/llvm-mca/X86/BtVer2/pr37790.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -lqueue=2 -iterations=2 -resource-pressure=false -timeline -timeline-max-cycles=104 < %s | FileCheck %s			# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -lqueue=2 -iterations=2 -resource-pressure=false -timeline -timeline-max-cycles=104 < %s | FileCheck %s

	int3			int3
	stmxcsr (%rsp)			stmxcsr (%rsp)

	# CHECK: Iterations: 2			# CHECK: Iterations: 2
	# CHECK-NEXT: Instructions: 4			# CHECK-NEXT: Instructions: 4
	# CHECK-NEXT: Total Cycles: 205			# CHECK-NEXT: Total Cycles: 104
	# CHECK-NEXT: Total uOps: 4			# CHECK-NEXT: Total uOps: 4

	# CHECK: Dispatch Width: 2			# CHECK: Dispatch Width: 2
	# CHECK-NEXT: uOps Per Cycle: 0.02			# CHECK-NEXT: uOps Per Cycle: 0.04
	# CHECK-NEXT: IPC: 0.02			# CHECK-NEXT: IPC: 0.04
	# CHECK-NEXT: Block RThroughput: 1.0			# CHECK-NEXT: Block RThroughput: 1.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	# CHECK-NEXT: [6]: HasSideEffects (U)			# CHECK-NEXT: [6]: HasSideEffects (U)

	# CHECK: [1] [2] [3] [4] [5] [6] Instructions:			# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
	# CHECK-NEXT: 1 100 0.50 * * U int3			# CHECK-NEXT: 1 100 0.50 * * U int3
	# CHECK-NEXT: 1 1 1.00 * U stmxcsr (%rsp)			# CHECK-NEXT: 1 1 1.00 * U stmxcsr (%rsp)

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: 0123456789 0123456789 0123456789 0123456789 0123456789			# CHECK-NEXT: 0123456789 0123456789 0123456789 0123456789 0123456789
	# CHECK-NEXT: Index 0123456789 0123456789 0123456789 0123456789 0123456789 0123			# CHECK-NEXT: Index 0123456789 0123456789 0123456789 0123456789 0123456789 0123

	# CHECK: [0,0] DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeER. int3			# CHECK: [0,0] DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeER. int3
	# CHECK-NEXT: [0,1] D====================================================================================================eER stmxcsr (%rsp)			# CHECK-NEXT: [0,1] DeE---------------------------------------------------------------------------------------------------R. stmxcsr (%rsp)
				# CHECK-NEXT: [1,0] .DeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeER int3
				# CHECK-NEXT: [1,1] .DeE---------------------------------------------------------------------------------------------------R stmxcsr (%rsp)

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 2 51.0 0.5 0.0 int3			# CHECK-NEXT: 0. 2 1.0 1.0 0.0 int3
	# CHECK-NEXT: 1. 2 151.0 0.0 0.0 stmxcsr (%rsp)			# CHECK-NEXT: 1. 2 1.0 0.0 99.0 stmxcsr (%rsp)
	# CHECK-NEXT: 2 101.0 0.3 0.0 <total>			# CHECK-NEXT: 2 1.0 0.5 49.5 <total>

llvm/test/tools/llvm-mca/X86/BtVer2/stmxcsr-ldmxcsr.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -timeline -timeline-max-iterations=3 < %s | FileCheck %s			# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -timeline -timeline-max-iterations=3 < %s | FileCheck %s

	# Code snippet taken from PR48024.			# Code snippet taken from PR48024.

	stmxcsr -4(%rsp)			stmxcsr -4(%rsp)
	movl $-24577, %eax # imm = 0x9FFF			movl $-24577, %eax # imm = 0x9FFF
	andl -4(%rsp), %eax			andl -4(%rsp), %eax
	movl %eax, -8(%rsp)			movl %eax, -8(%rsp)
	ldmxcsr -8(%rsp)			ldmxcsr -8(%rsp)
	retq			retq

	# CHECK: Iterations: 100			# CHECK: Iterations: 100
	# CHECK-NEXT: Instructions: 600			# CHECK-NEXT: Instructions: 600
	# CHECK-NEXT: Total Cycles: 704			# CHECK-NEXT: Total Cycles: 308
	# CHECK-NEXT: Total uOps: 600			# CHECK-NEXT: Total uOps: 600

	# CHECK: Dispatch Width: 2			# CHECK: Dispatch Width: 2
	# CHECK-NEXT: uOps Per Cycle: 0.85			# CHECK-NEXT: uOps Per Cycle: 1.95
	# CHECK-NEXT: IPC: 0.85			# CHECK-NEXT: IPC: 1.95
	# CHECK-NEXT: Block RThroughput: 3.0			# CHECK-NEXT: Block RThroughput: 3.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	[... 32 lines not shown ...]
	# CHECK-NEXT: - - - - - - - - - 1.00 - - - - stmxcsr -4(%rsp)			# CHECK-NEXT: - - - - - - - - - 1.00 - - - - stmxcsr -4(%rsp)
	# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - movl $-24577, %eax			# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - - - movl $-24577, %eax
	# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - - - - - - andl -4(%rsp), %eax			# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - - - - - - andl -4(%rsp), %eax
	# CHECK-NEXT: - - - - - - - - - 1.00 - - - - movl %eax, -8(%rsp)			# CHECK-NEXT: - - - - - - - - - 1.00 - - - - movl %eax, -8(%rsp)
	# CHECK-NEXT: - - - - - - - 1.00 - - - - - - ldmxcsr -8(%rsp)			# CHECK-NEXT: - - - - - - - 1.00 - - - - - - ldmxcsr -8(%rsp)
	# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - - - - - - retq			# CHECK-NEXT: 0.50 0.50 - - - - - 1.00 - - - - - - retq

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: 0123456789			# CHECK-NEXT: 0123456
	# CHECK-NEXT: Index 0123456789 01234			# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DeER . . . . . stmxcsr -4(%rsp)			# CHECK: [0,0] DeER . . .. stmxcsr -4(%rsp)
	# CHECK-NEXT: [0,1] DeER . . . . . movl $-24577, %eax			# CHECK-NEXT: [0,1] DeER . . .. movl $-24577, %eax
	# CHECK-NEXT: [0,2] .DeeeeER . . . . andl -4(%rsp), %eax			# CHECK-NEXT: [0,2] .DeeeeER . .. andl -4(%rsp), %eax
	# CHECK-NEXT: [0,3] .D====eER . . . . movl %eax, -8(%rsp)			# CHECK-NEXT: [0,3] .D====eER . .. movl %eax, -8(%rsp)
	# CHECK-NEXT: [0,4] . D===eeeER . . . ldmxcsr -8(%rsp)			# CHECK-NEXT: [0,4] . D===eeeER .. ldmxcsr -8(%rsp)
	# CHECK-NEXT: [0,5] . DeeeeE--R . . . retq			# CHECK-NEXT: [0,5] . DeeeeE--R .. retq
	# CHECK-NEXT: [1,0] . D=====eER . . . stmxcsr -4(%rsp)			# CHECK-NEXT: [1,0] . D===eE--R .. stmxcsr -4(%rsp)
	# CHECK-NEXT: [1,1] . DeE-----R . . . movl $-24577, %eax			# CHECK-NEXT: [1,1] . DeE-----R .. movl $-24577, %eax
	# CHECK-NEXT: [1,2] . D====eeeeER. . . andl -4(%rsp), %eax			# CHECK-NEXT: [1,2] . DeeeeE--R .. andl -4(%rsp), %eax
	# CHECK-NEXT: [1,3] . D========eER . . movl %eax, -8(%rsp)			# CHECK-NEXT: [1,3] . D====eE-R .. movl %eax, -8(%rsp)
	# CHECK-NEXT: [1,4] . D=======eeeER . . ldmxcsr -8(%rsp)			# CHECK-NEXT: [1,4] . D===eeeER .. ldmxcsr -8(%rsp)
	# CHECK-NEXT: [1,5] . D=eeeeE-----R . . retq			# CHECK-NEXT: [1,5] . D=eeeeE-R .. retq
	# CHECK-NEXT: [2,0] . .D=========eER . . stmxcsr -4(%rsp)			# CHECK-NEXT: [2,0] . .D===eE--R.. stmxcsr -4(%rsp)
	# CHECK-NEXT: [2,1] . .DeE---------R . . movl $-24577, %eax			# CHECK-NEXT: [2,1] . .DeE-----R.. movl $-24577, %eax
	# CHECK-NEXT: [2,2] . . D========eeeeER . andl -4(%rsp), %eax			# CHECK-NEXT: [2,2] . . DeeeeE--R. andl -4(%rsp), %eax
	# CHECK-NEXT: [2,3] . . D============eER . movl %eax, -8(%rsp)			# CHECK-NEXT: [2,3] . . D====eE-R. movl %eax, -8(%rsp)
	# CHECK-NEXT: [2,4] . . D===========eeeER ldmxcsr -8(%rsp)			# CHECK-NEXT: [2,4] . . D===eeeER ldmxcsr -8(%rsp)
	# CHECK-NEXT: [2,5] . . D=eeeeE---------R retq			# CHECK-NEXT: [2,5] . . D=eeeeE-R retq

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 3 5.7 0.3 0.0 stmxcsr -4(%rsp)			# CHECK-NEXT: 0. 3 3.0 1.0 1.3 stmxcsr -4(%rsp)
	# CHECK-NEXT: 1. 3 1.0 1.0 4.7 movl $-24577, %eax			# CHECK-NEXT: 1. 3 1.0 1.0 3.3 movl $-24577, %eax
	# CHECK-NEXT: 2. 3 5.0 0.3 0.0 andl -4(%rsp), %eax			# CHECK-NEXT: 2. 3 1.0 1.0 1.3 andl -4(%rsp), %eax
	# CHECK-NEXT: 3. 3 9.0 0.0 0.0 movl %eax, -8(%rsp)			# CHECK-NEXT: 3. 3 5.0 0.0 0.7 movl %eax, -8(%rsp)
	# CHECK-NEXT: 4. 3 8.0 0.0 0.0 ldmxcsr -8(%rsp)			# CHECK-NEXT: 4. 3 4.0 0.0 0.0 ldmxcsr -8(%rsp)
	# CHECK-NEXT: 5. 3 1.7 1.7 5.3 retq			# CHECK-NEXT: 5. 3 1.7 1.7 1.3 retq
	# CHECK-NEXT: 3 5.1 0.6 1.7 <total>			# CHECK-NEXT: 3 2.6 0.8 1.3 <total>

llvm/test/tools/llvm-mca/X86/Haswell/reserved-resources.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=haswell -iterations=100 < %s | FileCheck %s			# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=haswell -iterations=100 < %s | FileCheck %s

	fxrstor (%rsp)			fxrstor (%rsp)

	# CHECK: Iterations: 100			# CHECK: Iterations: 100
	# CHECK-NEXT: Instructions: 100			# CHECK-NEXT: Instructions: 100
	# CHECK-NEXT: Total Cycles: 6403			# CHECK-NEXT: Total Cycles: 4720
	# CHECK-NEXT: Total uOps: 9000			# CHECK-NEXT: Total uOps: 9000

	# CHECK: Dispatch Width: 4			# CHECK: Dispatch Width: 4
	# CHECK-NEXT: uOps Per Cycle: 1.41			# CHECK-NEXT: uOps Per Cycle: 1.91
	# CHECK-NEXT: IPC: 0.02			# CHECK-NEXT: IPC: 0.02
	# CHECK-NEXT: Block RThroughput: 22.5			# CHECK-NEXT: Block RThroughput: 22.5

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	[... 25 lines not shown ...]

llvm/test/tools/llvm-mca/X86/Haswell/stmxcsr-ldmxcsr.s

	# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
	# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=haswell -timeline -timeline-max-iterations=3 < %s | FileCheck %s			# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=haswell -timeline -timeline-max-iterations=3 < %s | FileCheck %s

	# Code snippet taken from PR48024.			# Code snippet taken from PR48024.

	stmxcsr -4(%rsp)			stmxcsr -4(%rsp)
	movl $-24577, %eax # imm = 0x9FFF			movl $-24577, %eax # imm = 0x9FFF
	andl -4(%rsp), %eax			andl -4(%rsp), %eax
	movl %eax, -8(%rsp)			movl %eax, -8(%rsp)
	ldmxcsr -8(%rsp)			ldmxcsr -8(%rsp)
	retq			retq

	# CHECK: Iterations: 100			# CHECK: Iterations: 100
	# CHECK-NEXT: Instructions: 600			# CHECK-NEXT: Instructions: 600
	# CHECK-NEXT: Total Cycles: 1304			# CHECK-NEXT: Total Cycles: 413
	# CHECK-NEXT: Total uOps: 1300			# CHECK-NEXT: Total uOps: 1300

	# CHECK: Dispatch Width: 4			# CHECK: Dispatch Width: 4
	# CHECK-NEXT: uOps Per Cycle: 1.00			# CHECK-NEXT: uOps Per Cycle: 3.15
	# CHECK-NEXT: IPC: 0.46			# CHECK-NEXT: IPC: 1.45
	# CHECK-NEXT: Block RThroughput: 3.3			# CHECK-NEXT: Block RThroughput: 3.3

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	[... 16 lines not shown ...]
	# CHECK-NEXT: [5] - HWPort3			# CHECK-NEXT: [5] - HWPort3
	# CHECK-NEXT: [6] - HWPort4			# CHECK-NEXT: [6] - HWPort4
	# CHECK-NEXT: [7] - HWPort5			# CHECK-NEXT: [7] - HWPort5
	# CHECK-NEXT: [8] - HWPort6			# CHECK-NEXT: [8] - HWPort6
	# CHECK-NEXT: [9] - HWPort7			# CHECK-NEXT: [9] - HWPort7

	# CHECK: Resource pressure per iteration:			# CHECK: Resource pressure per iteration:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
	# CHECK-NEXT: - - 1.74 1.74 1.67 1.68 2.00 1.74 1.78 1.65			# CHECK-NEXT: - - 1.99 1.50 1.66 1.67 2.00 1.52 1.99 1.67

	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Instructions:			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Instructions:
	# CHECK-NEXT: - - - - 0.30 - 1.00 1.00 - 0.70 stmxcsr -4(%rsp)			# CHECK-NEXT: - - - - 0.16 - 1.00 1.00 - 0.84 stmxcsr -4(%rsp)
	# CHECK-NEXT: - - 0.03 0.53 - - - 0.23 0.21 - movl $-24577, %eax			# CHECK-NEXT: - - 0.49 0.49 - - - 0.01 0.01 - movl $-24577, %eax
	# CHECK-NEXT: - - 0.22 0.58 0.35 0.65 - - 0.20 - andl -4(%rsp), %eax			# CHECK-NEXT: - - 0.49 0.02 0.49 0.51 - 0.01 0.48 - andl -4(%rsp), %eax
	# CHECK-NEXT: - - - - 0.05 - 1.00 - - 0.95 movl %eax, -8(%rsp)			# CHECK-NEXT: - - - - 0.17 - 1.00 - - 0.83 movl %eax, -8(%rsp)
	# CHECK-NEXT: - - 1.00 0.21 0.34 0.66 - 0.42 0.37 - ldmxcsr -8(%rsp)			# CHECK-NEXT: - - 1.00 0.01 0.33 0.67 - 0.49 0.50 - ldmxcsr -8(%rsp)
	# CHECK-NEXT: - - 0.49 0.42 0.63 0.37 - 0.09 1.00 - retq			# CHECK-NEXT: - - 0.01 0.98 0.51 0.49 - 0.01 1.00 - retq

	# CHECK: Timeline view:			# CHECK: Timeline view:
	# CHECK-NEXT: 0123456789 0123456789			# CHECK-NEXT: 0123456789
	# CHECK-NEXT: Index 0123456789 0123456789 012			# CHECK-NEXT: Index 0123456789 01234

	# CHECK: [0,0] DeeER. . . . . . . . . stmxcsr -4(%rsp)			# CHECK: [0,0] DeeER. . . . . stmxcsr -4(%rsp)
	# CHECK-NEXT: [0,1] DeE-R. . . . . . . . . movl $-24577, %eax			# CHECK-NEXT: [0,1] DeE-R. . . . . movl $-24577, %eax
	# CHECK-NEXT: [0,2] .DeeeeeeER. . . . . . . . andl -4(%rsp), %eax			# CHECK-NEXT: [0,2] .DeeeeeeER. . . . andl -4(%rsp), %eax
	# CHECK-NEXT: [0,3] .D======eER . . . . . . . movl %eax, -8(%rsp)			# CHECK-NEXT: [0,3] .D======eER . . . movl %eax, -8(%rsp)
	# CHECK-NEXT: [0,4] . D=====eeeeeeeER . . . . . . ldmxcsr -8(%rsp)			# CHECK-NEXT: [0,4] . D=====eeeeeeeER . . ldmxcsr -8(%rsp)
	# CHECK-NEXT: [0,5] . DeeeeeeeE----R . . . . . . retq			# CHECK-NEXT: [0,5] . DeeeeeeeE----R . . retq
	# CHECK-NEXT: [1,0] . D==========eeER . . . . . . stmxcsr -4(%rsp)			# CHECK-NEXT: [1,0] . D====eeE----R . . stmxcsr -4(%rsp)
	# CHECK-NEXT: [1,1] . DeE-----------R . . . . . . movl $-24577, %eax			# CHECK-NEXT: [1,1] . DeE---------R . . movl $-24577, %eax
	# CHECK-NEXT: [1,2] . D=========eeeeeeER . . . . . andl -4(%rsp), %eax			# CHECK-NEXT: [1,2] . DeeeeeeE---R . . andl -4(%rsp), %eax
	# CHECK-NEXT: [1,3] . D===============eER . . . . . movl %eax, -8(%rsp)			# CHECK-NEXT: [1,3] . D======eE--R . . movl %eax, -8(%rsp)
	# CHECK-NEXT: [1,4] . .D==============eeeeeeeER. . . . ldmxcsr -8(%rsp)			# CHECK-NEXT: [1,4] . .D=====eeeeeeeER . ldmxcsr -8(%rsp)
	# CHECK-NEXT: [1,5] . . DeeeeeeeE-------------R. . . . retq			# CHECK-NEXT: [1,5] . . D=eeeeeeeE---R . retq
	# CHECK-NEXT: [2,0] . . D===================eeER . . . stmxcsr -4(%rsp)			# CHECK-NEXT: [2,0] . . D====eeE----R . stmxcsr -4(%rsp)
	# CHECK-NEXT: [2,1] . . DeE--------------------R . . . movl $-24577, %eax			# CHECK-NEXT: [2,1] . . DeE---------R . movl $-24577, %eax
	# CHECK-NEXT: [2,2] . . D==================eeeeeeER . . andl -4(%rsp), %eax			# CHECK-NEXT: [2,2] . . DeeeeeeE---R . andl -4(%rsp), %eax
	# CHECK-NEXT: [2,3] . . D========================eER . . movl %eax, -8(%rsp)			# CHECK-NEXT: [2,3] . . D======eE--R . movl %eax, -8(%rsp)
	# CHECK-NEXT: [2,4] . . D=======================eeeeeeeER ldmxcsr -8(%rsp)			# CHECK-NEXT: [2,4] . . D=====eeeeeeeER ldmxcsr -8(%rsp)
	# CHECK-NEXT: [2,5] . . .DeeeeeeeE----------------------R retq			# CHECK-NEXT: [2,5] . . .DeeeeeeeE----R retq

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 3 10.7 0.3 0.0 stmxcsr -4(%rsp)			# CHECK-NEXT: 0. 3 3.7 1.0 2.7 stmxcsr -4(%rsp)
	# CHECK-NEXT: 1. 3 1.0 1.0 10.7 movl $-24577, %eax			# CHECK-NEXT: 1. 3 1.0 1.0 6.3 movl $-24577, %eax
	# CHECK-NEXT: 2. 3 10.0 0.3 0.0 andl -4(%rsp), %eax			# CHECK-NEXT: 2. 3 1.0 1.0 2.0 andl -4(%rsp), %eax
	# CHECK-NEXT: 3. 3 16.0 0.0 0.0 movl %eax, -8(%rsp)			# CHECK-NEXT: 3. 3 7.0 0.0 1.3 movl %eax, -8(%rsp)
	# CHECK-NEXT: 4. 3 15.0 0.0 0.0 ldmxcsr -8(%rsp)			# CHECK-NEXT: 4. 3 6.0 0.0 0.0 ldmxcsr -8(%rsp)
	# CHECK-NEXT: 5. 3 1.0 1.0 13.0 retq			# CHECK-NEXT: 5. 3 1.3 1.3 3.7 retq
	# CHECK-NEXT: 3 8.9 0.4 3.9 <total>			# CHECK-NEXT: 3 3.3 0.7 2.7 <total>
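
The summary fields in the updated checks are consistent with simple ratios: IPC as Instructions / Total Cycles, and uOps Per Cycle as Total uOps / Total Cycles. For the Haswell test above, that gives 600 / 413 ≈ 1.45 and 1300 / 413 ≈ 3.15 after the patch, versus 600 / 1304 ≈ 0.46 and 1300 / 1304 ≈ 1.00 before it. A minimal C++ sketch of that arithmetic follows. `perCycle` is a hypothetical helper name, and the round-to-two-decimals step is an assumption about how the values are displayed, not code taken from llvm-mca.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of the patch): divides a count by the
// simulated cycle count and rounds to two decimals. The rounding rule is an
// assumption chosen to match the values shown in the CHECK lines.
double perCycle(unsigned Count, unsigned Cycles) {
  assert(Cycles != 0 && "cycle count must be non-zero");
  double Ratio = static_cast<double>(Count) / static_cast<double>(Cycles);
  return std::floor(Ratio * 100.0 + 0.5) / 100.0;
}
```

Called as `perCycle(600, 413)` and `perCycle(1300, 413)`, this reproduces the post-patch IPC and uOps Per Cycle values for the Haswell test. The same ratios hold for the other tests in this diff, for example 400 / 603 ≈ 0.66 in BdVer2/store-throughput.s.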

llvm/test/tools/llvm-mca/X86/barrier_output.s

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
				# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -iterations=1 -resource-pressure=false -summary-view=false -show-barriers < %s | FileCheck %s

				clflush (%rax)
				lfence
				mfence
				sfence
				maskmovdqu %xmm0, %xmm1

				# CHECK: Instruction Info:
				# CHECK-NEXT: [1]: #uOps
				# CHECK-NEXT: [2]: Latency
				# CHECK-NEXT: [3]: RThroughput
				# CHECK-NEXT: [4]: MayLoad
				# CHECK-NEXT: [5]: MayStore
				# CHECK-NEXT: [6]: HasSideEffects (U)
				# CHECK-NEXT: [7]: LoadBarrier
				andreadb (inline comment): You also need to test SFENCE here. You can also get rid of most instructions here. Personally, I would only leave SFENCE, MFENCE, LFENCE, and any other instructions with "unmodeled side effects". So CLFLUSH, LFENCE, MFENCE, and SFENCE are fine. All other SSE instructions can be removed, in my opinion.
				# CHECK-NEXT: [8]: StoreBarrier

				# CHECK: [1] [2] [3] [4] [5] [6] [7] [8] Instructions:
				# CHECK-NEXT: 4 5 1.00 * * U clflush (%rax)
				# CHECK-NEXT: 1 1 1.00 * * U * lfence
				# CHECK-NEXT: 1 1 1.00 * * U * * mfence
				# CHECK-NEXT: 1 1 1.00 * * U * sfence
				# CHECK-NEXT: 1 1 1.00 * * U maskmovdqu %xmm0, %xmm1

llvm/tools/llvm-mca/Views/InstructionInfoView.h

	[... 48 lines not shown ...]
	namespace llvm {			namespace llvm {
	namespace mca {			namespace mca {

	/// A view that prints out generic instruction information.			/// A view that prints out generic instruction information.
	class InstructionInfoView : public InstructionView {			class InstructionInfoView : public InstructionView {
	const llvm::MCInstrInfo &MCII;			const llvm::MCInstrInfo &MCII;
	CodeEmitter &CE;			CodeEmitter &CE;
	bool PrintEncodings;			bool PrintEncodings;
				bool PrintBarriers;
				using UniqueInst = std::unique_ptr<Instruction>;
				andreadb (inline comment): Can this be something like this? `using UniqueInst = std::unique_ptr<mca::Instruction>; ArrayRef<UniqueInst> LoweredInsts;`
				ArrayRef<UniqueInst> LoweredInsts;

	struct InstructionInfoViewData {			struct InstructionInfoViewData {
	unsigned NumMicroOpcodes = 0;			unsigned NumMicroOpcodes = 0;
	unsigned Latency = 0;			unsigned Latency = 0;
	Optional<double> RThroughput = 0.0;			Optional<double> RThroughput = 0.0;
	bool mayLoad = false;			bool mayLoad = false;
	bool mayStore = false;			bool mayStore = false;
	bool hasUnmodeledSideEffects = false;			bool hasUnmodeledSideEffects = false;
	};			};
	using IIVDVec = SmallVector<InstructionInfoViewData, 16>;			using IIVDVec = SmallVector<InstructionInfoViewData, 16>;

	/// Place the data into the array of InstructionInfoViewData IIVD.			/// Place the data into the array of InstructionInfoViewData IIVD.
	void collectData(MutableArrayRef<InstructionInfoViewData> IIVD) const;			void collectData(MutableArrayRef<InstructionInfoViewData> IIVD) const;

	public:			public:
	InstructionInfoView(const llvm::MCSubtargetInfo &ST,			InstructionInfoView(const llvm::MCSubtargetInfo &ST,
	const llvm::MCInstrInfo &II, CodeEmitter &C,			const llvm::MCInstrInfo &II, CodeEmitter &C,
	bool ShouldPrintEncodings, llvm::ArrayRef<llvm::MCInst> S,			bool ShouldPrintEncodings, llvm::ArrayRef<llvm::MCInst> S,
	llvm::MCInstPrinter &IP)			llvm::MCInstPrinter &IP,
				ArrayRef<UniqueInst> LoweredInsts,
				bool ShouldPrintBarriers)
	: InstructionView(ST, IP, S), MCII(II), CE(C),			: InstructionView(ST, IP, S), MCII(II), CE(C),
	PrintEncodings(ShouldPrintEncodings) {}			PrintEncodings(ShouldPrintEncodings),
				PrintBarriers(ShouldPrintBarriers), LoweredInsts(LoweredInsts) {}

	void printView(llvm::raw_ostream &OS) const override;			void printView(llvm::raw_ostream &OS) const override;
	StringRef getNameAsString() const override { return "InstructionInfoView"; }			StringRef getNameAsString() const override { return "InstructionInfoView"; }
	json::Value toJSON() const override;			json::Value toJSON() const override;
	json::Object toJSON(const InstructionInfoViewData &IIVD) const;			json::Object toJSON(const InstructionInfoViewData &IIVD) const;
	};			};
	} // namespace mca			} // namespace mca
	} // namespace llvm			} // namespace llvm

	#endif			#endif

llvm/tools/llvm-mca/Views/InstructionInfoView.cpp

[... 26 lines not shown ...]
	if (!Source.size())
return;		return;

IIVDVec IIVD(Source.size());		IIVDVec IIVD(Source.size());
collectData(IIVD);		collectData(IIVD);

TempStream << "\n\nInstruction Info:\n";		TempStream << "\n\nInstruction Info:\n";
TempStream << "[1]: #uOps\n[2]: Latency\n[3]: RThroughput\n"		TempStream << "[1]: #uOps\n[2]: Latency\n[3]: RThroughput\n"
<< "[4]: MayLoad\n[5]: MayStore\n[6]: HasSideEffects (U)\n";		<< "[4]: MayLoad\n[5]: MayStore\n[6]: HasSideEffects (U)\n";
		if (PrintBarriers) {
		TempStream << "[7]: LoadBarrier\n[8]: StoreBarrier\n";
		}
if (PrintEncodings) {		if (PrintEncodings) {
		if (PrintBarriers) {
		TempStream << "[9]: Encoding Size\n";
		TempStream << "\n[1] [2] [3] [4] [5] [6] [7] [8] "
		<< "[9] Encodings: Instructions:\n";
		} else {
TempStream << "[7]: Encoding Size\n";		TempStream << "[7]: Encoding Size\n";
TempStream << "\n[1] [2] [3] [4] [5] [6] [7] "		TempStream << "\n[1] [2] [3] [4] [5] [6] [7] "
<< "Encodings: Instructions:\n";		<< "Encodings: Instructions:\n";
		}
		} else {
		if (PrintBarriers) {
		TempStream << "\n[1] [2] [3] [4] [5] [6] [7] [8] "
		<< "Instructions:\n";
} else {		} else {
TempStream << "\n[1] [2] [3] [4] [5] [6] Instructions:\n";		TempStream << "\n[1] [2] [3] [4] [5] [6] "
		<< "Instructions:\n";
		}
}		}

		int Index = 0;
for (const auto &I : enumerate(zip(IIVD, Source))) {		for (const auto &I : enumerate(zip(IIVD, Source))) {
const InstructionInfoViewData &IIVDEntry = std::get<0>(I.value());		const InstructionInfoViewData &IIVDEntry = std::get<0>(I.value());

TempStream << ' ' << IIVDEntry.NumMicroOpcodes << " ";		TempStream << ' ' << IIVDEntry.NumMicroOpcodes << " ";
if (IIVDEntry.NumMicroOpcodes < 10)		if (IIVDEntry.NumMicroOpcodes < 10)
TempStream << " ";		TempStream << " ";
else if (IIVDEntry.NumMicroOpcodes < 100)		else if (IIVDEntry.NumMicroOpcodes < 100)
TempStream << ' ';		TempStream << ' ';
[... 12 lines not shown ...]
	if (IIVDEntry.RThroughput.hasValue()) {
TempStream << ' ';		TempStream << ' ';
} else {		} else {
TempStream << " - ";		TempStream << " - ";
}		}
TempStream << (IIVDEntry.mayLoad ? " * " : " ");		TempStream << (IIVDEntry.mayLoad ? " * " : " ");
TempStream << (IIVDEntry.mayStore ? " * " : " ");		TempStream << (IIVDEntry.mayStore ? " * " : " ");
TempStream << (IIVDEntry.hasUnmodeledSideEffects ? " U " : " ");		TempStream << (IIVDEntry.hasUnmodeledSideEffects ? " U " : " ");

		if (PrintBarriers) {
		TempStream << (LoweredInsts[Index]->isALoadBarrier() ? " * "
		: " ");
		TempStream << (LoweredInsts[Index]->isAStoreBarrier() ? " * "
		: " ");
		}

if (PrintEncodings) {		if (PrintEncodings) {
StringRef Encoding(CE.getEncoding(I.index()));		StringRef Encoding(CE.getEncoding(I.index()));
unsigned EncodingSize = Encoding.size();		unsigned EncodingSize = Encoding.size();
TempStream << " " << EncodingSize		TempStream << " " << EncodingSize
<< (EncodingSize < 10 ? " " : " ");		<< (EncodingSize < 10 ? " " : " ");
TempStream.flush();		TempStream.flush();
formatted_raw_ostream FOS(TempStream);		formatted_raw_ostream FOS(TempStream);
for (unsigned i = 0, e = Encoding.size(); i != e; ++i)		for (unsigned i = 0, e = Encoding.size(); i != e; ++i)
FOS << format("%02x ", (uint8_t)Encoding[i]);		FOS << format("%02x ", (uint8_t)Encoding[i]);
FOS.PadToColumn(30);		FOS.PadToColumn(30);
FOS.flush();		FOS.flush();
}		}

const MCInst &Inst = std::get<1>(I.value());		const MCInst &Inst = std::get<1>(I.value());
TempStream << printInstructionString(Inst) << '\n';		TempStream << printInstructionString(Inst) << '\n';
		++Index;
}		}

TempStream.flush();		TempStream.flush();
OS << Buffer;		OS << Buffer;
}		}

void InstructionInfoView::collectData(		void InstructionInfoView::collectData(
MutableArrayRef<InstructionInfoViewData> IIVD) const {		MutableArrayRef<InstructionInfoViewData> IIVD) const {
▲ Show 20 Lines • Show All 60 Lines • Show Last 20 Lines
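The legend and column-header logic in the hunk above branches on every combination of `PrintBarriers` and `PrintEncodings`. As an illustration only, the same numbering scheme can be derived from a single list of column names. This is a minimal standalone sketch, not code from the patch: `buildInfoHeader` is a hypothetical helper, it uses `std::string` rather than the `raw_ostream` the view writes to, and it separates columns with single spaces because the archive does not preserve the original column padding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of D116779): builds the numbered legend and
// the column-header row of the instruction info view from the two flags.
std::string buildInfoHeader(bool PrintBarriers, bool PrintEncodings) {
  std::vector<std::string> Legend = {"#uOps",    "Latency",  "RThroughput",
                                     "MayLoad",  "MayStore", "HasSideEffects (U)"};
  // The barrier columns, when enabled, come before the encoding size, so the
  // encoding size is column [7] without barriers and column [9] with them.
  if (PrintBarriers) {
    Legend.push_back("LoadBarrier");
    Legend.push_back("StoreBarrier");
  }
  if (PrintEncodings)
    Legend.push_back("Encoding Size");
  assert(Legend.size() <= 9 && "sketch assumes single-digit column indices");

  std::string Out;
  // One "[N]: Name" legend line per column.
  for (std::size_t I = 0; I < Legend.size(); ++I)
    Out += "[" + std::to_string(I + 1) + "]: " + Legend[I] + "\n";
  // Blank line, then the "[1] [2] ..." header row.
  Out += "\n";
  for (std::size_t I = 0; I < Legend.size(); ++I)
    Out += "[" + std::to_string(I + 1) + "] ";
  if (PrintEncodings)
    Out += "Encodings: ";
  Out += "Instructions:\n";
  return Out;
}
```

Calling `buildInfoHeader(true, true)` yields a legend ending in `[9]: Encoding Size`, matching the numbering the patch hard-codes for that case. Whether this reads better than the explicit branches is a judgment call, since the real view also has to keep the column widths aligned.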

llvm/tools/llvm-mca/llvm-mca.cpp

Show First 20 Lines • Show All 213 Lines • ▼ Show 20 Lines	static cl::opt<bool> EnableBottleneckAnalysis(
cl::desc("Enable bottleneck analysis (disabled by default)"),		cl::desc("Enable bottleneck analysis (disabled by default)"),
cl::cat(ViewOptions), cl::init(false));		cl::cat(ViewOptions), cl::init(false));

static cl::opt<bool> ShowEncoding(		static cl::opt<bool> ShowEncoding(
"show-encoding",		"show-encoding",
cl::desc("Print encoding information in the instruction info view"),		cl::desc("Print encoding information in the instruction info view"),
cl::cat(ViewOptions), cl::init(false));		cl::cat(ViewOptions), cl::init(false));

		static cl::opt<bool> ShowBarriers(
		"show-barriers",
		cl::desc("Print memory barrier information in the instruction info view"),
		cl::cat(ViewOptions), cl::init(false));

static cl::opt<bool> DisableCustomBehaviour(		static cl::opt<bool> DisableCustomBehaviour(
"disable-cb",		"disable-cb",
cl::desc(		cl::desc(
"Disable custom behaviour (use the default class which does nothing)."),		"Disable custom behaviour (use the default class which does nothing)."),
cl::cat(ViewOptions), cl::init(false));		cl::cat(ViewOptions), cl::init(false));

namespace {		namespace {

▲ Show 20 Lines • Show All 269 Lines • ▼ Show 20 Lines	if (!DisableCustomBehaviour) {
TheTarget->createInstrPostProcess(STI, MCII));		TheTarget->createInstrPostProcess(STI, MCII));
}		}
if (!IPP)		if (!IPP)
// If the target doesn't have its own IPP implemented (or the		// If the target doesn't have its own IPP implemented (or the
// -disable-cb flag is set) then we use the base class		// -disable-cb flag is set) then we use the base class
// (which does nothing).		// (which does nothing).
IPP = std::make_unique<mca::InstrPostProcess>(STI, MCII);		IPP = std::make_unique<mca::InstrPostProcess>(STI, MCII);

std::vector<std::unique_ptr<mca::Instruction>> LoweredSequence;		SmallVector<std::unique_ptr<mca::Instruction>> LoweredSequence;
for (const MCInst &MCI : Insts) {		for (const MCInst &MCI : Insts) {
Expected<std::unique_ptr<mca::Instruction>> Inst =		Expected<std::unique_ptr<mca::Instruction>> Inst =
IB.createInstruction(MCI);		IB.createInstruction(MCI);
if (!Inst) {		if (!Inst) {
if (auto NewE = handleErrors(		if (auto NewE = handleErrors(
Inst.takeError(),		Inst.takeError(),
[&IP, &STI](const mca::InstructionError<MCInst> &IE) {		[&IP, &STI](const mca::InstructionError<MCInst> &IE) {
std::string InstructionStr;		std::string InstructionStr;
Show All 26 Lines	if (PrintInstructionTables) {
mca::PipelinePrinter Printer(P, Region, RegionIdx, *STI, PO);		mca::PipelinePrinter Printer(P, Region, RegionIdx, *STI, PO);
if (PrintJson) {		if (PrintJson) {
Printer.addView(		Printer.addView(
std::make_unique<mca::InstructionView>(STI, IP, Insts));		std::make_unique<mca::InstructionView>(STI, IP, Insts));
}		}

// Create the views for this pipeline, execute, and emit a report.		// Create the views for this pipeline, execute, and emit a report.
if (PrintInstructionInfoView) {		if (PrintInstructionInfoView) {
Printer.addView(std::make_unique<mca::InstructionInfoView>(		Printer.addView(std::make_unique<mca::InstructionInfoView>(
STI, MCII, CE, ShowEncoding, Insts, *IP));		STI, MCII, CE, ShowEncoding, Insts, *IP, LoweredSequence,
		ShowBarriers));
}		}
andreadb (inline comment): I wonder if we should use SmallVector for the LoweredSequence (instead of a std::vector). The average code snippet size tends to be very small, so a SmallVector might perform a bit better. The constructor of InstructionInfoView should probably use an ArrayRef for the LoweredSequence. This would be similar to what we already do for the SourceMgr.
Printer.addView(		Printer.addView(
std::make_unique<mca::ResourcePressureView>(STI, IP, Insts));		std::make_unique<mca::ResourcePressureView>(STI, IP, Insts));

if (!runPipeline(*P))		if (!runPipeline(*P))
return 1;		return 1;

if (PrintJson) {		if (PrintJson) {
Printer.printReport(JSONOutput);		Printer.printReport(JSONOutput);
▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines	if (EnableBottleneckAnalysis) {
<< "'.\n";		<< "'.\n";
}		}
Printer.addView(std::make_unique<mca::BottleneckAnalysis>(		Printer.addView(std::make_unique<mca::BottleneckAnalysis>(
STI, IP, Insts, S.getNumIterations()));		STI, IP, Insts, S.getNumIterations()));
}		}

if (PrintInstructionInfoView)		if (PrintInstructionInfoView)
Printer.addView(std::make_unique<mca::InstructionInfoView>(		Printer.addView(std::make_unique<mca::InstructionInfoView>(
STI, MCII, CE, ShowEncoding, Insts, *IP));		STI, MCII, CE, ShowEncoding, Insts, *IP, LoweredSequence,
		ShowBarriers));

// Fetch custom Views that are to be placed after the InstructionInfoView.		// Fetch custom Views that are to be placed after the InstructionInfoView.
// Refer to the comment paired with the CB->getStartViews(*IP, Insts); line		// Refer to the comment paired with the CB->getStartViews(*IP, Insts); line
// for more info.		// for more info.
if (!DisableCustomBehaviour) {		if (!DisableCustomBehaviour) {
std::vector<std::unique_ptr<mca::View>> CBViews =		std::vector<std::unique_ptr<mca::View>> CBViews =
CB->getPostInstrInfoViews(*IP, Insts);		CB->getPostInstrInfoViews(*IP, Insts);
for (auto &CBView : CBViews)		for (auto &CBView : CBViews)
▲ Show 20 Lines • Show All 55 Lines • Show Last 20 Lines
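The inline comment on `LoweredSequence` suggests that the view take a non-owning reference to the lowered instructions rather than depend on a particular container. The sketch below illustrates that idea with standard-library types only, so that it compiles without LLVM headers. `ArrayView`, `Instruction`, `InfoView`, and `countBarriers` are hypothetical stand-ins invented for this example. In LLVM itself the roles would be played by `llvm::ArrayRef` and `mca::Instruction`, and this is not the code that was committed.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for mca::Instruction, reduced to the two barrier flags.
struct Instruction {
  bool LoadBarrier = false;
  bool StoreBarrier = false;
};

// Minimal stand-in for a non-owning array reference (pointer plus length).
// It does not own the elements, so it must not outlive the container it views.
template <typename T> class ArrayView {
  const T *Data = nullptr;
  std::size_t Size = 0;

public:
  ArrayView(const std::vector<T> &V) : Data(V.data()), Size(V.size()) {}
  std::size_t size() const { return Size; }
  const T &operator[](std::size_t I) const { return Data[I]; }
};

// Hypothetical view class: it only reads the lowered instructions, so it
// stores a cheap-to-copy view instead of a reference to a specific container.
class InfoView {
  ArrayView<std::unique_ptr<Instruction>> LoweredInsts;

public:
  explicit InfoView(ArrayView<std::unique_ptr<Instruction>> Insts)
      : LoweredInsts(Insts) {}

  // Counts the instructions flagged as a load barrier, a store barrier, or both.
  std::size_t countBarriers() const {
    std::size_t N = 0;
    for (std::size_t I = 0; I < LoweredInsts.size(); ++I)
      if (LoweredInsts[I]->LoadBarrier || LoweredInsts[I]->StoreBarrier)
        ++N;
    return N;
  }
};
```

With this shape the caller can change the storage type of the sequence without touching the view's constructor, as long as the storage is contiguous and the view type has a matching converting constructor.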

Diff 399070
