This is an archive of the discontinued LLVM Phabricator instance.

Differential D108727

[X86][MCA] Address other issues with MULX reported in PR51495.
ClosedPublic

Authored by andreadb on Aug 25 2021, 2:13 PM.

Details

Reviewers

RKSimon
spatel
craig.topper
lebedev.ri

Commits

rG4a5b19170397: [X86][MCA] Address the latest issues with MULX reported in PR51495.

Summary

It turns out that SchedWrite WriteIMulH was always assigned to the low half of the result of a MULX (rather than to the high half).

To avoid confusion, this patch swaps the two MULX writes in the tablegen definition of MULX32/64.
That way, the write names better describe what they actually refer to. It also avoids further complications if, in the future, we decide to reuse the same MulH writes to model other scalar integer multiply instructions.
I also had to swap the latency values for the two MULX writes to make sure that the change is effectively an NFC. In fact, none of the existing x86 tests were affected by this small refactoring.
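
The NFC claim can be checked with simple arithmetic. The sketch below is a simplified, hypothetical model (the names `LatencyTable`, `resolveLatencies`, `mulx64Before`, and `mulx64After` are illustrative and not part of LLVM). It assumes that each write in a `Sched<[...]>` list maps positionally to an output operand, and it uses the Haswell latencies for `WriteMULX64` and `WriteIMulH` from this diff.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a scheduling model: write name -> latency.
using LatencyTable = std::map<std::string, int>;

// Resolve one latency per output operand from the ordered write list of an
// instruction. In this sketch, index 0 is assumed to be $dst1 (high half)
// and index 1 is assumed to be $dst2 (low half).
std::vector<int> resolveLatencies(const std::vector<std::string> &Writes,
                                  const LatencyTable &Table) {
  std::vector<int> Latencies;
  for (const std::string &Name : Writes) {
    assert(Table.count(Name) && "unknown write name");
    Latencies.push_back(Table.at(Name));
  }
  return Latencies;
}

// MULX64rr before the patch: Sched<[sched, WriteIMulH]>,
// with WriteMULX64 = 4 cycles and WriteIMulH = 3 cycles.
std::vector<int> mulx64Before() {
  const LatencyTable Table = {{"WriteMULX64", 4}, {"WriteIMulH", 3}};
  return resolveLatencies({"WriteMULX64", "WriteIMulH"}, Table);
}

// MULX64rr after the patch: Sched<[WriteIMulH, sched]>,
// with WriteIMulH = 4 cycles and WriteMULX64 = 3 cycles.
std::vector<int> mulx64After() {
  const LatencyTable Table = {{"WriteMULX64", 3}, {"WriteIMulH", 4}};
  return resolveLatencies({"WriteIMulH", "WriteMULX64"}, Table);
}
```

Both functions return the same per-operand latencies (4 for the first output, 3 for the second), which is why swapping the write order and the latency values together leaves existing test output unchanged.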

This patch also fixes a bug in MCA: the wrong latency value was propagated for instructions that perform multiple writes to the same register.
This last issue was found by Roman while testing MULX on targets that define a different latency for the Low/High part of the result.
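
A minimal sketch of the "keep the slowest write" idea behind the MCA fix follows. `Write` and `SimpleRegisterFile` are hypothetical, heavily simplified stand-ins for MCA's `WriteRef` and `RegisterFile`; the real code also updates sub-register mappings and accounts for eliminated moves, which this sketch ignores.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical, simplified stand-in for an MCA write reference.
struct Write {
  unsigned SourceIndex; // index of the instruction that performs the write
  unsigned Latency;     // cycles until the written value becomes available
};

// Hypothetical register file: maps a register ID to its last tracked write.
class SimpleRegisterFile {
  std::unordered_map<unsigned, Write> Mappings;

public:
  void addWrite(unsigned RegID, const Write &W) {
    auto It = Mappings.find(RegID);
    // If the same instruction already wrote RegID with a larger latency,
    // conservatively keep that slower write.
    if (It != Mappings.end() && It->second.SourceIndex == W.SourceIndex &&
        It->second.Latency > W.Latency)
      return;
    Mappings[RegID] = W;
  }

  unsigned latencyOf(unsigned RegID) const {
    assert(Mappings.count(RegID) && "register was never written");
    return Mappings.at(RegID).Latency;
  }
};

// Models a single instruction (index 0) that writes the same register twice,
// as in `mulxq %rax, %rax, %rax`: once with latency 4, once with latency 3.
unsigned mulxSameRegLatency() {
  SimpleRegisterFile RF;
  const unsigned RAX = 1; // arbitrary register ID chosen for this sketch
  RF.addWrite(RAX, {0, 4});
  RF.addWrite(RAX, {0, 3});
  return RF.latencyOf(RAX);
}
```

With both writes coming from the same instruction, `mulxSameRegLatency()` returns 4: later readers of the register see the slower of the two writes, regardless of the order in which the writes are recorded. A write from a different instruction still replaces the mapping as usual.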

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

andreadb created this revision. Aug 25 2021, 2:13 PM

Herald added subscribers: pengfei, gbedwell, hiraditya. Aug 25 2021, 2:13 PM

andreadb requested review of this revision. Aug 25 2021, 2:13 PM

Herald added a project: Restricted Project. Aug 25 2021, 2:13 PM

Herald added a subscriber: llvm-commits.

andreadb edited the summary of this revision. Aug 25 2021, 2:15 PM

Harbormaster completed remote builds in B121226: Diff 368718. Aug 25 2021, 2:19 PM

This comment has been deleted.

llvm/lib/MCA/HardwareUnits/RegisterFile.cpp

293

This doesn't compile for me

/repositories/llvm-project/llvm/lib/MCA/HardwareUnits/RegisterFile.cpp:293:34: error: use of undeclared identifier 'RegISterMappings'; did you mean 'RegisterMappings'?
    const WriteRef &OtherWrite = RegISterMappings[RegID].first;
                                 ^~~~~~~~~~~~~~~~
                                 RegisterMappings
/repositories/llvm-project/llvm/include/llvm/MCA/HardwareUnits/RegisterFile.h:190:32: note: 'RegisterMappings' declared here
  std::vector<RegisterMapping> RegisterMappings;
                               ^

I think we should rename WriteIMulH/WriteIMulH to better convey that it is for the low half of the multiplicative result.

In D108727#2965965, @lebedev.ri wrote:

I think we should rename WriteIMulH/WriteIMulH to better convey that it is for the low half of the multiplicative result.

So, rather than swapping the position of the two writes, you suggest to rename WriteIMulH into something like WriteIMulLo?

llvm/lib/MCA/HardwareUnits/RegisterFile.cpp
293	I will fix it. Sorry.

In D108727#2966046, @andreadb wrote:

So, rather than swapping the position of the two writes, you suggest to rename WriteIMulH into something like WriteIMulLo?

No, i mean in addition to the current diff, also rename the WriteIMulH.

In D108727#2966053, @lebedev.ri wrote:

No, i mean in addition to the current diff, also rename the WriteIMulH.

With this patch, WriteIMulH now correctly references the high half.

If you think that the name should be changed then what name do you suggest to use?

In D108727#2966067, @andreadb wrote:

With this patch, WriteIMulH now correctly references the high half.

If you think that the name should be changed then what name do you suggest to use?

Ah, hmm, i think i got fooled by overrides in znver3 model.
Looking at this again, i believe this is correct as-is.
I will fix Zen3 model afterwards.

LG
@RKSimon ?

This revision is now accepted and ready to land. Aug 25 2021, 3:13 PM

Address review comment.

andreadb marked an inline comment as done. Aug 25 2021, 3:25 PM

Harbormaster completed remote builds in B121251: Diff 368753. Aug 25 2021, 5:25 PM

LGTM

Closed by commit rG4a5b19170397: [X86][MCA] Address the latest issues with MULX reported in PR51495. (authored by andreadb). Aug 26 2021, 4:10 AM

This revision was automatically updated to reflect the committed changes.

andreadb added a commit: rG4a5b19170397: [X86][MCA] Address the latest issues with MULX reported in PR51495..

I believe there is some other llvm-mca bug, because now i can not fix znver3 since llvm-mca simply hangs on existing tests (ninja check-llvm-tools-llvm-mca) after:

diff --git a/llvm/lib/Target/X86/X86ScheduleZnver3.td b/llvm/lib/Target/X86/X86ScheduleZnver3.td
index c2be9ec6085d..be07c069aae1 100644
--- a/llvm/lib/Target/X86/X86ScheduleZnver3.td
+++ b/llvm/lib/Target/X86/X86ScheduleZnver3.td
@@ -617,45 +617,11 @@ defm : Zn3WriteResIntPair<WriteIMul16, [Zn3Multiplier], 3, [3], 3, /*LoadUOps=*/
 defm : Zn3WriteResIntPair<WriteIMul16Imm, [Zn3Multiplier], 4, [4], 2>; // Integer 16-bit multiplication by immediate.
 defm : Zn3WriteResIntPair<WriteIMul16Reg, [Zn3Multiplier], 3, [1], 1>; // Integer 16-bit multiplication by register.
 defm : Zn3WriteResIntPair<WriteIMul32, [Zn3Multiplier], 3, [3], 2>;    // Integer 32-bit multiplication.
-defm : Zn3WriteResIntPair<WriteMULX32, [Zn3Multiplier], 4, [1], 2>;    // Integer 32-bit Unsigned Multiply Without Affecting Flags.
-
-def Zn3MULX32rr : SchedWriteRes<[Zn3Multiplier]> {
-  let Latency = 4;
-  let ResourceCycles = [1];
-  let NumMicroOps = 2;
-}
-def : InstRW<[Zn3MULX32rr, WriteIMulH], (instrs MULX32rr)>;
-
-def Zn3MULX32rm : SchedWriteRes<[Zn3AGU012, Zn3Load, Zn3Multiplier]> {
-  let Latency = !add(Znver3Model.LoadLatency, Zn3MULX32rr.Latency);
-  let ResourceCycles = [1, 1, 2];
-  let NumMicroOps = Zn3MULX32rr.NumMicroOps;
-}
-def : InstRW<[Zn3MULX32rm, WriteIMulHLd,
-              ReadDefault, ReadDefault, ReadDefault, ReadDefault, ReadDefault,
-              ReadAfterLd], (instrs MULX32rm)>;
-
+defm : Zn3WriteResIntPair<WriteMULX32, [Zn3Multiplier], 3, [1], 2>;    // Integer 32-bit Unsigned Multiply Without Affecting Flags.
 defm : Zn3WriteResIntPair<WriteIMul32Imm, [Zn3Multiplier], 3, [1], 1>; // Integer 32-bit multiplication by immediate.
 defm : Zn3WriteResIntPair<WriteIMul32Reg, [Zn3Multiplier], 3, [1], 1>; // Integer 32-bit multiplication by register.
 defm : Zn3WriteResIntPair<WriteIMul64, [Zn3Multiplier], 3, [3], 2>;    // Integer 64-bit multiplication.
-defm : Zn3WriteResIntPair<WriteMULX64, [Zn3Multiplier], 4, [1], 2>;    // Integer 32-bit Unsigned Multiply Without Affecting Flags.
-
-def Zn3MULX64rr : SchedWriteRes<[Zn3Multiplier]> {
-  let Latency = 4;
-  let ResourceCycles = [1];
-  let NumMicroOps = 2;
-}
-def : InstRW<[Zn3MULX64rr, WriteIMulH], (instrs MULX64rr)>;
-
-def Zn3MULX64rm : SchedWriteRes<[Zn3AGU012, Zn3Load, Zn3Multiplier]> {
-  let Latency = !add(Znver3Model.LoadLatency, Zn3MULX64rr.Latency);
-  let ResourceCycles = [1, 1, 2];
-  let NumMicroOps = Zn3MULX64rr.NumMicroOps;
-}
-def : InstRW<[Zn3MULX64rm, WriteIMulHLd,
-              ReadDefault, ReadDefault, ReadDefault, ReadDefault, ReadDefault,
-              ReadAfterLd], (instrs MULX64rm)>;
-
+defm : Zn3WriteResIntPair<WriteMULX64, [Zn3Multiplier], 3, [1], 2>;    // Integer 32-bit Unsigned Multiply Without Affecting Flags.
 defm : Zn3WriteResIntPair<WriteIMul64Imm, [Zn3Multiplier], 3, [1], 1>; // Integer 64-bit multiplication by immediate.
 defm : Zn3WriteResIntPair<WriteIMul64Reg, [Zn3Multiplier], 3, [1], 1>; // Integer 64-bit multiplication by register.
 defm : Zn3WriteResInt<WriteIMulHLd, [], !add(4, Znver3Model.LoadLatency), [], 0>;  // Integer multiplication, high part.

In D108727#2966983, @lebedev.ri wrote:

I believe there is some other llvm-mca bug, because now i can not fix znver3 since llvm-mca simply hangs on existing tests (ninja check-llvm-tools-llvm-mca) after: [...]

I'll see if I can reproduce it. Thanks for reporting it.

@lebedev.ri I finally found out what the problem was. Luckily the fix was simple, and I was able to test it using your modified scheduling model.

That bug is fixed by this commit:

[llvm] 1eb7536 - [MCA][RegisterFile] Consistently update the PRF in the presence of multiple writes to the same register.

Please let me know if that works for you too.

Apologies for all the issues caused by this change.

Thanks,
-Andrea

In D108727#2967710, @andreadb wrote:
@lebedev.ri I finally found out what the problem was. Luckily the fix was simple, and I was able to test it using your modified scheduling model.

Thank you!

Revision Contents

Path                                                          Size
llvm/lib/MCA/HardwareUnits/RegisterFile.cpp                   11 lines
llvm/lib/Target/X86/X86InstrArithmetic.td                     4 lines
llvm/lib/Target/X86/X86SchedBroadwell.td                      6 lines
llvm/lib/Target/X86/X86SchedHaswell.td                        6 lines
llvm/lib/Target/X86/X86SchedSandyBridge.td                    6 lines
llvm/lib/Target/X86/X86SchedSkylakeClient.td                  6 lines
llvm/lib/Target/X86/X86SchedSkylakeServer.td                  6 lines
llvm/test/tools/llvm-mca/X86/Haswell/mulx-same-regs.s         26 lines
llvm/test/tools/llvm-mca/X86/SkylakeClient/mulx-same-regs.s   26 lines

Diff 368844

llvm/lib/MCA/HardwareUnits/RegisterFile.cpp

   MCPhysReg ZeroRegisterID =
       WS.clearsSuperRegisters() ? RegID : WS.getRegisterID();
   ZeroRegisters.setBitVal(ZeroRegisterID, IsWriteZero);
   for (MCSubRegIterator I(ZeroRegisterID, &MRI); I.isValid(); ++I)
     ZeroRegisters.setBitVal(*I, IsWriteZero);

   // If this move has been eliminated, then method tryEliminateMoveOrSwap should
   // have already updated all the register mappings.
   if (!IsEliminated) {
+    // Check if this is one of multiple writes performed by this
+    // instruction to register RegID.
+    const WriteRef &OtherWrite = RegisterMappings[RegID].first;
+    const WriteState *OtherWS = OtherWrite.getWriteState();
+    if (OtherWS && OtherWrite.getSourceIndex() == Write.getSourceIndex()) {
+      if (OtherWS->getLatency() > WS.getLatency()) {
+        // Conservatively keep the slowest write to RegID.
+        return;
+      }
+    }
+
     // Update the mapping for register RegID including its sub-registers.
     RegisterMappings[RegID].first = Write;
     RegisterMappings[RegID].second.AliasRegID = 0U;
     for (MCSubRegIterator I(RegID, &MRI); I.isValid(); ++I) {
       RegisterMappings[*I].first = Write;
       RegisterMappings[*I].second.AliasRegID = 0U;
     }

Inline comments on the OtherWrite declaration (both marked Done):
lebedev.ri: This doesn't compile for me (use of undeclared identifier 'RegISterMappings'; did you mean 'RegisterMappings'?)
andreadb: I will fix it. Sorry.

llvm/lib/Target/X86/X86InstrArithmetic.td

 //===----------------------------------------------------------------------===//
 // MULX Instruction
 //
 multiclass bmi_mulx<string mnemonic, RegisterClass RC, X86MemOperand x86memop,
                     X86FoldableSchedWrite sched> {
 let hasSideEffects = 0 in {
   def rr : I<0xF6, MRMSrcReg, (outs RC:$dst1, RC:$dst2), (ins RC:$src),
              !strconcat(mnemonic, "\t{$src, $dst2, $dst1|$dst1, $dst2, $src}"),
-             []>, T8XD, VEX_4V, Sched<[sched, WriteIMulH]>;
+             []>, T8XD, VEX_4V, Sched<[WriteIMulH, sched]>;

   let mayLoad = 1 in
   def rm : I<0xF6, MRMSrcMem, (outs RC:$dst1, RC:$dst2), (ins x86memop:$src),
              !strconcat(mnemonic, "\t{$src, $dst2, $dst1|$dst1, $dst2, $src}"),
              []>, T8XD, VEX_4V,
-             Sched<[sched.Folded, WriteIMulHLd,
+             Sched<[WriteIMulHLd, sched.Folded,
                     // Memory operand.
                     ReadDefault, ReadDefault, ReadDefault, ReadDefault, ReadDefault,
                     // Implicit read of EDX/RDX
                     sched.ReadAfterFold]>;

   // Pseudo instructions to be used when the low result isn't used. The
   // instruction is defined to keep the high if both destinations are the same.
   def Hrr : PseudoI<(outs RC:$dst), (ins RC:$src),

llvm/lib/Target/X86/X86SchedBroadwell.td

 // Integer multiplication.
 defm : BWWriteResPair<WriteIMul8, [BWPort1], 3>;
 defm : BWWriteResPair<WriteIMul16, [BWPort1,BWPort06,BWPort0156], 4, [1,1,2], 4>;
 defm : X86WriteRes<WriteIMul16Imm, [BWPort1,BWPort0156], 4, [1,1], 2>;
 defm : X86WriteRes<WriteIMul16ImmLd, [BWPort1,BWPort0156,BWPort23], 8, [1,1,1], 3>;
 defm : BWWriteResPair<WriteIMul16Reg, [BWPort1], 3>;
 defm : BWWriteResPair<WriteIMul32, [BWPort1,BWPort06,BWPort0156], 4, [1,1,1], 3>;
-defm : BWWriteResPair<WriteMULX32, [BWPort1,BWPort06,BWPort0156], 4, [1,1,1], 3>;
+defm : BWWriteResPair<WriteMULX32, [BWPort1,BWPort06,BWPort0156], 3, [1,1,1], 3>;
 defm : BWWriteResPair<WriteIMul32Imm, [BWPort1], 3>;
 defm : BWWriteResPair<WriteIMul32Reg, [BWPort1], 3>;
 defm : BWWriteResPair<WriteIMul64, [BWPort1,BWPort5], 4, [1,1], 2>;
-defm : BWWriteResPair<WriteMULX64, [BWPort1,BWPort5], 4, [1,1], 2>;
+defm : BWWriteResPair<WriteMULX64, [BWPort1,BWPort5], 3, [1,1], 2>;
 defm : BWWriteResPair<WriteIMul64Imm, [BWPort1], 3>;
 defm : BWWriteResPair<WriteIMul64Reg, [BWPort1], 3>;
-def BWWriteIMulH : WriteRes<WriteIMulH, []> { let Latency = 3; }
+def BWWriteIMulH : WriteRes<WriteIMulH, []> { let Latency = 4; }
 def : WriteRes<WriteIMulHLd, []> {
   let Latency = !add(BWWriteIMulH.Latency, BroadwellModel.LoadLatency);
 }

 defm : X86WriteRes<WriteBSWAP32, [BWPort15], 1, [1], 1>;
 defm : X86WriteRes<WriteBSWAP64, [BWPort06, BWPort15], 2, [1, 1], 2>;
 defm : X86WriteRes<WriteCMPXCHG,[BWPort06, BWPort0156], 5, [2, 3], 5>;
 defm : X86WriteRes<WriteCMPXCHGRMW,[BWPort23, BWPort06, BWPort0156, BWPort237, BWPort4], 8, [1, 2, 1, 1, 1], 6>;

llvm/lib/Target/X86/X86SchedHaswell.td

 // Integer multiplication.
 defm : HWWriteResPair<WriteIMul8, [HWPort1], 3>;
 defm : HWWriteResPair<WriteIMul16, [HWPort1,HWPort06,HWPort0156], 4, [1,1,2], 4>;
 defm : X86WriteRes<WriteIMul16Imm, [HWPort1,HWPort0156], 4, [1,1], 2>;
 defm : X86WriteRes<WriteIMul16ImmLd, [HWPort1,HWPort0156,HWPort23], 8, [1,1,1], 3>;
 defm : HWWriteResPair<WriteIMul16Reg, [HWPort1], 3>;
 defm : HWWriteResPair<WriteIMul32, [HWPort1,HWPort06,HWPort0156], 4, [1,1,1], 3>;
-defm : HWWriteResPair<WriteMULX32, [HWPort1,HWPort06,HWPort0156], 4, [1,1,1], 3>;
+defm : HWWriteResPair<WriteMULX32, [HWPort1,HWPort06,HWPort0156], 3, [1,1,1], 3>;
 defm : HWWriteResPair<WriteIMul32Imm, [HWPort1], 3>;
 defm : HWWriteResPair<WriteIMul32Reg, [HWPort1], 3>;
 defm : HWWriteResPair<WriteIMul64, [HWPort1,HWPort6], 4, [1,1], 2>;
-defm : HWWriteResPair<WriteMULX64, [HWPort1,HWPort6], 4, [1,1], 2>;
+defm : HWWriteResPair<WriteMULX64, [HWPort1,HWPort6], 3, [1,1], 2>;
 defm : HWWriteResPair<WriteIMul64Imm, [HWPort1], 3>;
 defm : HWWriteResPair<WriteIMul64Reg, [HWPort1], 3>;
-def HWWriteIMulH : WriteRes<WriteIMulH, []> { let Latency = 3; }
+def HWWriteIMulH : WriteRes<WriteIMulH, []> { let Latency = 4; }
 def : WriteRes<WriteIMulHLd, []> {
   let Latency = !add(HWWriteIMulH.Latency, HaswellModel.LoadLatency);
 }

 defm : X86WriteRes<WriteBSWAP32, [HWPort15], 1, [1], 1>;
 defm : X86WriteRes<WriteBSWAP64, [HWPort06, HWPort15], 2, [1,1], 2>;
 defm : X86WriteRes<WriteCMPXCHG,[HWPort06, HWPort0156], 5, [2,3], 5>;
 defm : X86WriteRes<WriteCMPXCHGRMW,[HWPort23,HWPort06,HWPort0156,HWPort237,HWPort4], 9, [1,2,1,1,1], 6>;

llvm/lib/Target/X86/X86SchedSandyBridge.td

 defm : SBWriteResPair<WriteADC, [SBPort05,SBPort015], 2, [1,1], 2>;

 defm : SBWriteResPair<WriteIMul8, [SBPort1], 3>;
 defm : SBWriteResPair<WriteIMul16, [SBPort1,SBPort05,SBPort015], 4, [1,1,2], 4>;
 defm : X86WriteRes<WriteIMul16Imm, [SBPort1,SBPort015], 4, [1,1], 2>;
 defm : X86WriteRes<WriteIMul16ImmLd, [SBPort1,SBPort015,SBPort23], 8, [1,1,1], 3>;
 defm : SBWriteResPair<WriteIMul16Reg, [SBPort1], 3>;
 defm : SBWriteResPair<WriteIMul32, [SBPort1,SBPort05,SBPort015], 4, [1,1,1], 3>;
-defm : SBWriteResPair<WriteMULX32, [SBPort1,SBPort05,SBPort015], 4, [1,1,1], 3>;
+defm : SBWriteResPair<WriteMULX32, [SBPort1,SBPort05,SBPort015], 3, [1,1,1], 3>;
 defm : SBWriteResPair<WriteIMul32Imm, [SBPort1], 3>;
 defm : SBWriteResPair<WriteIMul32Reg, [SBPort1], 3>;
 defm : SBWriteResPair<WriteIMul64, [SBPort1,SBPort0], 4, [1,1], 2>;
-defm : SBWriteResPair<WriteMULX64, [SBPort1,SBPort0], 4, [1,1], 2>;
+defm : SBWriteResPair<WriteMULX64, [SBPort1,SBPort0], 3, [1,1], 2>;
 defm : SBWriteResPair<WriteIMul64Imm, [SBPort1], 3>;
 defm : SBWriteResPair<WriteIMul64Reg, [SBPort1], 3>;
-def SBWriteIMulH : WriteRes<WriteIMulH, []> { let Latency = 3; }
+def SBWriteIMulH : WriteRes<WriteIMulH, []> { let Latency = 4; }
 def : WriteRes<WriteIMulHLd, []> {
   let Latency = !add(SBWriteIMulH.Latency, SandyBridgeModel.LoadLatency);
 }

 defm : X86WriteRes<WriteXCHG, [SBPort015], 2, [3], 3>;
 defm : X86WriteRes<WriteBSWAP32, [SBPort1], 1, [1], 1>;
 defm : X86WriteRes<WriteBSWAP64, [SBPort1, SBPort05], 2, [1,1], 2>;
 defm : X86WriteRes<WriteCMPXCHG, [SBPort05, SBPort015], 5, [1,3], 4>;

llvm/lib/Target/X86/X86SchedSkylakeClient.td

 // Integer multiplication.
 defm : SKLWriteResPair<WriteIMul8, [SKLPort1], 3>;
 defm : SKLWriteResPair<WriteIMul16, [SKLPort1,SKLPort06,SKLPort0156], 4, [1,1,2], 4>;
 defm : X86WriteRes<WriteIMul16Imm, [SKLPort1,SKLPort0156], 4, [1,1], 2>;
 defm : X86WriteRes<WriteIMul16ImmLd, [SKLPort1,SKLPort0156,SKLPort23], 8, [1,1,1], 3>;
 defm : SKLWriteResPair<WriteIMul16Reg, [SKLPort1], 3>;
 defm : SKLWriteResPair<WriteIMul32, [SKLPort1,SKLPort06,SKLPort0156], 4, [1,1,1], 3>;
-defm : SKLWriteResPair<WriteMULX32, [SKLPort1,SKLPort06,SKLPort0156], 4, [1,1,1], 3>;
+defm : SKLWriteResPair<WriteMULX32, [SKLPort1,SKLPort06,SKLPort0156], 3, [1,1,1], 3>;
 defm : SKLWriteResPair<WriteIMul32Imm, [SKLPort1], 3>;
 defm : SKLWriteResPair<WriteIMul32Reg, [SKLPort1], 3>;
 defm : SKLWriteResPair<WriteIMul64, [SKLPort1,SKLPort5], 4, [1,1], 2>;
-defm : SKLWriteResPair<WriteMULX64, [SKLPort1,SKLPort5], 4, [1,1], 2>;
+defm : SKLWriteResPair<WriteMULX64, [SKLPort1,SKLPort5], 3, [1,1], 2>;
 defm : SKLWriteResPair<WriteIMul64Imm, [SKLPort1], 3>;
 defm : SKLWriteResPair<WriteIMul64Reg, [SKLPort1], 3>;
-def SKLWriteIMulH : WriteRes<WriteIMulH, []> { let Latency = 3; }
+def SKLWriteIMulH : WriteRes<WriteIMulH, []> { let Latency = 4; }
 def : WriteRes<WriteIMulHLd, []> {
   let Latency = !add(SKLWriteIMulH.Latency, SkylakeClientModel.LoadLatency);
 }

 defm : X86WriteRes<WriteBSWAP32, [SKLPort15], 1, [1], 1>;
 defm : X86WriteRes<WriteBSWAP64, [SKLPort06, SKLPort15], 2, [1,1], 2>;
 defm : X86WriteRes<WriteCMPXCHG,[SKLPort06, SKLPort0156], 5, [2,3], 5>;
 defm : X86WriteRes<WriteCMPXCHGRMW,[SKLPort23,SKLPort06,SKLPort0156,SKLPort237,SKLPort4], 8, [1,2,1,1,1], 6>;

llvm/lib/Target/X86/X86SchedSkylakeServer.td

	Show First 20 Lines • Show All 117 Lines • ▼ Show 20 Lines
	// Integer multiplication.			// Integer multiplication.
	defm : SKXWriteResPair<WriteIMul8, [SKXPort1], 3>;			defm : SKXWriteResPair<WriteIMul8, [SKXPort1], 3>;
	defm : SKXWriteResPair<WriteIMul16, [SKXPort1,SKXPort06,SKXPort0156], 4, [1,1,2], 4>;			defm : SKXWriteResPair<WriteIMul16, [SKXPort1,SKXPort06,SKXPort0156], 4, [1,1,2], 4>;
	defm : X86WriteRes<WriteIMul16Imm, [SKXPort1,SKXPort0156], 4, [1,1], 2>;			defm : X86WriteRes<WriteIMul16Imm, [SKXPort1,SKXPort0156], 4, [1,1], 2>;
	defm : X86WriteRes<WriteIMul16ImmLd, [SKXPort1,SKXPort0156,SKXPort23], 8, [1,1,1], 3>;			defm : X86WriteRes<WriteIMul16ImmLd, [SKXPort1,SKXPort0156,SKXPort23], 8, [1,1,1], 3>;
	defm : X86WriteRes<WriteIMul16Reg, [SKXPort1], 3, [1], 1>;			defm : X86WriteRes<WriteIMul16Reg, [SKXPort1], 3, [1], 1>;
	defm : X86WriteRes<WriteIMul16RegLd, [SKXPort1,SKXPort0156,SKXPort23], 8, [1,1,1], 3>;			defm : X86WriteRes<WriteIMul16RegLd, [SKXPort1,SKXPort0156,SKXPort23], 8, [1,1,1], 3>;
	defm : SKXWriteResPair<WriteIMul32, [SKXPort1,SKXPort06,SKXPort0156], 4, [1,1,1], 3>;			defm : SKXWriteResPair<WriteIMul32, [SKXPort1,SKXPort06,SKXPort0156], 4, [1,1,1], 3>;
	defm : SKXWriteResPair<WriteMULX32, [SKXPort1,SKXPort06,SKXPort0156], 4, [1,1,1], 3>;			defm : SKXWriteResPair<WriteMULX32, [SKXPort1,SKXPort06,SKXPort0156], 3, [1,1,1], 3>;
	defm : SKXWriteResPair<WriteIMul32Imm, [SKXPort1], 3>;			defm : SKXWriteResPair<WriteIMul32Imm, [SKXPort1], 3>;
	defm : SKXWriteResPair<WriteIMul32Reg, [SKXPort1], 3>;			defm : SKXWriteResPair<WriteIMul32Reg, [SKXPort1], 3>;
	defm : SKXWriteResPair<WriteIMul64, [SKXPort1,SKXPort5], 4, [1,1], 2>;			defm : SKXWriteResPair<WriteIMul64, [SKXPort1,SKXPort5], 4, [1,1], 2>;
	defm : SKXWriteResPair<WriteMULX64, [SKXPort1,SKXPort5], 4, [1,1], 2>;			defm : SKXWriteResPair<WriteMULX64, [SKXPort1,SKXPort5], 3, [1,1], 2>;
	defm : SKXWriteResPair<WriteIMul64Imm, [SKXPort1], 3>;			defm : SKXWriteResPair<WriteIMul64Imm, [SKXPort1], 3>;
	defm : SKXWriteResPair<WriteIMul64Reg, [SKXPort1], 3>;			defm : SKXWriteResPair<WriteIMul64Reg, [SKXPort1], 3>;
	def SKXWriteIMulH : WriteRes<WriteIMulH, []> { let Latency = 3; }			def SKXWriteIMulH : WriteRes<WriteIMulH, []> { let Latency = 4; }
	def : WriteRes<WriteIMulHLd, []> {			def : WriteRes<WriteIMulHLd, []> {
	let Latency = !add(SKXWriteIMulH.Latency, SkylakeServerModel.LoadLatency);			let Latency = !add(SKXWriteIMulH.Latency, SkylakeServerModel.LoadLatency);
	}			}

	defm : X86WriteRes<WriteBSWAP32, [SKXPort15], 1, [1], 1>;			defm : X86WriteRes<WriteBSWAP32, [SKXPort15], 1, [1], 1>;
	defm : X86WriteRes<WriteBSWAP64, [SKXPort06, SKXPort15], 2, [1,1], 2>;			defm : X86WriteRes<WriteBSWAP64, [SKXPort06, SKXPort15], 2, [1,1], 2>;
	defm : X86WriteRes<WriteCMPXCHG,[SKXPort06, SKXPort0156], 5, [2,3], 5>;			defm : X86WriteRes<WriteCMPXCHG,[SKXPort06, SKXPort0156], 5, [2,3], 5>;
	defm : X86WriteRes<WriteCMPXCHGRMW,[SKXPort23,SKXPort06,SKXPort0156,SKXPort237,SKXPort4], 8, [1,2,1,1,1], 6>;			defm : X86WriteRes<WriteCMPXCHGRMW,[SKXPort23,SKXPort06,SKXPort0156,SKXPort237,SKXPort4], 8, [1,2,1,1,1], 6>;
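The Skylake Server hunk swaps the latencies of the two MULX writes (WriteMULX32/64 goes from 4 to 3, SKXWriteIMulH from 3 to 4). As a rough illustration of why the swap is meant to be behaviour-preserving, the sketch below pairs each write with the result half it describes before and after the patch. The latencies come from the hunk; the write-to-half mapping follows the revision summary. The names `before`, `after`, `per_half_latency`, `imulh_load_latency` and the `load_latency` parameter are invented for this sketch and are not part of LLVM, and the value of `SkylakeServerModel.LoadLatency` is not shown in this diff, so it is left as a parameter.

```python
# Hypothetical model of the two MULX writes on Skylake Server.
# "WriteMULX" stands for both WriteMULX32 and WriteMULX64, which change
# identically in the hunk.

before = {
    "WriteIMulH": {"half": "low", "latency": 3},   # old: described the low half
    "WriteMULX": {"half": "high", "latency": 4},
}
after = {
    "WriteIMulH": {"half": "high", "latency": 4},  # new: describes the high half
    "WriteMULX": {"half": "low", "latency": 3},
}


def per_half_latency(writes):
    """Map each result half ('low'/'high') to the latency of its write."""
    return {w["half"]: w["latency"] for w in writes.values()}


def imulh_load_latency(writes, load_latency):
    """Mirror !add(SKXWriteIMulH.Latency, <model>.LoadLatency) from the hunk."""
    return writes["WriteIMulH"]["latency"] + load_latency


# The per-half view is the same on both sides of the patch.
print(per_half_latency(before) == per_half_latency(after))
```

Under these assumptions each half keeps its latency (low 3, high 4); only the name attached to each half changes. The folded-load latency derived from SKXWriteIMulH grows by one cycle because it now tracks the high half.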

llvm/test/tools/llvm-mca/X86/Haswell/mulx-same-regs.s

	# LLVM-MCA-BEGIN			# LLVM-MCA-BEGIN
	mulxq %rax, %rax, %rax			mulxq %rax, %rax, %rax
	# LLVM-MCA-END			# LLVM-MCA-END

	# CHECK: [0] Code Region			# CHECK: [0] Code Region

	# CHECK: Iterations: 2			# CHECK: Iterations: 2
	# CHECK-NEXT: Instructions: 2			# CHECK-NEXT: Instructions: 2
	# CHECK-NEXT: Total Cycles: 10			# CHECK-NEXT: Total Cycles: 11
	# CHECK-NEXT: Total uOps: 8			# CHECK-NEXT: Total uOps: 8

	# CHECK: Dispatch Width: 4			# CHECK: Dispatch Width: 4
	# CHECK-NEXT: uOps Per Cycle: 0.80			# CHECK-NEXT: uOps Per Cycle: 0.73
	# CHECK-NEXT: IPC: 0.20			# CHECK-NEXT: IPC: 0.18
	# CHECK-NEXT: Block RThroughput: 1.0			# CHECK-NEXT: Block RThroughput: 1.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
	# CHECK-NEXT: - - 0.50 1.00 - - - 0.50 1.00 -			# CHECK-NEXT: - - 0.50 1.00 - - - 0.50 1.00 -

	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Instructions:			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Instructions:
	# CHECK-NEXT: - - 0.50 1.00 - - - 0.50 1.00 - mulxl %eax, %eax, %eax			# CHECK-NEXT: - - 0.50 1.00 - - - 0.50 1.00 - mulxl %eax, %eax, %eax

	# CHECK: Timeline view:			# CHECK: Timeline view:
				# CHECK-NEXT: 0
	# CHECK-NEXT: Index 0123456789			# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DeeeeER . mulxl %eax, %eax, %eax			# CHECK: [0,0] DeeeeER . mulxl %eax, %eax, %eax
	# CHECK-NEXT: [1,0] .D==eeeeER mulxl %eax, %eax, %eax			# CHECK-NEXT: [1,0] .D===eeeeER mulxl %eax, %eax, %eax

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 2 2.0 0.5 0.0 mulxl %eax, %eax, %eax			# CHECK-NEXT: 0. 2 2.5 0.5 0.0 mulxl %eax, %eax, %eax

	# CHECK: [1] Code Region			# CHECK: [1] Code Region

	# CHECK: Iterations: 2			# CHECK: Iterations: 2
	# CHECK-NEXT: Instructions: 2			# CHECK-NEXT: Instructions: 2
	# CHECK-NEXT: Total Cycles: 10			# CHECK-NEXT: Total Cycles: 11
	# CHECK-NEXT: Total uOps: 6			# CHECK-NEXT: Total uOps: 6

	# CHECK: Dispatch Width: 4			# CHECK: Dispatch Width: 4
	# CHECK-NEXT: uOps Per Cycle: 0.60			# CHECK-NEXT: uOps Per Cycle: 0.55
	# CHECK-NEXT: IPC: 0.20			# CHECK-NEXT: IPC: 0.18
	# CHECK-NEXT: Block RThroughput: 1.0			# CHECK-NEXT: Block RThroughput: 1.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
	# CHECK-NEXT: - - - 1.00 - - - - 1.00 -			# CHECK-NEXT: - - - 1.00 - - - - 1.00 -

	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Instructions:			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Instructions:
	# CHECK-NEXT: - - - 1.00 - - - - 1.00 - mulxq %rax, %rax, %rax			# CHECK-NEXT: - - - 1.00 - - - - 1.00 - mulxq %rax, %rax, %rax

	# CHECK: Timeline view:			# CHECK: Timeline view:
				# CHECK-NEXT: 0
	# CHECK-NEXT: Index 0123456789			# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DeeeeER . mulxq %rax, %rax, %rax			# CHECK: [0,0] DeeeeER . mulxq %rax, %rax, %rax
	# CHECK-NEXT: [1,0] .D==eeeeER mulxq %rax, %rax, %rax			# CHECK-NEXT: [1,0] .D===eeeeER mulxq %rax, %rax, %rax

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 2 2.0 0.5 0.0 mulxq %rax, %rax, %rax			# CHECK-NEXT: 0. 2 2.5 0.5 0.0 mulxq %rax, %rax, %rax
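The changed summary figures in the Haswell test follow from the one-cycle increase in Total Cycles (10 to 11). A minimal sketch of that arithmetic, assuming uOps Per Cycle and IPC are simply Total uOps and Instructions divided by Total Cycles and shown to two decimals; the helper name `summary` is made up for this illustration.

```python
def summary(total_uops, instructions, total_cycles):
    """Return (uOps per cycle, IPC), each rounded to two decimals."""
    return (
        round(total_uops / total_cycles, 2),
        round(instructions / total_cycles, 2),
    )


# Region [0] (mulxl): 8 uOps, 2 instructions.
print(summary(8, 2, 10))  # old expectations
print(summary(8, 2, 11))  # new expectations

# Region [1] (mulxq): 6 uOps, 2 instructions.
print(summary(6, 2, 10))  # old expectations
print(summary(6, 2, 11))  # new expectations
```

With 11 cycles the mulxl region yields 0.73 and 0.18, and the mulxq region 0.55 and 0.18, matching the updated CHECK lines.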

llvm/test/tools/llvm-mca/X86/SkylakeClient/mulx-same-regs.s

	# LLVM-MCA-BEGIN			# LLVM-MCA-BEGIN
	mulxq %rax, %rax, %rax			mulxq %rax, %rax, %rax
	# LLVM-MCA-END			# LLVM-MCA-END

	# CHECK: [0] Code Region			# CHECK: [0] Code Region

	# CHECK: Iterations: 2			# CHECK: Iterations: 2
	# CHECK-NEXT: Instructions: 2			# CHECK-NEXT: Instructions: 2
	# CHECK-NEXT: Total Cycles: 10			# CHECK-NEXT: Total Cycles: 11
	# CHECK-NEXT: Total uOps: 8			# CHECK-NEXT: Total uOps: 8

	# CHECK: Dispatch Width: 6			# CHECK: Dispatch Width: 6
	# CHECK-NEXT: uOps Per Cycle: 0.80			# CHECK-NEXT: uOps Per Cycle: 0.73
	# CHECK-NEXT: IPC: 0.20			# CHECK-NEXT: IPC: 0.18
	# CHECK-NEXT: Block RThroughput: 1.0			# CHECK-NEXT: Block RThroughput: 1.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
	# CHECK-NEXT: - - 0.50 1.00 - - - 0.50 1.00 -			# CHECK-NEXT: - - 0.50 1.00 - - - 0.50 1.00 -

	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Instructions:			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Instructions:
	# CHECK-NEXT: - - 0.50 1.00 - - - 0.50 1.00 - mulxl %eax, %eax, %eax			# CHECK-NEXT: - - 0.50 1.00 - - - 0.50 1.00 - mulxl %eax, %eax, %eax

	# CHECK: Timeline view:			# CHECK: Timeline view:
				# CHECK-NEXT: 0
	# CHECK-NEXT: Index 0123456789			# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DeeeeER . mulxl %eax, %eax, %eax			# CHECK: [0,0] DeeeeER . mulxl %eax, %eax, %eax
	# CHECK-NEXT: [1,0] .D==eeeeER mulxl %eax, %eax, %eax			# CHECK-NEXT: [1,0] .D===eeeeER mulxl %eax, %eax, %eax

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 2 2.0 0.5 0.0 mulxl %eax, %eax, %eax			# CHECK-NEXT: 0. 2 2.5 0.5 0.0 mulxl %eax, %eax, %eax

	# CHECK: [1] Code Region			# CHECK: [1] Code Region

	# CHECK: Iterations: 2			# CHECK: Iterations: 2
	# CHECK-NEXT: Instructions: 2			# CHECK-NEXT: Instructions: 2
	# CHECK-NEXT: Total Cycles: 10			# CHECK-NEXT: Total Cycles: 11
	# CHECK-NEXT: Total uOps: 6			# CHECK-NEXT: Total uOps: 6

	# CHECK: Dispatch Width: 6			# CHECK: Dispatch Width: 6
	# CHECK-NEXT: uOps Per Cycle: 0.60			# CHECK-NEXT: uOps Per Cycle: 0.55
	# CHECK-NEXT: IPC: 0.20			# CHECK-NEXT: IPC: 0.18
	# CHECK-NEXT: Block RThroughput: 1.0			# CHECK-NEXT: Block RThroughput: 1.0

	# CHECK: Instruction Info:			# CHECK: Instruction Info:
	# CHECK-NEXT: [1]: #uOps			# CHECK-NEXT: [1]: #uOps
	# CHECK-NEXT: [2]: Latency			# CHECK-NEXT: [2]: Latency
	# CHECK-NEXT: [3]: RThroughput			# CHECK-NEXT: [3]: RThroughput
	# CHECK-NEXT: [4]: MayLoad			# CHECK-NEXT: [4]: MayLoad
	# CHECK-NEXT: [5]: MayStore			# CHECK-NEXT: [5]: MayStore
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
	# CHECK-NEXT: - - - 1.00 - - - 1.00 - -			# CHECK-NEXT: - - - 1.00 - - - 1.00 - -

	# CHECK: Resource pressure by instruction:			# CHECK: Resource pressure by instruction:
	# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Instructions:			# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Instructions:
	# CHECK-NEXT: - - - 1.00 - - - 1.00 - - mulxq %rax, %rax, %rax			# CHECK-NEXT: - - - 1.00 - - - 1.00 - - mulxq %rax, %rax, %rax

	# CHECK: Timeline view:			# CHECK: Timeline view:
				# CHECK-NEXT: 0
	# CHECK-NEXT: Index 0123456789			# CHECK-NEXT: Index 0123456789

	# CHECK: [0,0] DeeeeER . mulxq %rax, %rax, %rax			# CHECK: [0,0] DeeeeER . mulxq %rax, %rax, %rax
	# CHECK-NEXT: [1,0] D===eeeeER mulxq %rax, %rax, %rax			# CHECK-NEXT: [1,0] D====eeeeER mulxq %rax, %rax, %rax

	# CHECK: Average Wait times (based on the timeline view):			# CHECK: Average Wait times (based on the timeline view):
	# CHECK-NEXT: [0]: Executions			# CHECK-NEXT: [0]: Executions
	# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue			# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
	# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready			# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
	# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage			# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage

	# CHECK: [0] [1] [2] [3]			# CHECK: [0] [1] [2] [3]
	# CHECK-NEXT: 0. 2 2.5 0.5 0.0 mulxq %rax, %rax, %rax			# CHECK-NEXT: 0. 2 3.0 0.5 0.0 mulxq %rax, %rax, %rax
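The updated wait-time column can be cross-checked against the timeline rows in these tests. The sketch below assumes that the time spent in the scheduler's queue is the distance from the dispatch cycle (`D`) to the first execute cycle (`e`), and that Total Cycles equals the length of the longest row. This reading is inferred from the numbers in these tests and is not a description of llvm-mca's implementation; the helper names are invented for the example.

```python
def queue_wait(row):
    """Cycles from dispatch ('D') to the first execute cycle ('e')."""
    return row.index("e") - row.index("D")


def average_wait(rows):
    """Average queue wait over all timeline rows of one instruction."""
    return sum(queue_wait(r) for r in rows) / len(rows)


def total_cycles(rows):
    """Number of simulated cycles covered by the timeline."""
    return max(len(r) for r in rows)


# Timeline rows copied from the CHECK lines (trailing padding dropped).
haswell_old = ["DeeeeER", ".D==eeeeER"]
haswell_new = ["DeeeeER", ".D===eeeeER"]
skl_mulxq_old = ["DeeeeER", "D===eeeeER"]
skl_mulxq_new = ["DeeeeER", "D====eeeeER"]

for rows in (haswell_old, haswell_new, skl_mulxq_old, skl_mulxq_new):
    print(total_cycles(rows), average_wait(rows))
```

Under that reading the extra `=` in the second iteration accounts both for the eleventh cycle and for the half-cycle rise in the averaged column (2.0 to 2.5 on Haswell, 2.5 to 3.0 for mulxq on Skylake Client).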


Diff 368844
