This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add SchedRW for PMULLD
ClosedPublic

Authored by craig.topper on Mar 27 2018, 10:47 PM.

Details

Reviewers

RKSimon
GGanesh
courbet

Commits

rG13a0f83a05ff: [X86] Add SchedRW for PMULLD
rL328914: [X86] Add SchedRW for PMULLD

Summary

Many CPUs implement this instruction less efficiently than the other vector multiplies, often as a multi-uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput, but Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.
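
To make these numbers concrete, here is a minimal Python sketch that compares a dependent chain of multiplies (latency-bound) with a stream of independent multiplies (throughput-bound). It uses only the figures quoted above; the table name, the helper functions, and the choice of 11 cycles for Silvermont's latency (taken from the SLM entry in the diff, not from the summary) are assumptions for illustration.

```python
# (uops, latency, reciprocal throughput) as quoted in the summary.
# Silvermont's latency of 11 is assumed from the SLM entry in this patch.
PMULLD = {
    "Silvermont": (7, 11, 11),
    "Sandy Bridge": (1, 5, 1),
    "Haswell": (2, 10, 2),
}

def dependent_chain_cycles(cpu, n):
    """Each multiply waits for the previous result, so latency dominates."""
    _, latency, _ = PMULLD[cpu]
    return n * latency

def independent_stream_cycles(cpu, n):
    """Multiplies overlap: the last one starts after (n - 1) throughput
    slots and then needs one full latency to finish."""
    _, latency, rthroughput = PMULLD[cpu]
    return (n - 1) * rthroughput + latency

for cpu in PMULLD:
    print(cpu, dependent_chain_cycles(cpu, 4), independent_stream_cycles(cpu, 4))
```

Under this simple model the gap between the two columns is what a scheduler can exploit, and it disappears on Silvermont, where latency and reciprocal throughput are equal.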

This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.
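
The relationship between the new class-level entry and the remaining InstRW overrides can be sketched in Python as a lookup with precedence: a per-opcode override, when one matches, wins over the default attached to the scheduling class. This is a simplified model, not TableGen's actual resolution code; the latency values are placeholders, and using `re.fullmatch` for the pattern is an assumption.

```python
import re

# Class-level default, reached through the instruction's SchedRW tag.
# The numbers are placeholders, not values from any real model.
CLASS_DEFAULTS = {"WritePMULLD": {"latency": 10}}

# Per-opcode overrides, analogous to InstRW entries with an instregex list.
INSTRW_OVERRIDES = [(r"(V?)PMULLDrr", {"latency": 5})]

def resolve(opcode, sched_class):
    """Return the override if any pattern matches, else the class default."""
    for pattern, data in INSTRW_OVERRIDES:
        if re.fullmatch(pattern, opcode):
            return data
    return CLASS_DEFAULTS[sched_class]

print(resolve("PMULLDrr", "WritePMULLD"))    # override wins
print(resolve("VPMULLDYrr", "WritePMULLD"))  # falls back to the class default
```

Removing an override, as this patch does for Sandy Bridge, makes the affected opcodes fall through to the class default.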

I also left a FIXME in the SandyBridge model: because that model is also used as the "generic" model, the new entry is too optimistic for the 256/512-bit versions, which are multiple uops on all known CPUs.

Event Timeline

craig.topper created this revision. Mar 27 2018, 10:47 PM

courbet accepted this revision. Mar 28 2018, 12:22 AM

courbet added inline comments.

lib/Target/X86/X86SchedSkylakeClient.td
163	llvm-exegesis measures 2xP01 here.
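
A short Python sketch of why the port assignment matters, reading "2xP01" as two uops that can each issue on port 0 or port 1. The function name and the even-distribution model are assumptions for illustration, not how llvm-exegesis or the scheduler computes anything.

```python
from fractions import Fraction

def reciprocal_throughput(uops, ports):
    """Cycles per instruction when `uops` micro-ops are spread evenly
    over `ports` equivalent execution ports."""
    return Fraction(uops, len(ports))

print(reciprocal_throughput(2, ["P0"]))        # both uops queue on port 0
print(reciprocal_throughput(2, ["P0", "P1"]))  # one uop per port
```

With both uops restricted to port 0 the model predicts one instruction every 2 cycles; with ports 0 and 1 available it predicts one per cycle.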

This revision is now accepted and ready to land. Mar 28 2018, 12:22 AM

courbet requested changes to this revision. Mar 28 2018, 12:23 AM

This revision now requires changes to proceed. Mar 28 2018, 12:23 AM

You will need to check the llvm-mca tests as well - some of those will have changed (sorry, no update script - it's manual!)

andreadb added a subscriber: andreadb. Mar 28 2018, 4:00 AM

Fix the ports for SKL. Remove the register form override from most of the schedulers. I left the memory form overrides in place. Update the llvm-mca tests.

Herald added a subscriber: gbedwell. Mar 28 2018, 10:23 AM

courbet added inline comments. Mar 29 2018, 2:01 AM

lib/Target/X86/X86SchedSkylakeServer.td
163	I don't have a skylake server to test that, but I'm surprised that this is different from SKL. Is this a typo?

The btver2 change looks good to me. Thanks!

RKSimon added inline comments. Mar 29 2018, 4:30 AM

lib/Target/X86/X86SchedBroadwell.td
166	Remove BWWriteResGroup148 ((V?)PMULLDrm) overload and just leave BWWriteResGroup151 (VPMULLDYrm)? Same for others.
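
The distinction between the two groups comes down to which opcode names each pattern covers. A quick Python check, assuming the pattern must match the whole opcode name (TableGen's own anchoring rules may differ):

```python
import re

# Pattern quoted in the comment above for the 128-bit memory forms.
pattern = re.compile(r"(V?)PMULLDrm")

for opcode in ["PMULLDrm", "VPMULLDrm", "VPMULLDYrm"]:
    print(opcode, bool(pattern.fullmatch(opcode)))
```

Under that assumption the pattern covers the SSE and 128-bit AVX memory forms but not the 256-bit `VPMULLDYrm`, which is why the latter needs its own entry.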

craig.topper added inline comments. Mar 29 2018, 7:55 AM

lib/Target/X86/X86SchedSkylakeServer.td
163	SKX adds an extra FMA unit and vector multiplier in port 5 for AVX512. 512-bit operations combine the 256-bit port0 and 1 units. So an extra unit was added to maintain 2 ports for 512-bit. I’m not sure the port 5 unit can be used for 128 and 256 bit, but the scheduler model thinks so. The scheduler model definitely doesn’t model 512 bit correctly, but that’s a larger problem than I want to fix here.
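
The port arrangement described in this comment can be written down as a small Python sketch. It only restates the comment; the function name and unit labels are made up for illustration, and the 128/256-bit case deliberately leaves port 5 out because the comment says its availability there is uncertain.

```python
def skx_multiply_units(vector_bits):
    """Execution units for a vector multiply on SKX, per the comment above."""
    if vector_bits == 512:
        # Ports 0 and 1 combine into one 512-bit unit; port 5 is the second.
        return ["P0+P1 (fused)", "P5"]
    if vector_bits in (128, 256):
        # Whether P5 can also serve these widths is left open in the comment.
        return ["P0", "P1"]
    raise ValueError(f"unsupported vector width: {vector_bits}")

for bits in (128, 256, 512):
    print(bits, skx_multiply_units(bits))
```

In every case two units are available, which matches the stated reason for adding the port 5 unit.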

courbet added a subscriber: gchatelet. Mar 29 2018, 8:06 AM

Remove 128-bit memory instructions from most of the models. Didn't touch Skylake Server because there are many things about it that I don't understand.

GGanesh added inline comments. Mar 29 2018, 9:20 PM

lib/Target/X86/X86ScheduleZnver1.td
212	This needs a fix definitely. I will do it!

LGTM and then @GGanesh can fix the Zn model afterward.

This revision was not accepted when it landed; it landed in state Needs Review. Mar 30 2018, 10:00 PM

Closed by commit rL328914: [X86] Add SchedRW for PMULLD (authored by ctopper).

This revision was automatically updated to reflect the committed changes.

Revision Contents

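Path                                         Size
lib/Target/X86/X86InstrAVX512.td             2 lines
lib/Target/X86/X86InstrSSE.td                2 lines
lib/Target/X86/X86SchedBroadwell.td          1 line
lib/Target/X86/X86SchedHaswell.td            1 line
lib/Target/X86/X86SchedSandyBridge.td        3 lines
lib/Target/X86/X86SchedSkylakeClient.td      1 line
lib/Target/X86/X86SchedSkylakeServer.td      1 line
lib/Target/X86/X86Schedule.td                1 line
lib/Target/X86/X86ScheduleBtVer2.td          1 line
lib/Target/X86/X86ScheduleSLM.td             1 line
lib/Target/X86/X86ScheduleZnver1.td          1 line
test/CodeGen/X86/avx2-schedule.ll            2 lines
test/CodeGen/X86/slow-pmulld.ll              42 lines
test/CodeGen/X86/sse41-schedule.ll           12 lines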

Diff 140040

lib/Target/X86/X86InstrAVX512.td

@@ hunk context: defm VPADDS : avx512_binop_rm_vl_bw<0xEC, 0xED, "vpadds", X86adds,
 SSE_INTALU_ITINS_P, HasBWI, 1>;
 defm VPSUBS : avx512_binop_rm_vl_bw<0xE8, 0xE9, "vpsubs", X86subs,
 SSE_INTALU_ITINS_P, HasBWI, 0>;
 defm VPADDUS : avx512_binop_rm_vl_bw<0xDC, 0xDD, "vpaddus", X86addus,
 SSE_INTALU_ITINS_P, HasBWI, 1>;
 defm VPSUBUS : avx512_binop_rm_vl_bw<0xD8, 0xD9, "vpsubus", X86subus,
 SSE_INTALU_ITINS_P, HasBWI, 0>;
 defm VPMULLD : avx512_binop_rm_vl_d<0x40, "vpmulld", mul,
-SSE_INTMUL_ITINS_P, HasAVX512, 1>, T8PD;
+SSE_PMULLD_ITINS, HasAVX512, 1>, T8PD;
 defm VPMULLW : avx512_binop_rm_vl_w<0xD5, "vpmullw", mul,
 SSE_INTMUL_ITINS_P, HasBWI, 1>;
 defm VPMULLQ : avx512_binop_rm_vl_q<0x40, "vpmullq", mul,
 SSE_INTMUL_ITINS_P, HasDQI, 1>, T8PD;
 defm VPMULHW : avx512_binop_rm_vl_w<0xE5, "vpmulhw", mulhs, SSE_INTMUL_ITINS_P,
 HasBWI, 1>;
 defm VPMULHUW : avx512_binop_rm_vl_w<0xE4, "vpmulhuw", mulhu, SSE_INTMUL_ITINS_P,
 HasBWI, 1>;

lib/Target/X86/X86InstrSSE.td

@@ hunk context: def SSE_INSERT_ITINS : OpndItins<
 IIC_SSE_INSERTPS_RR, IIC_SSE_INSERTPS_RM
 >;

 let Sched = WriteMPSAD in
 def SSE_MPSADBW_ITINS : OpndItins<
 IIC_SSE_MPSADBW_RR, IIC_SSE_MPSADBW_RM
 >;

-let Sched = WriteVecIMul in
+let Sched = WritePMULLD in
 def SSE_PMULLD_ITINS : OpndItins<
 IIC_SSE_PMULLD_RR, IIC_SSE_PMULLD_RM
 >;

 // Definitions for backward compatibility.
 // The instructions mapped on these definitions uses a different itinerary
 // than the actual scheduling model.
 let Sched = WriteShuffle in

lib/Target/X86/X86SchedBroadwell.td

 // Vector integer operations.
 def : WriteRes<WriteVecLoad, [BWPort23]> { let Latency = 5; }
 def : WriteRes<WriteVecStore, [BWPort237, BWPort4]>;
 def : WriteRes<WriteVecMove, [BWPort015]>;

 defm : BWWriteResPair<WriteVecALU, [BWPort15], 1>; // Vector integer ALU op, no logicals.
 defm : BWWriteResPair<WriteVecShift, [BWPort0], 1>; // Vector integer shifts.
 defm : BWWriteResPair<WriteVecIMul, [BWPort0], 5>; // Vector integer multiply.
+defm : BWWriteResPair<WritePMULLD, [BWPort0], 10, [2], 2>; // PMULLD
    [inline comment] RKSimon: Remove BWWriteResGroup148 ((V?)PMULLDrm) overload and just leave BWWriteResGroup151 (VPMULLDYrm)? Same for others.
 defm : BWWriteResPair<WriteShuffle, [BWPort5], 1>; // Vector shuffles.
 defm : BWWriteResPair<WriteBlend, [BWPort15], 1>; // Vector blends.
 defm : BWWriteResPair<WriteVarBlend, [BWPort5], 2, [2]>; // Vector variable blends.
 defm : BWWriteResPair<WriteMPSAD, [BWPort0, BWPort5], 6, [1, 2]>; // Vector MPSAD.

 // Vector bitwise operations.
 // These are often used on both floating point and integer vectors.
 defm : BWWriteResPair<WriteVecLogic, [BWPort015], 1>; // Vector and/or/xor.

lib/Target/X86/X86SchedHaswell.td

 def : WriteRes<WriteVecStore, [HWPort237, HWPort4]>;
 def : WriteRes<WriteVecLoad, [HWPort23]> { let Latency = 5; }
 def : WriteRes<WriteVecMove, [HWPort015]>;

 defm : HWWriteResPair<WriteVecShift, [HWPort0], 1>;
 defm : HWWriteResPair<WriteVecLogic, [HWPort015], 1>;
 defm : HWWriteResPair<WriteVecALU, [HWPort15], 1>;
 defm : HWWriteResPair<WriteVecIMul, [HWPort0], 5>;
+defm : HWWriteResPair<WritePMULLD, [HWPort0], 10, [2], 2>;
 defm : HWWriteResPair<WriteShuffle, [HWPort5], 1>;
 defm : HWWriteResPair<WriteBlend, [HWPort15], 1>;
 defm : HWWriteResPair<WriteShuffle256, [HWPort5], 3>;
 defm : HWWriteResPair<WriteVarBlend, [HWPort5], 2, [2]>;
 defm : HWWriteResPair<WriteVarVecShift, [HWPort0, HWPort5], 2, [2, 1]>;
 defm : HWWriteResPair<WriteMPSAD, [HWPort0, HWPort5], 6, [1, 2]>;

 // String instructions.

lib/Target/X86/X86SchedSandyBridge.td

 def : WriteRes<WriteVecStore, [SBPort23, SBPort4]>;
 def : WriteRes<WriteVecLoad, [SBPort23]> { let Latency = 6; }
 def : WriteRes<WriteVecMove, [SBPort05]>;

 defm : SBWriteResPair<WriteVecShift, [SBPort5], 1>;
 defm : SBWriteResPair<WriteVecLogic, [SBPort5], 1>;
 defm : SBWriteResPair<WriteVecALU, [SBPort1], 3>;
 defm : SBWriteResPair<WriteVecIMul, [SBPort0], 5>;
+defm : SBWriteResPair<WritePMULLD, [SBPort0], 5, [1], 1, 6>; // TODO this is probably wrong for 256/512-bit for the "generic" model
 defm : SBWriteResPair<WriteShuffle, [SBPort5], 1>;
 defm : SBWriteResPair<WriteBlend, [SBPort15], 1>;
 defm : SBWriteResPair<WriteVarBlend, [SBPort1, SBPort5], 2>;
 defm : SBWriteResPair<WriteMPSAD, [SBPort0, SBPort15], 5, [1,2], 3>;

 ////////////////////////////////////////////////////////////////////////////////
 // Horizontal add/sub instructions.
 ////////////////////////////////////////////////////////////////////////////////
@@ 505 lines not shown; hunk context: def: InstRW<[SBWriteResGroup20], (instregex "MMX_PMADDUBSWrr",
 "MMX_PMULUDQirr",
 "MMX_PSADBWirr",
 "(V?)PMADDUBSWrr",
 "(V?)PMADDWDrr",
 "(V?)PMULDQrr",
 "(V?)PMULHRSWrr",
 "(V?)PMULHUWrr",
 "(V?)PMULHWrr",
-"(V?)PMULLDrr",
 "(V?)PMULLWrr",
 "(V?)PMULUDQrr",
 "(V?)PSADBWrr")>;

 def SBWriteResGroup21 : SchedWriteRes<[SBPort1]> {
 let Latency = 3;
 let NumMicroOps = 1;
 let ResourceCycles = [1];
@@ 913 lines not shown; hunk context: def SBWriteResGroup89 : SchedWriteRes<[SBPort0,SBPort23]> {
 let ResourceCycles = [1,1];
 }
 def: InstRW<[SBWriteResGroup89], (instregex "(V?)PMADDUBSWrm",
 "(V?)PMADDWDrm",
 "(V?)PMULDQrm",
 "(V?)PMULHRSWrm",
 "(V?)PMULHUWrm",
 "(V?)PMULHWrm",
-"(V?)PMULLDrm",
 "(V?)PMULLWrm",
 "(V?)PMULUDQrm",
 "(V?)PSADBWrm")>;

 def SBWriteResGroup89_2 : SchedWriteRes<[SBPort0,SBPort23]> {
 let Latency = 10;
 let NumMicroOps = 2;
 let ResourceCycles = [1,1];

lib/Target/X86/X86SchedSkylakeClient.td

 // Vector integer operations.
 def : WriteRes<WriteVecLoad, [SKLPort23]> { let Latency = 6; }
 def : WriteRes<WriteVecStore, [SKLPort237, SKLPort4]>;
 def : WriteRes<WriteVecMove, [SKLPort015]>;

 defm : SKLWriteResPair<WriteVecALU, [SKLPort15], 1>; // Vector integer ALU op, no logicals.
 defm : SKLWriteResPair<WriteVecShift, [SKLPort0], 1>; // Vector integer shifts.
 defm : SKLWriteResPair<WriteVecIMul, [SKLPort0], 5>; // Vector integer multiply.
+defm : SKLWriteResPair<WritePMULLD, [SKLPort0], 10, [2], 2>;
    [inline comment] courbet: llvm-exegesis measures 2xP01 here.
 defm : SKLWriteResPair<WriteShuffle, [SKLPort5], 1>; // Vector shuffles.
 defm : SKLWriteResPair<WriteBlend, [SKLPort15], 1>; // Vector blends.
 defm : SKLWriteResPair<WriteVarBlend, [SKLPort5], 2, [2]>; // Vector variable blends.
 defm : SKLWriteResPair<WriteMPSAD, [SKLPort0, SKLPort5], 6, [1, 2]>; // Vector MPSAD.

 // Vector bitwise operations.
 // These are often used on both floating point and integer vectors.
 defm : SKLWriteResPair<WriteVecLogic, [SKLPort015], 1>; // Vector and/or/xor.

lib/Target/X86/X86SchedSkylakeServer.td

 // Vector integer operations.
 def : WriteRes<WriteVecLoad, [SKXPort23]> { let Latency = 5; }
 def : WriteRes<WriteVecStore, [SKXPort237, SKXPort4]>;
 def : WriteRes<WriteVecMove, [SKXPort015]>;

 defm : SKXWriteResPair<WriteVecALU, [SKXPort15], 1>; // Vector integer ALU op, no logicals.
 defm : SKXWriteResPair<WriteVecShift, [SKXPort0], 1>; // Vector integer shifts.
 defm : SKXWriteResPair<WriteVecIMul, [SKXPort0], 5>; // Vector integer multiply.
+defm : SKXWriteResPair<WritePMULLD, [SKXPort0], 10, [2], 2>; // Vector integer multiply.
    [inline comment] courbet: I don't have a skylake server to test that, but I'm surprised that this is different from SKL. Is this a typo?
    [inline comment] craig.topper: SKX adds an extra FMA unit and vector multiplier in port 5 for AVX512. 512-bit operations combine the 256-bit port0 and 1 units. So an extra unit was added to maintain 2 ports for 512-bit. I’m not sure the port 5 unit can be used for 128 and 256 bit, but the scheduler model thinks so. The scheduler model definitely doesn’t model 512 bit correctly, but that’s a larger problem than I want to fix here.
 defm : SKXWriteResPair<WriteShuffle, [SKXPort5], 1>; // Vector shuffles.
 defm : SKXWriteResPair<WriteBlend, [SKXPort15], 1>; // Vector blends.
 defm : SKXWriteResPair<WriteVarBlend, [SKXPort5], 2, [2]>; // Vector variable blends.
 defm : SKXWriteResPair<WriteMPSAD, [SKXPort0, SKXPort5], 6, [1, 2]>; // Vector MPSAD.

 // Vector bitwise operations.
 // These are often used on both floating point and integer vectors.
 defm : SKXWriteResPair<WriteVecLogic, [SKXPort015], 1>; // Vector and/or/xor.

lib/Target/X86/X86Schedule.td


 // Vector integer operations.
 def WriteVecLoad : SchedWrite;
 def WriteVecStore : SchedWrite;
 def WriteVecMove : SchedWrite;
 defm WriteVecALU : X86SchedWritePair; // Vector integer ALU op, no logicals.
 defm WriteVecShift : X86SchedWritePair; // Vector integer shifts.
 defm WriteVecIMul : X86SchedWritePair; // Vector integer multiply.
+defm WritePMULLD : X86SchedWritePair; // PMULLD
 defm WriteShuffle : X86SchedWritePair; // Vector shuffles.
 defm WriteBlend : X86SchedWritePair; // Vector blends.
 defm WriteVarBlend : X86SchedWritePair; // Vector variable blends.
 defm WriteMPSAD : X86SchedWritePair; // Vector MPSAD.

 // Vector bitwise operations.
 // These are often used on both floating point and integer vectors.
 defm WriteVecLogic : X86SchedWritePair; // Vector and/or/xor.

lib/Target/X86/X86ScheduleBtVer2.td


 def : WriteRes<WriteVecLoad, [JLAGU, JFPU01, JVALU]> { let Latency = 5; }
 def : WriteRes<WriteVecStore, [JSAGU, JFPU1, JSTC]>;
 def : WriteRes<WriteVecMove, [JFPU01, JVALU]>;

 defm : JWriteResFpuPair<WriteVecALU, [JFPU01, JVALU], 1>;
 defm : JWriteResFpuPair<WriteVecShift, [JFPU01, JVALU], 1>;
 defm : JWriteResFpuPair<WriteVecIMul, [JFPU0, JVIMUL], 2>;
+defm : JWriteResFpuPair<WritePMULLD, [JFPU0, JFPU01, JVIMUL, JVALU], 4, [2, 1, 2, 1], 3>;
 defm : JWriteResFpuPair<WriteMPSAD, [JFPU0, JVIMUL], 3, [1, 2]>;
 defm : JWriteResFpuPair<WriteShuffle, [JFPU01, JVALU], 1>;
 defm : JWriteResFpuPair<WriteBlend, [JFPU01, JVALU], 1>;
 defm : JWriteResFpuPair<WriteVarBlend, [JFPU01, JVALU], 2, [1, 4], 3>;
 defm : JWriteResFpuPair<WriteVecLogic, [JFPU01, JVALU], 1>;
 defm : JWriteResFpuPair<WriteShuffle256, [JFPU01, JVALU], 1>;
 defm : JWriteResFpuPair<WriteVarVecShift, [JFPU01, JVALU], 1>; // NOTE: Doesn't exist on Jaguar.

lib/Target/X86/X86ScheduleSLM.td

 def : WriteRes<WriteVecStore, [SLM_FPC_RSV01, SLM_MEC_RSV]>;
 def : WriteRes<WriteVecLoad, [SLM_MEC_RSV]> { let Latency = 3; }
 def : WriteRes<WriteVecMove, [SLM_FPC_RSV01]>;

 defm : SLMWriteResPair<WriteVecShift, [SLM_FPC_RSV0], 1>;
 defm : SLMWriteResPair<WriteVecLogic, [SLM_FPC_RSV01], 1>;
 defm : SLMWriteResPair<WriteVecALU, [SLM_FPC_RSV01], 1>;
 defm : SLMWriteResPair<WriteVecIMul, [SLM_FPC_RSV0], 4>;
+defm : SLMWriteResPair<WritePMULLD, [SLM_FPC_RSV0], 11, [11], 7>;
 defm : SLMWriteResPair<WriteShuffle, [SLM_FPC_RSV0], 1>;
 defm : SLMWriteResPair<WriteBlend, [SLM_FPC_RSV0], 1>;
 defm : SLMWriteResPair<WriteMPSAD, [SLM_FPC_RSV0], 7>;

 ////////////////////////////////////////////////////////////////////////////////
 // Horizontal add/sub instructions.
 ////////////////////////////////////////////////////////////////////////////////

lib/Target/X86/X86ScheduleZnver1.td

 def : WriteRes<WriteVecMove, [ZnFPU]>;
 def : WriteRes<WriteVecLoad, [ZnAGU]> { let Latency = 8; }

 defm : ZnWriteResFpuPair<WriteVecShift, [ZnFPU], 1>;
 defm : ZnWriteResFpuPair<WriteVecLogic, [ZnFPU], 1>;
 defm : ZnWriteResFpuPair<WritePHAdd, [ZnFPU], 1>;
 defm : ZnWriteResFpuPair<WriteVecALU, [ZnFPU], 1>;
 defm : ZnWriteResFpuPair<WriteVecIMul, [ZnFPU0], 4>;
+defm : ZnWriteResFpuPair<WritePMULLD, [ZnFPU0], 4>; // FIXME
    [inline comment] GGanesh: This needs a fix definitely. I will do it!
 defm : ZnWriteResFpuPair<WriteShuffle, [ZnFPU], 1>;
 defm : ZnWriteResFpuPair<WriteBlend, [ZnFPU01], 1>;
 defm : ZnWriteResFpuPair<WriteShuffle256, [ZnFPU], 2>;

 // Vector Shift Operations
 defm : ZnWriteResFpuPair<WriteVarVecShift, [ZnFPU12], 1>;

 // MOVMSK Instructions.

test/CodeGen/X86/avx2-schedule.ll

@@ hunk context: ; ZNVER1-NEXT: retq # sched: [1:0.50]
 ret <16 x i16> %3
 }
 declare <16 x i16> @llvm.x86.avx2.pmulh.w(<16 x i16>, <16 x i16>) nounwind readnone

 define <8 x i32> @test_pmulld(<8 x i32> %a0, <8 x i32> %a1, <8 x i32> *%a2) {
 ; GENERIC-LABEL: test_pmulld:
 ; GENERIC: # %bb.0:
 ; GENERIC-NEXT: vpmulld %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
-; GENERIC-NEXT: vpmulld (%rdi), %ymm0, %ymm0 # sched: [9:1.00]
+; GENERIC-NEXT: vpmulld (%rdi), %ymm0, %ymm0 # sched: [11:1.00]
 ; GENERIC-NEXT: retq # sched: [1:1.00]
 ;
 ; HASWELL-LABEL: test_pmulld:
 ; HASWELL: # %bb.0:
 ; HASWELL-NEXT: vpmulld %ymm1, %ymm0, %ymm0 # sched: [10:2.00]
 ; HASWELL-NEXT: vpmulld (%rdi), %ymm0, %ymm0 # sched: [17:2.00]
 ; HASWELL-NEXT: retq # sched: [7:1.00]
 ;
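
The `# sched: [latency:reciprocal throughput]` annotations in the GENERIC check lines above can be cross-checked against the new Sandy Bridge entry. The Python sketch below parses the two updated GENERIC lines and verifies that the memory form's latency equals the register form's latency plus 6. Reading the final `6` in `SBWriteResPair<WritePMULLD, [SBPort0], 5, [1], 1, 6>` as the load latency is an assumption; the regex and helper name are illustrative.

```python
import re

SCHED_RE = re.compile(r"# sched: \[(\d+):(\d+\.\d+)\]")

def parse_sched(line):
    """Extract (latency, reciprocal throughput) from a check line."""
    m = SCHED_RE.search(line)
    return int(m.group(1)), float(m.group(2))

reg_line = "; GENERIC-NEXT: vpmulld %ymm1, %ymm0, %ymm0 # sched: [5:1.00]"
mem_line = "; GENERIC-NEXT: vpmulld (%rdi), %ymm0, %ymm0 # sched: [11:1.00]"

reg_lat, reg_rtp = parse_sched(reg_line)
mem_lat, mem_rtp = parse_sched(mem_line)

LOAD_LATENCY = 6  # assumed meaning of the last SBWriteResPair argument
print(reg_lat + LOAD_LATENCY == mem_lat)
```

The same arithmetic does not explain the HASWELL memory line (10 versus 17), which is consistent with the summary's note that the Haswell overrides were left in place.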

test/CodeGen/X86/slow-pmulld.ll

Show First 20 Lines • Show All 1,209 Lines • ▼ Show 20 Lines	; AVX-64-NEXT: retq
%z = zext <8 x i16> %A to <8 x i32>		%z = zext <8 x i16> %A to <8 x i32>
%m = mul nuw nsw <8 x i32> %z, <i32 18778, i32 18778, i32 18778, i32 18778, i32 18778, i32 18778, i32 18778, i32 18778>		%m = mul nuw nsw <8 x i32> %z, <i32 18778, i32 18778, i32 18778, i32 18778, i32 18778, i32 18778, i32 18778, i32 18778>
ret <8 x i32> %m		ret <8 x i32> %m
}		}

define <16 x i32> @test_mul_v16i32_v16i16_minsize(<16 x i16> %A) minsize {		define <16 x i32> @test_mul_v16i32_v16i16_minsize(<16 x i16> %A) minsize {
; SLM32-LABEL: test_mul_v16i32_v16i16_minsize:		; SLM32-LABEL: test_mul_v16i32_v16i16_minsize:
; SLM32: # %bb.0:		; SLM32: # %bb.0:
; SLM32-NEXT: pshufd {{.*#+}} xmm2 = xmm1[2,3,0,1]		; SLM32-NEXT: movdqa {{.*#+}} xmm5 = [18778,18778,18778,18778]
; SLM32-NEXT: pmovzxwd {{.*#+}} xmm3 = xmm2[0],zero,xmm2[1],zero,xmm2[2],zero,xmm2[3],zero		; SLM32-NEXT: pshufd {{.*#+}} xmm3 = xmm1[2,3,0,1]
; SLM32-NEXT: pshufd {{.*#+}} xmm2 = xmm0[2,3,0,1]		; SLM32-NEXT: pshufd {{.*#+}} xmm4 = xmm0[2,3,0,1]
; SLM32-NEXT: pmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
; SLM32-NEXT: pmovzxwd {{.*#+}} xmm4 = xmm2[0],zero,xmm2[1],zero,xmm2[2],zero,xmm2[3],zero
; SLM32-NEXT: pmovzxwd {{.*#+}} xmm2 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero		; SLM32-NEXT: pmovzxwd {{.*#+}} xmm2 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero
; SLM32-NEXT: movdqa {{.*#+}} xmm1 = [18778,18778,18778,18778]		; SLM32-NEXT: pmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
; SLM32-NEXT: pmulld %xmm1, %xmm4		; SLM32-NEXT: pmovzxwd {{.*#+}} xmm1 = xmm4[0],zero,xmm4[1],zero,xmm4[2],zero,xmm4[3],zero
; SLM32-NEXT: pmulld %xmm1, %xmm0		; SLM32-NEXT: pmovzxwd {{.*#+}} xmm3 = xmm3[0],zero,xmm3[1],zero,xmm3[2],zero,xmm3[3],zero
; SLM32-NEXT: pmulld %xmm1, %xmm2		; SLM32-NEXT: pmulld %xmm5, %xmm0
; SLM32-NEXT: pmulld %xmm1, %xmm3		; SLM32-NEXT: pmulld %xmm5, %xmm2
; SLM32-NEXT: movdqa %xmm4, %xmm1		; SLM32-NEXT: pmulld %xmm5, %xmm1
		; SLM32-NEXT: pmulld %xmm5, %xmm3
; SLM32-NEXT: retl		; SLM32-NEXT: retl
;		;
; SLM64-LABEL: test_mul_v16i32_v16i16_minsize:		; SLM64-LABEL: test_mul_v16i32_v16i16_minsize:
; SLM64: # %bb.0:		; SLM64: # %bb.0:
; SLM64-NEXT: pshufd {{.*#+}} xmm2 = xmm1[2,3,0,1]		; SLM64-NEXT: movdqa {{.*#+}} xmm5 = [18778,18778,18778,18778]
; SLM64-NEXT: pmovzxwd {{.*#+}} xmm3 = xmm2[0],zero,xmm2[1],zero,xmm2[2],zero,xmm2[3],zero		; SLM64-NEXT: pshufd {{.*#+}} xmm3 = xmm1[2,3,0,1]
; SLM64-NEXT: pshufd {{.*#+}} xmm2 = xmm0[2,3,0,1]		; SLM64-NEXT: pshufd {{.*#+}} xmm4 = xmm0[2,3,0,1]
; SLM64-NEXT: pmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
; SLM64-NEXT: pmovzxwd {{.*#+}} xmm4 = xmm2[0],zero,xmm2[1],zero,xmm2[2],zero,xmm2[3],zero
; SLM64-NEXT: pmovzxwd {{.*#+}} xmm2 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero		; SLM64-NEXT: pmovzxwd {{.*#+}} xmm2 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero
; SLM64-NEXT: movdqa {{.*#+}} xmm1 = [18778,18778,18778,18778]		; SLM64-NEXT: pmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
; SLM64-NEXT: pmulld %xmm1, %xmm4		; SLM64-NEXT: pmovzxwd {{.*#+}} xmm1 = xmm4[0],zero,xmm4[1],zero,xmm4[2],zero,xmm4[3],zero
; SLM64-NEXT: pmulld %xmm1, %xmm0		; SLM64-NEXT: pmovzxwd {{.*#+}} xmm3 = xmm3[0],zero,xmm3[1],zero,xmm3[2],zero,xmm3[3],zero
; SLM64-NEXT: pmulld %xmm1, %xmm2		; SLM64-NEXT: pmulld %xmm5, %xmm0
; SLM64-NEXT: pmulld %xmm1, %xmm3		; SLM64-NEXT: pmulld %xmm5, %xmm2
; SLM64-NEXT: movdqa %xmm4, %xmm1		; SLM64-NEXT: pmulld %xmm5, %xmm1
		; SLM64-NEXT: pmulld %xmm5, %xmm3
; SLM64-NEXT: retq		; SLM64-NEXT: retq
;		;
; SLOW32-LABEL: test_mul_v16i32_v16i16_minsize:		; SLOW32-LABEL: test_mul_v16i32_v16i16_minsize:
; SLOW32: # %bb.0:		; SLOW32: # %bb.0:
; SLOW32-NEXT: pshufd {{.*#+}} xmm2 = xmm1[2,3,0,1]		; SLOW32-NEXT: pshufd {{.*#+}} xmm2 = xmm1[2,3,0,1]
; SLOW32-NEXT: pmovzxwd {{.*#+}} xmm3 = xmm2[0],zero,xmm2[1],zero,xmm2[2],zero,xmm2[3],zero		; SLOW32-NEXT: pmovzxwd {{.*#+}} xmm3 = xmm2[0],zero,xmm2[1],zero,xmm2[2],zero,xmm2[3],zero
; SLOW32-NEXT: pshufd {{.*#+}} xmm2 = xmm0[2,3,0,1]		; SLOW32-NEXT: pshufd {{.*#+}} xmm2 = xmm0[2,3,0,1]
; SLOW32-NEXT: pmovzxwd {{.*#+}} xmm4 = xmm2[0],zero,xmm2[1],zero,xmm2[2],zero,xmm2[3],zero		; SLOW32-NEXT: pmovzxwd {{.*#+}} xmm4 = xmm2[0],zero,xmm2[1],zero,xmm2[2],zero,xmm2[3],zero
[... 93 lines not shown ...]

test/CodeGen/X86/sse41-schedule.ll

	[... 4,811 lines not shown ...]
	; GENERIC-LABEL: test_pmulld:			; GENERIC-LABEL: test_pmulld:
	; GENERIC: # %bb.0:			; GENERIC: # %bb.0:
	; GENERIC-NEXT: pmulld %xmm1, %xmm0 # sched: [5:1.00]			; GENERIC-NEXT: pmulld %xmm1, %xmm0 # sched: [5:1.00]
	; GENERIC-NEXT: pmulld (%rdi), %xmm0 # sched: [11:1.00]			; GENERIC-NEXT: pmulld (%rdi), %xmm0 # sched: [11:1.00]
	; GENERIC-NEXT: retq # sched: [1:1.00]			; GENERIC-NEXT: retq # sched: [1:1.00]
	;			;
	; SLM-LABEL: test_pmulld:			; SLM-LABEL: test_pmulld:
	; SLM: # %bb.0:			; SLM: # %bb.0:
	; SLM-NEXT: pmulld %xmm1, %xmm0 # sched: [4:1.00]			; SLM-NEXT: pmulld %xmm1, %xmm0 # sched: [11:11.00]
	; SLM-NEXT: pmulld (%rdi), %xmm0 # sched: [7:1.00]			; SLM-NEXT: pmulld (%rdi), %xmm0 # sched: [14:11.00]
	; SLM-NEXT: retq # sched: [4:1.00]			; SLM-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-SSE-LABEL: test_pmulld:			; SANDY-SSE-LABEL: test_pmulld:
	; SANDY-SSE: # %bb.0:			; SANDY-SSE: # %bb.0:
	; SANDY-SSE-NEXT: pmulld %xmm1, %xmm0 # sched: [5:1.00]			; SANDY-SSE-NEXT: pmulld %xmm1, %xmm0 # sched: [5:1.00]
	; SANDY-SSE-NEXT: pmulld (%rdi), %xmm0 # sched: [11:1.00]			; SANDY-SSE-NEXT: pmulld (%rdi), %xmm0 # sched: [11:1.00]
	; SANDY-SSE-NEXT: retq # sched: [1:1.00]			; SANDY-SSE-NEXT: retq # sched: [1:1.00]
	;			;
	[... 48 lines not shown ...]
	; SKX-LABEL: test_pmulld:			; SKX-LABEL: test_pmulld:
	; SKX: # %bb.0:			; SKX: # %bb.0:
	; SKX-NEXT: vpmulld %xmm1, %xmm0, %xmm0 # sched: [10:0.67]			; SKX-NEXT: vpmulld %xmm1, %xmm0, %xmm0 # sched: [10:0.67]
	; SKX-NEXT: vpmulld (%rdi), %xmm0, %xmm0 # sched: [16:0.67]			; SKX-NEXT: vpmulld (%rdi), %xmm0, %xmm0 # sched: [16:0.67]
	; SKX-NEXT: retq # sched: [7:1.00]			; SKX-NEXT: retq # sched: [7:1.00]
	;			;
	; BTVER2-SSE-LABEL: test_pmulld:			; BTVER2-SSE-LABEL: test_pmulld:
	; BTVER2-SSE: # %bb.0:			; BTVER2-SSE: # %bb.0:
	; BTVER2-SSE-NEXT: pmulld %xmm1, %xmm0 # sched: [2:1.00]			; BTVER2-SSE-NEXT: pmulld %xmm1, %xmm0 # sched: [4:2.00]
	; BTVER2-SSE-NEXT: pmulld (%rdi), %xmm0 # sched: [7:1.00]			; BTVER2-SSE-NEXT: pmulld (%rdi), %xmm0 # sched: [9:2.00]
	; BTVER2-SSE-NEXT: retq # sched: [4:1.00]			; BTVER2-SSE-NEXT: retq # sched: [4:1.00]
	;			;
	; BTVER2-LABEL: test_pmulld:			; BTVER2-LABEL: test_pmulld:
	; BTVER2: # %bb.0:			; BTVER2: # %bb.0:
	; BTVER2-NEXT: vpmulld %xmm1, %xmm0, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vpmulld %xmm1, %xmm0, %xmm0 # sched: [4:2.00]
	; BTVER2-NEXT: vpmulld (%rdi), %xmm0, %xmm0 # sched: [7:1.00]			; BTVER2-NEXT: vpmulld (%rdi), %xmm0, %xmm0 # sched: [9:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-SSE-LABEL: test_pmulld:			; ZNVER1-SSE-LABEL: test_pmulld:
	; ZNVER1-SSE: # %bb.0:			; ZNVER1-SSE: # %bb.0:
	; ZNVER1-SSE-NEXT: pmulld %xmm1, %xmm0 # sched: [4:1.00]			; ZNVER1-SSE-NEXT: pmulld %xmm1, %xmm0 # sched: [4:1.00]
	; ZNVER1-SSE-NEXT: pmulld (%rdi), %xmm0 # sched: [11:1.00]			; ZNVER1-SSE-NEXT: pmulld (%rdi), %xmm0 # sched: [11:1.00]
	; ZNVER1-SSE-NEXT: retq # sched: [1:0.50]			; ZNVER1-SSE-NEXT: retq # sched: [1:0.50]
	;			;
	[... 680 lines not shown ...]

Revision Contents

Diff 140040

lib/Target/X86/X86InstrAVX512.td

lib/Target/X86/X86InstrSSE.td

lib/Target/X86/X86SchedBroadwell.td

lib/Target/X86/X86SchedHaswell.td

lib/Target/X86/X86SchedSandyBridge.td

lib/Target/X86/X86SchedSkylakeClient.td

lib/Target/X86/X86SchedSkylakeServer.td

lib/Target/X86/X86Schedule.td

lib/Target/X86/X86ScheduleBtVer2.td

lib/Target/X86/X86ScheduleSLM.td

lib/Target/X86/X86ScheduleZnver1.td

test/CodeGen/X86/avx2-schedule.ll

test/CodeGen/X86/slow-pmulld.ll

test/CodeGen/X86/sse41-schedule.ll
