This is an archive of the discontinued LLVM Phabricator instance.

lib/Target/X86/X86ScheduleBtVer2.td
202	My understanding is that ResourceCycles has to be 4 because you want to be able to compute a reciprocal throughput of 2. According to the AMD documents, the pipes used by a float variable blend are "FPA|FPM". I wonder whether we could have [JFPU0, JFPU1] instead of [JFPU01], and then change ResourceCycles to [2, 1]. A variable blend is 3 uOps, and internally it is likely to be implemented as the sequence {VAND,VANDN,VOR}, where VAND and VANDN can execute in parallel. It may be worth running some experiments to see which approach is better. That being said, your approach is not wrong, and I don't have a strong opinion on this.
202–206	`let NumMicroOps = 3;`
218–225	Variable blend instructions are 3 macro ops. You should add `let NumMicroOps = 3;`.
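
The arithmetic behind the ResourceCycles comment can be checked with a small sketch. This is a simplified model, not the LLVM implementation: it assumes reciprocal throughput is bounded by the busiest resource (cycles consumed divided by the number of units behind that resource), and it assumes JFPU01 stands for a group of two pipes, JFPU0 and JFPU1. The function name and the `UNITS` table are hypothetical and exist only for illustration.

```python
from fractions import Fraction


def reciprocal_throughput(resource_cycles, units):
    """Simplified model: the busiest resource bounds the throughput.

    resource_cycles: {resource name: cycles consumed by one instruction}
    units: {resource name: number of identical units behind that name}
    """
    return max(Fraction(c, units[r]) for r, c in resource_cycles.items())


# Assumption: JFPU01 is a group made of the two pipes JFPU0 and JFPU1.
UNITS = {"JFPU0": 1, "JFPU1": 1, "JFPU01": 2}

# Before the patch: [JFPU01] with ResourceCycles = [2].
before = reciprocal_throughput({"JFPU01": 2}, UNITS)
# The patch: [JFPU01] with ResourceCycles = [4].
after = reciprocal_throughput({"JFPU01": 4}, UNITS)
# The suggested alternative: [JFPU0, JFPU1] with ResourceCycles = [2, 1].
split = reciprocal_throughput({"JFPU0": 2, "JFPU1": 1}, UNITS)

print(float(before), float(after), float(split))  # prints: 1.0 2.0 2.0
```

Under this model both the patch and the suggested alternative give a reciprocal throughput of 2, which agrees with the `[2:2.00]` annotations for the blendv tests in the diff. The two forms differ only in which pipes are marked busy and for how long. The real computation in LLVM also accounts for how individual units interact with the groups that contain them, so this sketch should not be used to predict entries that name both JFPU01 and JFPU0.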

avt77 added inline comments. Nov 9 2017, 12:20 AM

lib/Target/X86/X86ScheduleBtVer2.td
202	I have an AMD laptop to run experiments on, but I don't have any perf test that uses a variable blend. Could anyone help me with such a test?

davide removed a reviewer: davide. Nov 9 2017, 12:29 AM

I made the updates requested by andreadb, except for the question about JFPU01: if we need perf experiments, please help me with perf test(s). If we agree to use the current implementation, then I could commit the patch if you give me an LGTM.

In D39802#920299, @avt77 wrote:

I made the updates requested by andreadb, except for the question about JFPU01: if we need perf experiments, please help me with perf test(s). If we agree to use the current implementation, then I could commit the patch if you give me an LGTM.

I don't think it is worth running experiments just for the blendv case.
I just wanted to point out that there may be an alternative solution for that particular case. Having a resource cycle of 4 for the blendv is a bit odd (at least to me).
But, as I said, I really don't have a strong opinion on what numbers should go there (neither solution is ideal).
If you fix the remaining NumMicroOps issue I pointed out, then the patch looks good to me.

lib/Target/X86/X86ScheduleBtVer2.td
652–655	let NumMicroOps = 3;
659–662	Same.

This revision is now accepted and ready to land. Nov 9 2017, 4:19 AM

Closed by commit rL317785: Sched model improving on btver2: JFPU01 resource, vtestp* for xmm. (authored by avt77). Nov 9 2017, 6:20 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/X86/X86ScheduleBtVer2.td	31 lines
test/CodeGen/X86/avx-schedule.ll	16 lines
test/CodeGen/X86/sse41-schedule.ll	16 lines

Diff 122093

lib/Target/X86/X86ScheduleBtVer2.td

[... 193 lines not shown ...]

// FIXME: integer pipes		// FIXME: integer pipes
defm : JWriteResFpuPair<WriteCvtF2I, JFPU1, 3>; // Float -> Integer.		defm : JWriteResFpuPair<WriteCvtF2I, JFPU1, 3>; // Float -> Integer.
defm : JWriteResFpuPair<WriteCvtI2F, JFPU1, 3>; // Integer -> Float.		defm : JWriteResFpuPair<WriteCvtI2F, JFPU1, 3>; // Integer -> Float.
defm : JWriteResFpuPair<WriteCvtF2F, JFPU1, 3>; // Float -> Float size conversion.		defm : JWriteResFpuPair<WriteCvtF2F, JFPU1, 3>; // Float -> Float size conversion.

def : WriteRes<WriteFVarBlend, [JFPU01]> {		def : WriteRes<WriteFVarBlend, [JFPU01]> {
let Latency = 2;		let Latency = 2;
let ResourceCycles = [2];		let ResourceCycles = [4];
		andreadb: My understanding is that ResourceCycles has to be 4 because you want to be able to compute a reciprocal throughput of 2. According to the AMD documents, the pipes used by a float variable blend are "FPA|FPM". I wonder whether we could have [JFPU0, JFPU1] instead of [JFPU01], and then change ResourceCycles to [2, 1]. A variable blend is 3 uOps, and internally it is likely to be implemented as the sequence {VAND,VANDN,VOR}, where VAND and VANDN can execute in parallel. It may be worth running some experiments to see which approach is better. That being said, your approach is not wrong, and I don't have a strong opinion on this.
		avt77: I have an AMD laptop to run experiments on, but I don't have any perf test that uses a variable blend. Could anyone help me with such a test?
}		}
def : WriteRes<WriteFVarBlendLd, [JLAGU, JFPU01]> {		def : WriteRes<WriteFVarBlendLd, [JLAGU, JFPU01]> {
let Latency = 7;		let Latency = 7;
let ResourceCycles = [1, 2];		let ResourceCycles = [1, 4];
		andreadb: `let NumMicroOps = 3;`
}		}

// Vector integer operations.		// Vector integer operations.
defm : JWriteResFpuPair<WriteVecALU, JFPU01, 1>;		defm : JWriteResFpuPair<WriteVecALU, JFPU01, 1>;
defm : JWriteResFpuPair<WriteVecShift, JFPU01, 1>;		defm : JWriteResFpuPair<WriteVecShift, JFPU01, 1>;
defm : JWriteResFpuPair<WriteVecIMul, JFPU0, 2>;		defm : JWriteResFpuPair<WriteVecIMul, JFPU0, 2>;
defm : JWriteResFpuPair<WriteShuffle, JFPU01, 1>;		defm : JWriteResFpuPair<WriteShuffle, JFPU01, 1>;
defm : JWriteResFpuPair<WriteBlend, JFPU01, 1>;		defm : JWriteResFpuPair<WriteBlend, JFPU01, 1>;
defm : JWriteResFpuPair<WriteVecLogic, JFPU01, 1>;		defm : JWriteResFpuPair<WriteVecLogic, JFPU01, 1>;
defm : JWriteResFpuPair<WriteShuffle256, JFPU01, 1>;		defm : JWriteResFpuPair<WriteShuffle256, JFPU01, 1>;

def : WriteRes<WriteVarBlend, [JFPU01]> {		def : WriteRes<WriteVarBlend, [JFPU01]> {
let Latency = 2;		let Latency = 2;
let ResourceCycles = [2];		let ResourceCycles = [4];
}		}
def : WriteRes<WriteVarBlendLd, [JLAGU, JFPU01]> {		def : WriteRes<WriteVarBlendLd, [JLAGU, JFPU01]> {
let Latency = 7;		let Latency = 7;
let ResourceCycles = [1, 2];		let ResourceCycles = [1, 4];
}		}
		andreadb: Variable blend instructions are 3 macro ops. You should add `let NumMicroOps = 3;`.

// FIXME: why do we need to define AVX2 resource on CPU that doesn't have AVX2?		// FIXME: why do we need to define AVX2 resource on CPU that doesn't have AVX2?
def : WriteRes<WriteVarVecShift, [JFPU01]> {		def : WriteRes<WriteVarVecShift, [JFPU01]> {}
let Latency = 1;
let ResourceCycles = [1];
}
def : WriteRes<WriteVarVecShiftLd, [JLAGU, JFPU01]> {		def : WriteRes<WriteVarVecShiftLd, [JLAGU, JFPU01]> {
let Latency = 6;		let Latency = 6;
let ResourceCycles = [1, 1];		let ResourceCycles = [1, 2];
}		}

def : WriteRes<WriteMPSAD, [JFPU0]> {		def : WriteRes<WriteMPSAD, [JFPU0]> {
let Latency = 3;		let Latency = 3;
let ResourceCycles = [2];		let ResourceCycles = [2];
}		}
def : WriteRes<WriteMPSADLd, [JLAGU, JFPU0]> {		def : WriteRes<WriteMPSADLd, [JLAGU, JFPU0]> {
let Latency = 8;		let Latency = 8;
[... 404 lines not shown ...]
def WriteVMOVMSK: SchedWriteRes<[JFPU0]> {		def WriteVMOVMSK: SchedWriteRes<[JFPU0]> {
let Latency = 3;		let Latency = 3;
}		}
def : InstRW<[WriteVMOVMSK], (instregex "VMOVMSKP(D|S)(Y)?rr")>;		def : InstRW<[WriteVMOVMSK], (instregex "VMOVMSKP(D|S)(Y)?rr")>;

// TODO: In fact we have latency '3+i'. The +i represents an additional 1 cycle transfer		// TODO: In fact we have latency '3+i'. The +i represents an additional 1 cycle transfer
// operation which moves the floating point result to the integer unit. During this		// operation which moves the floating point result to the integer unit. During this
// additional cycle the floating point unit execution resources are not occupied		// additional cycle the floating point unit execution resources are not occupied
// and ALU0 in the integer unit is occupied instead.		// and ALU0 in the integer unit is occupied instead.
def WriteVTESTY: SchedWriteRes<[JFPU01, JFPU0]> {		def WriteVTESTY: SchedWriteRes<[JFPU01, JFPU0]> {
let Latency = 4;		let Latency = 4;
let ResourceCycles = [4, 2];		let ResourceCycles = [2, 2];
}		}
		andreadb: let NumMicroOps = 3;
def : InstRW<[WriteVTESTY], (instregex "VTESTP(S|D)Yrr")>;		def : InstRW<[WriteVTESTY], (instregex "VTESTP(S|D)Yrr")>;
def : InstRW<[WriteVTESTY], (instregex "VPTESTYrr")>;		def : InstRW<[WriteVTESTY], (instregex "VPTESTYrr")>;

def WriteVTESTYLd: SchedWriteRes<[JLAGU, JFPU01, JFPU0]> {		def WriteVTESTYLd: SchedWriteRes<[JLAGU, JFPU01, JFPU0]> {
let Latency = 9;		let Latency = 9;
let ResourceCycles = [1, 4, 2];		let ResourceCycles = [1, 2, 2];
}		}
		andreadb: Same.
def : InstRW<[WriteVTESTYLd], (instregex "VTESTP(S|D)Yrm")>;		def : InstRW<[WriteVTESTYLd], (instregex "VTESTP(S|D)Yrm")>;
def : InstRW<[WriteVTESTYLd], (instregex "VPTESTYrm")>;		def : InstRW<[WriteVTESTYLd], (instregex "VPTESTYrm")>;

		def WriteVTEST: SchedWriteRes<[JFPU0]> {
		let Latency = 3;
		}
		def : InstRW<[WriteVTEST], (instregex "VTESTP(S|D)rr")>;
		def : InstRW<[WriteVTEST], (instregex "VPTESTrr")>;

		def WriteVTESTLd: SchedWriteRes<[JLAGU, JFPU0]> {
		let Latency = 8;
		}
		def : InstRW<[WriteVTESTLd], (instregex "VTESTP(S|D)rm")>;
		def : InstRW<[WriteVTESTLd], (instregex "VPTESTrm")>;

def WriteVSQRTYPD: SchedWriteRes<[JFPU1]> {		def WriteVSQRTYPD: SchedWriteRes<[JFPU1]> {
let Latency = 54;		let Latency = 54;
let ResourceCycles = [54];		let ResourceCycles = [54];
}		}
def : InstRW<[WriteVSQRTYPD], (instregex "VSQRTPDYr")>;		def : InstRW<[WriteVSQRTYPD], (instregex "VSQRTPDYr")>;

def WriteVSQRTYPDLd: SchedWriteRes<[JLAGU, JFPU1]> {		def WriteVSQRTYPDLd: SchedWriteRes<[JLAGU, JFPU1]> {
let Latency = 59;		let Latency = 59;
[... 29 lines not shown ...]

test/CodeGen/X86/avx-schedule.ll

	[... 4,605 lines not shown ...]
	; SKX-NEXT: setb %al # sched: [1:0.50]			; SKX-NEXT: setb %al # sched: [1:0.50]
	; SKX-NEXT: vtestpd (%rdi), %xmm0 # sched: [8:1.00]			; SKX-NEXT: vtestpd (%rdi), %xmm0 # sched: [8:1.00]
	; SKX-NEXT: adcl $0, %eax # sched: [1:0.50]			; SKX-NEXT: adcl $0, %eax # sched: [1:0.50]
	; SKX-NEXT: retq # sched: [7:1.00]			; SKX-NEXT: retq # sched: [7:1.00]
	;			;
	; BTVER2-LABEL: test_testpd:			; BTVER2-LABEL: test_testpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]			; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]
	; BTVER2-NEXT: vtestpd %xmm1, %xmm0 # sched: [1:0.50]			; BTVER2-NEXT: vtestpd %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: setb %al # sched: [1:0.50]			; BTVER2-NEXT: setb %al # sched: [1:0.50]
	; BTVER2-NEXT: vtestpd (%rdi), %xmm0 # sched: [6:1.00]			; BTVER2-NEXT: vtestpd (%rdi), %xmm0 # sched: [8:1.00]
	; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]			; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_testpd:			; ZNVER1-LABEL: test_testpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.25]			; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.25]
	; ZNVER1-NEXT: vtestpd %xmm1, %xmm0 # sched: [1:0.25]			; ZNVER1-NEXT: vtestpd %xmm1, %xmm0 # sched: [1:0.25]
	; ZNVER1-NEXT: setb %al # sched: [1:0.25]			; ZNVER1-NEXT: setb %al # sched: [1:0.25]
	[... 67 lines not shown ...]
	; SKX-NEXT: vtestpd (%rdi), %ymm0 # sched: [9:1.00]			; SKX-NEXT: vtestpd (%rdi), %ymm0 # sched: [9:1.00]
	; SKX-NEXT: adcl $0, %eax # sched: [1:0.50]			; SKX-NEXT: adcl $0, %eax # sched: [1:0.50]
	; SKX-NEXT: vzeroupper # sched: [4:1.00]			; SKX-NEXT: vzeroupper # sched: [4:1.00]
	; SKX-NEXT: retq # sched: [7:1.00]			; SKX-NEXT: retq # sched: [7:1.00]
	;			;
	; BTVER2-LABEL: test_testpd_ymm:			; BTVER2-LABEL: test_testpd_ymm:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]			; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]
	; BTVER2-NEXT: vtestpd %ymm1, %ymm0 # sched: [4:3.00]			; BTVER2-NEXT: vtestpd %ymm1, %ymm0 # sched: [4:2.00]
	; BTVER2-NEXT: setb %al # sched: [1:0.50]			; BTVER2-NEXT: setb %al # sched: [1:0.50]
	; BTVER2-NEXT: vtestpd (%rdi), %ymm0 # sched: [9:3.00]			; BTVER2-NEXT: vtestpd (%rdi), %ymm0 # sched: [9:2.00]
	; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]			; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_testpd_ymm:			; ZNVER1-LABEL: test_testpd_ymm:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.25]			; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.25]
	; ZNVER1-NEXT: vtestpd %ymm1, %ymm0 # sched: [1:0.25]			; ZNVER1-NEXT: vtestpd %ymm1, %ymm0 # sched: [1:0.25]
	; ZNVER1-NEXT: setb %al # sched: [1:0.25]			; ZNVER1-NEXT: setb %al # sched: [1:0.25]
	[... 62 lines not shown ...]
	; SKX-NEXT: setb %al # sched: [1:0.50]			; SKX-NEXT: setb %al # sched: [1:0.50]
	; SKX-NEXT: vtestps (%rdi), %xmm0 # sched: [8:1.00]			; SKX-NEXT: vtestps (%rdi), %xmm0 # sched: [8:1.00]
	; SKX-NEXT: adcl $0, %eax # sched: [1:0.50]			; SKX-NEXT: adcl $0, %eax # sched: [1:0.50]
	; SKX-NEXT: retq # sched: [7:1.00]			; SKX-NEXT: retq # sched: [7:1.00]
	;			;
	; BTVER2-LABEL: test_testps:			; BTVER2-LABEL: test_testps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]			; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]
	; BTVER2-NEXT: vtestps %xmm1, %xmm0 # sched: [1:0.50]			; BTVER2-NEXT: vtestps %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: setb %al # sched: [1:0.50]			; BTVER2-NEXT: setb %al # sched: [1:0.50]
	; BTVER2-NEXT: vtestps (%rdi), %xmm0 # sched: [6:1.00]			; BTVER2-NEXT: vtestps (%rdi), %xmm0 # sched: [8:1.00]
	; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]			; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_testps:			; ZNVER1-LABEL: test_testps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.25]			; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.25]
	; ZNVER1-NEXT: vtestps %xmm1, %xmm0 # sched: [1:0.25]			; ZNVER1-NEXT: vtestps %xmm1, %xmm0 # sched: [1:0.25]
	; ZNVER1-NEXT: setb %al # sched: [1:0.25]			; ZNVER1-NEXT: setb %al # sched: [1:0.25]
	[... 67 lines not shown ...]
	; SKX-NEXT: vtestps (%rdi), %ymm0 # sched: [9:1.00]			; SKX-NEXT: vtestps (%rdi), %ymm0 # sched: [9:1.00]
	; SKX-NEXT: adcl $0, %eax # sched: [1:0.50]			; SKX-NEXT: adcl $0, %eax # sched: [1:0.50]
	; SKX-NEXT: vzeroupper # sched: [4:1.00]			; SKX-NEXT: vzeroupper # sched: [4:1.00]
	; SKX-NEXT: retq # sched: [7:1.00]			; SKX-NEXT: retq # sched: [7:1.00]
	;			;
	; BTVER2-LABEL: test_testps_ymm:			; BTVER2-LABEL: test_testps_ymm:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]			; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]
	; BTVER2-NEXT: vtestps %ymm1, %ymm0 # sched: [4:3.00]			; BTVER2-NEXT: vtestps %ymm1, %ymm0 # sched: [4:2.00]
	; BTVER2-NEXT: setb %al # sched: [1:0.50]			; BTVER2-NEXT: setb %al # sched: [1:0.50]
	; BTVER2-NEXT: vtestps (%rdi), %ymm0 # sched: [9:3.00]			; BTVER2-NEXT: vtestps (%rdi), %ymm0 # sched: [9:2.00]
	; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]			; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_testps_ymm:			; ZNVER1-LABEL: test_testps_ymm:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.25]			; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.25]
	; ZNVER1-NEXT: vtestps %ymm1, %ymm0 # sched: [1:0.25]			; ZNVER1-NEXT: vtestps %ymm1, %ymm0 # sched: [1:0.25]
	; ZNVER1-NEXT: setb %al # sched: [1:0.25]			; ZNVER1-NEXT: setb %al # sched: [1:0.25]
	[... 471 lines not shown ...]

test/CodeGen/X86/sse41-schedule.ll

	[... 186 lines not shown ...]
	; SKX-LABEL: test_blendvpd:			; SKX-LABEL: test_blendvpd:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vblendvpd %xmm2, %xmm1, %xmm0, %xmm0 # sched: [2:0.67]			; SKX-NEXT: vblendvpd %xmm2, %xmm1, %xmm0, %xmm0 # sched: [2:0.67]
	; SKX-NEXT: vblendvpd %xmm2, (%rdi), %xmm0, %xmm0 # sched: [8:0.67]			; SKX-NEXT: vblendvpd %xmm2, (%rdi), %xmm0, %xmm0 # sched: [8:0.67]
	; SKX-NEXT: retq # sched: [7:1.00]			; SKX-NEXT: retq # sched: [7:1.00]
	;			;
	; BTVER2-LABEL: test_blendvpd:			; BTVER2-LABEL: test_blendvpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vblendvpd %xmm2, %xmm1, %xmm0, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vblendvpd %xmm2, %xmm1, %xmm0, %xmm0 # sched: [2:2.00]
	; BTVER2-NEXT: vblendvpd %xmm2, (%rdi), %xmm0, %xmm0 # sched: [7:1.00]			; BTVER2-NEXT: vblendvpd %xmm2, (%rdi), %xmm0, %xmm0 # sched: [7:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_blendvpd:			; ZNVER1-LABEL: test_blendvpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vblendvpd %xmm2, %xmm1, %xmm0, %xmm0 # sched: [1:0.50]			; ZNVER1-NEXT: vblendvpd %xmm2, %xmm1, %xmm0, %xmm0 # sched: [1:0.50]
	; ZNVER1-NEXT: vblendvpd %xmm2, (%rdi), %xmm0, %xmm0 # sched: [8:0.50]			; ZNVER1-NEXT: vblendvpd %xmm2, (%rdi), %xmm0, %xmm0 # sched: [8:0.50]
	; ZNVER1-NEXT: retq # sched: [1:0.50]			; ZNVER1-NEXT: retq # sched: [1:0.50]
	%1 = call <2 x double> @llvm.x86.sse41.blendvpd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2)			%1 = call <2 x double> @llvm.x86.sse41.blendvpd(<2 x double> %a0, <2 x double> %a1, <2 x double> %a2)
	[... 49 lines not shown ...]
	; SKX-LABEL: test_blendvps:			; SKX-LABEL: test_blendvps:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vblendvps %xmm2, %xmm1, %xmm0, %xmm0 # sched: [2:0.67]			; SKX-NEXT: vblendvps %xmm2, %xmm1, %xmm0, %xmm0 # sched: [2:0.67]
	; SKX-NEXT: vblendvps %xmm2, (%rdi), %xmm0, %xmm0 # sched: [8:0.67]			; SKX-NEXT: vblendvps %xmm2, (%rdi), %xmm0, %xmm0 # sched: [8:0.67]
	; SKX-NEXT: retq # sched: [7:1.00]			; SKX-NEXT: retq # sched: [7:1.00]
	;			;
	; BTVER2-LABEL: test_blendvps:			; BTVER2-LABEL: test_blendvps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vblendvps %xmm2, %xmm1, %xmm0, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vblendvps %xmm2, %xmm1, %xmm0, %xmm0 # sched: [2:2.00]
	; BTVER2-NEXT: vblendvps %xmm2, (%rdi), %xmm0, %xmm0 # sched: [7:1.00]			; BTVER2-NEXT: vblendvps %xmm2, (%rdi), %xmm0, %xmm0 # sched: [7:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_blendvps:			; ZNVER1-LABEL: test_blendvps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vblendvps %xmm2, %xmm1, %xmm0, %xmm0 # sched: [1:0.50]			; ZNVER1-NEXT: vblendvps %xmm2, %xmm1, %xmm0, %xmm0 # sched: [1:0.50]
	; ZNVER1-NEXT: vblendvps %xmm2, (%rdi), %xmm0, %xmm0 # sched: [8:0.50]			; ZNVER1-NEXT: vblendvps %xmm2, (%rdi), %xmm0, %xmm0 # sched: [8:0.50]
	; ZNVER1-NEXT: retq # sched: [1:0.50]			; ZNVER1-NEXT: retq # sched: [1:0.50]
	%1 = call <4 x float> @llvm.x86.sse41.blendvps(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2)			%1 = call <4 x float> @llvm.x86.sse41.blendvps(<4 x float> %a0, <4 x float> %a1, <4 x float> %a2)
	[... 468 lines not shown ...]
	; SKX-LABEL: test_pblendvb:			; SKX-LABEL: test_pblendvb:
	; SKX: # BB#0:			; SKX: # BB#0:
	; SKX-NEXT: vpblendvb %xmm2, %xmm1, %xmm0, %xmm0 # sched: [2:0.67]			; SKX-NEXT: vpblendvb %xmm2, %xmm1, %xmm0, %xmm0 # sched: [2:0.67]
	; SKX-NEXT: vpblendvb %xmm2, (%rdi), %xmm0, %xmm0 # sched: [8:0.67]			; SKX-NEXT: vpblendvb %xmm2, (%rdi), %xmm0, %xmm0 # sched: [8:0.67]
	; SKX-NEXT: retq # sched: [7:1.00]			; SKX-NEXT: retq # sched: [7:1.00]
	;			;
	; BTVER2-LABEL: test_pblendvb:			; BTVER2-LABEL: test_pblendvb:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vpblendvb %xmm2, %xmm1, %xmm0, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vpblendvb %xmm2, %xmm1, %xmm0, %xmm0 # sched: [2:2.00]
	; BTVER2-NEXT: vpblendvb %xmm2, (%rdi), %xmm0, %xmm0 # sched: [7:1.00]			; BTVER2-NEXT: vpblendvb %xmm2, (%rdi), %xmm0, %xmm0 # sched: [7:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_pblendvb:			; ZNVER1-LABEL: test_pblendvb:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vpblendvb %xmm2, %xmm1, %xmm0, %xmm0 # sched: [1:1.00]			; ZNVER1-NEXT: vpblendvb %xmm2, %xmm1, %xmm0, %xmm0 # sched: [1:1.00]
	; ZNVER1-NEXT: vpblendvb %xmm2, (%rdi), %xmm0, %xmm0 # sched: [8:1.00]			; ZNVER1-NEXT: vpblendvb %xmm2, (%rdi), %xmm0, %xmm0 # sched: [8:1.00]
	; ZNVER1-NEXT: retq # sched: [1:0.50]			; ZNVER1-NEXT: retq # sched: [1:0.50]
	%1 = call <16 x i8> @llvm.x86.sse41.pblendvb(<16 x i8> %a0, <16 x i8> %a1, <16 x i8> %a2)			%1 = call <16 x i8> @llvm.x86.sse41.pblendvb(<16 x i8> %a0, <16 x i8> %a1, <16 x i8> %a2)
	[... 2,172 lines not shown ...]
	; SKX-NEXT: vptest (%rdi), %xmm0 # sched: [9:1.00]			; SKX-NEXT: vptest (%rdi), %xmm0 # sched: [9:1.00]
	; SKX-NEXT: setb %cl # sched: [1:0.50]			; SKX-NEXT: setb %cl # sched: [1:0.50]
	; SKX-NEXT: andb %al, %cl # sched: [1:0.25]			; SKX-NEXT: andb %al, %cl # sched: [1:0.25]
	; SKX-NEXT: movzbl %cl, %eax # sched: [1:0.25]			; SKX-NEXT: movzbl %cl, %eax # sched: [1:0.25]
	; SKX-NEXT: retq # sched: [7:1.00]			; SKX-NEXT: retq # sched: [7:1.00]
	;			;
	; BTVER2-LABEL: test_ptest:			; BTVER2-LABEL: test_ptest:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vptest %xmm1, %xmm0 # sched: [1:0.50]			; BTVER2-NEXT: vptest %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: setb %al # sched: [1:0.50]			; BTVER2-NEXT: setb %al # sched: [1:0.50]
	; BTVER2-NEXT: vptest (%rdi), %xmm0 # sched: [6:1.00]			; BTVER2-NEXT: vptest (%rdi), %xmm0 # sched: [8:1.00]
	; BTVER2-NEXT: setb %cl # sched: [1:0.50]			; BTVER2-NEXT: setb %cl # sched: [1:0.50]
	; BTVER2-NEXT: andb %al, %cl # sched: [1:0.50]			; BTVER2-NEXT: andb %al, %cl # sched: [1:0.50]
	; BTVER2-NEXT: movzbl %cl, %eax # sched: [1:0.50]			; BTVER2-NEXT: movzbl %cl, %eax # sched: [1:0.50]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_ptest:			; ZNVER1-LABEL: test_ptest:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vptest %xmm1, %xmm0 # sched: [1:1.00]			; ZNVER1-NEXT: vptest %xmm1, %xmm0 # sched: [1:1.00]
	[... 303 lines not shown ...]

Sched model improving on btver2: JFPU01 resource, vtestp* for xmm. (Closed, Public)