This is an archive of the discontinued LLVM Phabricator instance.

[X86] Removing 'TuningSlow3OpsLEA' from ICL config
ClosedPublic

Authored by goldstein.w.n on Jan 17 2023, 4:07 PM.

Details

Summary

According to https://uops.info/, ICL and newer have fast 3-term LEA.

Event Timeline

goldstein.w.n created this revision. Jan 17 2023, 4:07 PM
Herald added a project: Restricted Project. Jan 17 2023, 4:07 PM
goldstein.w.n requested review of this revision. Jan 17 2023, 4:07 PM

I'm not sure of that. There's no description in Intel SOM. I also googled it when writing #60043 and only found that LEA was improved in Golden Cove, e.g., https://www.hardwaretimes.com/intel-golden-cove-core-architecture-deep-dive-vs-zen-3-and-sunny-cove/

https://uops.info/table.html?search=lea&cb_lat=on&cb_tp=on&cb_uops=on&cb_ports=on&cb_CLX=on&cb_ICL=on&cb_measurements=on&cb_base=on

There are no 3-cycle, port 1 R32/R64 LEAs on Icelake. They are now single-cycle on ports 1 and 5, and some cases that were 1 cycle on ports 1 and 5 before are now on ports 0/1/5/6.

Fairly certain it's been changed. uops.info is generally reliable, and I also tested just now on ICX:

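	.global	_start
	.p2align 6
	.text
_start:
	movl	$10000000, %eax		# loop counter: 10M iterations

	xorl	%edx, %edx		# zero the accumulator (clears all of %rdx)
loop:
	leaq	1(%rdx, %rax, 8), %rdx	# 3-term LEA (base + index*scale + disp), loop-carried through %rdx
	decl	%eax
	jnz	loop


	movl	$60, %eax		# exit(0): syscall 60 is exit on Linux x86-64
	xorl	%edi, %edi
	syscall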

Results in:

10,002,046      cycles                                                      
 4,678,897      p0                                                          
 4,678,955      p1                                                          
 5,321,396      p5                                                          
 5,321,559      p6

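It would be roughly 30,000,000 cycles if the LEA had 3c latency: the 10,000,000 LEAs form a loop-carried dependency chain through %rdx, so total cycles are about iterations × latency. The measured ~10,000,000 cycles imply 1c latency.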
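
For anyone who wants to re-check the arithmetic, here is a minimal Python sketch using only the numbers posted above. The helper name `expected_cycles` and the variable names are made up for illustration, and it assumes the loop is bound purely by the LEA dependency chain, so that cycles are about iterations × latency.

```python
# Numbers copied from the perf output in this thread.
ITERATIONS = 10_000_000
measured_cycles = 10_002_046
port_uops = {
    "p0": 4_678_897,
    "p1": 4_678_955,
    "p5": 5_321_396,
    "p6": 5_321_559,
}


def expected_cycles(iterations, lea_latency):
    """Cycle estimate when every LEA must wait for the previous one.

    Hypothetical helper: assumes the loop is limited only by the
    loop-carried dependency through the LEA destination register.
    """
    return iterations * lea_latency


# Implied latency of the LEA, in cycles per iteration.
cycles_per_iter = measured_cycles / ITERATIONS

# Uops executed per iteration, overall and on ports 1 and 5 only.
uops_per_iter = sum(port_uops.values()) / ITERATIONS
p15_uops_per_iter = (port_uops["p1"] + port_uops["p5"]) / ITERATIONS

print(f"expected at 3c latency: {expected_cycles(ITERATIONS, 3):,} cycles")
print(f"expected at 1c latency: {expected_cycles(ITERATIONS, 1):,} cycles")
print(f"measured cycles/iteration: {cycles_per_iter:.4f}")
print(f"uops/iteration (all four ports): {uops_per_iter:.4f}")
print(f"uops/iteration (p1 + p5): {p15_uops_per_iter:.4f}")
```

The measured count works out to about one cycle per iteration, and ports 1 and 5 together see about one uop per iteration, which is consistent with the uops.info reading quoted earlier in the thread. Attributing those p1/p5 uops to the LEA specifically is an inference from that table, not something the counters show directly.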

pengfei accepted this revision. Jan 18 2023, 5:32 AM

LGTM. Thanks for the information!

This revision is now accepted and ready to land. Jan 18 2023, 5:32 AM
This revision was landed with ongoing or failed builds. Jan 19 2023, 11:30 AM
This revision was automatically updated to reflect the committed changes.