According to https://uops.info/, ICL and newer have fast 3-term LEA.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Comment Actions
I'm not sure about that. There's no description of it in the Intel SOM. I also searched for it when writing #60043 and only found that LEA was improved in Golden Cove, e.g., https://www.hardwaretimes.com/intel-golden-cove-core-architecture-deep-dive-vs-zen-3-and-sunny-cove/
Comment Actions
There are no 3 cycle port 1 R32/R64 LEAs on Icelake. They are now single cycle port 1 and 5. And some cases that were 1 cycle port 1 and 5 before are now port 0/1/5/6.
Comment Actions
Fairly certain it's been changed. uops.info is generally reliable, and I also tested it just now on ICX:
.global _start
.p2align 6
.text
_start:
    movl $10000000, %eax
    xorl %edx, %edx
loop:
    leaq 1(%rdx, %rax, 8), %rdx
    decl %eax
    jnz loop
    movl $60, %eax
    xorl %edi, %edi
    syscall
Results in:
10,002,046 cycles
 4,678,897 p0
 4,678,955 p1
 5,321,396 p5
 5,321,559 p6
It would be about 30,000,000 cycles if the LEA had 3-cycle latency, since each iteration's LEA depends on the previous one through %rdx.
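The arithmetic behind that conclusion can be sketched as below. This is only a back-of-the-envelope check using the counter values quoted above; the variable and function names are made up for illustration, and attributing the p1+p5 uops to the LEA and the p0+p6 uops to the dec/jnz pair is an assumption read off the totals, not something the counters prove on their own.

```python
# Counter values copied from the measurement above.
ITERATIONS = 10_000_000
CYCLES = 10_002_046
PORT_UOPS = {"p0": 4_678_897, "p1": 4_678_955, "p5": 5_321_396, "p6": 5_321_559}


def per_iteration(total, iterations=ITERATIONS):
    """Average of a counter per loop iteration."""
    return total / iterations


# The loop is bound by the LEA -> LEA dependency chain through %rdx,
# so cycles per iteration approximates the LEA latency.
latency = per_iteration(CYCLES)

# Total uops per iteration across the four ports listed.
uops = per_iteration(sum(PORT_UOPS.values()))

# Assumed split: LEA on p1/p5, dec+jnz on p0/p6.
lea_ports = per_iteration(PORT_UOPS["p1"] + PORT_UOPS["p5"])
branch_ports = per_iteration(PORT_UOPS["p0"] + PORT_UOPS["p6"])

print(f"cycles/iter: {latency:.4f}")         # 1.0002
print(f"uops/iter:   {uops:.4f}")            # 2.0001
print(f"p1+p5/iter:  {lea_ports:.4f}")       # 1.0000
print(f"p0+p6/iter:  {branch_ports:.4f}")    # 1.0000
```

About one cycle and two uops per iteration is consistent with a 1-cycle LEA; a 3-cycle LEA would put cycles/iter near 3.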