HomePhabricator

[X86][BtVer2] Fix latency/throughput of scalar integer MUL instructions.

Description

[X86][BtVer2] Fix latency/throughput of scalar integer MUL instructions.

Single operand MUL instructions that implicitly set EAX have the following
latency/throughput profile (see below):

imul %cl # latency: 3cy - uOPs: 1 - 1 JMul
imul %cx # latency: 3cy - uOPs: 3 - 3 JMul
imul %ecx # latency: 3cy - uOPs: 2 - 2 JMul
imul %rcx # latency: 6cy - uOPs: 2 - 4 JMul

mul %cl # latency: 3cy - uOPs: 1 - 1 JMul
mul %cx # latency: 3cy - uOPs: 3 - 3 JMul
mul %ecx # latency: 3cy - uOPs: 2 - 2 JMul
mul %rcx # latency: 6cy - uOPs: 2 - 4 JMul

Excluding the 64bit variant, which has a latency of 6cy, every other instruction
has a latency of 3cy. However, the number of decoded macro-opcodes (as well as
the resource cyles) depend on the MUL size.

The two operand MULs have a more predictable profile (see below):

imul %dx, %dx # latency: 3cy - uOPs: 1 - 1 JMul
imul %edx, %edx # latency: 3cy - uOPs: 1 - 1 JMul
imul %rdx, %rdx # latency: 6cy - uOPs: 1 - 4 JMul

imul $3, %dx, %dx # latency: 4cy - uOPs: 2 - 2 JMul
imul $3, %ecx, %ecx # latency: 3cy - uOPs: 1 - 1 JMul
imul $3, %rdx, %rdx # latency: 6cy - uOPs: 1 - 4 JMul

This patch updates the values in the Jaguar scheduling model and regenerates
llvm-mca tests.

Differential Revision: https://reviews.llvm.org/D66547

Details

Committed
adibiagioAug 22 2019, 8:20 AM
Differential Revision
D66547: [X86][BtVer2] Fix latency/throughput of scalar integer MUL instructions.
Parents
rL369660: [lldb] Remove ')' to fix the build
Branches
Unknown
Tags
Unknown