This is an archive of the discontinued LLVM Phabricator instance.

[X86][BtVer2] Fix latency of ALU RMW instructions.
Closed · Public

Authored by andreadb on Aug 23 2019, 3:21 AM.

Details

Summary

Excluding ADC/SBB and the bit-test instructions (BTR/BTS/BTC), the observed latency of all other RMW integer arithmetic/logic instructions is 6cy and not 5cy.

Example (ADD):

addb $0, (%rsp)            # Latency: 6cy
addb $7, (%rsp)            # Latency: 6cy
addb %sil, (%rsp)          # Latency: 6cy

addw $0, (%rsp)            # Latency: 6cy
addw $511, (%rsp)          # Latency: 6cy
addw %si, (%rsp)           # Latency: 6cy

addl $0, (%rsp)            # Latency: 6cy
addl $511, (%rsp)          # Latency: 6cy
addl %esi, (%rsp)          # Latency: 6cy

addq $0, (%rsp)            # Latency: 6cy
addq $511, (%rsp)          # Latency: 6cy
addq %rsi, (%rsp)          # Latency: 6cy

The same latency profile applies to SUB/AND/OR/XOR/INC/DEC.
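The ADD listing above follows a regular pattern (four operand sizes, each with a zero immediate, a non-zero immediate, and a register source), so the equivalent listings for the other mnemonics can be generated mechanically. The following is only a sketch: the helper name `rmw_lines` and its parameters are hypothetical and not part of the patch, and it assumes INC/DEC are written with a memory operand only.

```python
# Hypothetical helper (not part of the patch): emit AT&T-syntax RMW test
# lines for one mnemonic, mirroring the ADD example in the summary.

# Per-size operands copied from the ADD example: (non-zero immediate, register).
SUFFIXES = {
    "b": ("$7", "%sil"),
    "w": ("$511", "%si"),
    "l": ("$511", "%esi"),
    "q": ("$511", "%rsi"),
}


def rmw_lines(mnemonic, has_source=True, mem="(%rsp)"):
    """Return the list of RMW instruction lines for `mnemonic`."""
    lines = []
    for suffix, (imm, reg) in SUFFIXES.items():
        if has_source:
            # Zero immediate, non-zero immediate, then register source.
            for src in ("$0", imm, reg):
                lines.append(f"{mnemonic}{suffix} {src}, {mem}")
        else:
            # Single-operand forms such as INC/DEC.
            lines.append(f"{mnemonic}{suffix} {mem}")
    return lines


if __name__ == "__main__":
    for name in ("sub", "and", "or", "xor"):
        print("\n".join(rmw_lines(name)))
    for name in ("inc", "dec"):
        print("\n".join(rmw_lines(name, has_source=False)))
```

The generated lines carry no latency annotations; the 6cy figure comes from the measurements reported in the summary, not from this script.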

The observed latency of ADC/SBB is 7-8cy, so a different write is needed to model those.
This patch does not fix the latency of BTS/BTR/BTC, which are much slower than the btver2 model currently reports.
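To restate the three groups described in the summary, here is a small lookup sketch. The function name `observed_rmw_latency` is hypothetical; the numbers are the ones quoted in the summary, and the function returns `None` wherever the summary gives no figure (including BTS/BTR/BTC and any mnemonic it does not name).

```python
# Hypothetical lookup (not part of the patch): observed btver2 RMW latency
# ranges, in cycles, as quoted in the summary.

def observed_rmw_latency(mnemonic):
    """Return (min_cycles, max_cycles), or None if the summary gives no figure."""
    m = mnemonic.lower()
    if m in {"adc", "sbb"}:
        return (7, 8)  # needs a separate write; not modeled by this patch
    if m in {"add", "sub", "and", "or", "xor", "inc", "dec"}:
        return (6, 6)  # fixed by this patch (previously modeled as 5cy)
    # BTS/BTR/BTC are reported as much slower than the current model, but no
    # number is given; other mnemonics are simply not named in the summary.
    return None
```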

Diff Detail

Repository
rL LLVM

Event Timeline

andreadb created this revision. Aug 23 2019, 3:21 AM
RKSimon accepted this revision. Aug 23 2019, 4:10 AM

LGTM - for the ADC/SBB fix I'd recommend adding a WriteADCRMW class instead of adding yet more overrides.

This revision is now accepted and ready to land. Aug 23 2019, 4:10 AM

> LGTM - for the ADC/SBB fix I'd recommend adding a WriteADCRMW class instead of adding yet more overrides.

Good idea. I'll do that.
Thanks for the review!

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Aug 23 2019, 4:33 AM