This is an archive of the discontinued LLVM Phabricator instance.

[X86][BtVer2] Fix latency of ALU RMW instructions.
ClosedPublic

Authored by andreadb on Aug 23 2019, 3:21 AM.

Details

Summary

Excluding ADC/SBB and the bit-test instructions (BTR/BTS/BTC), the observed latency of every other RMW integer arithmetic/logic instruction is 6cy, not 5cy.

Example (ADD):

addb $0, (%rsp)            # Latency: 6cy
addb $7, (%rsp)            # Latency: 6cy
addb %sil, (%rsp)          # Latency: 6cy

addw $0, (%rsp)            # Latency: 6cy
addw $511, (%rsp)          # Latency: 6cy
addw %si, (%rsp)           # Latency: 6cy

addl $0, (%rsp)            # Latency: 6cy
addl $511, (%rsp)          # Latency: 6cy
addl %esi, (%rsp)          # Latency: 6cy

addq $0, (%rsp)            # Latency: 6cy
addq $511, (%rsp)          # Latency: 6cy
addq %rsi, (%rsp)          # Latency: 6cy

The same latency profile applies to SUB/AND/OR/XOR/INC/DEC.
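
To give a rough sense of what the one-cycle difference means for a latency-bound sequence, here is a minimal sketch. It assumes a serial chain of dependent RMW instructions in which each instruction waits for the full latency of its predecessor, with no other bottleneck. The helper names are hypothetical and not part of the patch; only the 5cy and 6cy figures come from the summary above.

```python
# Hypothetical illustration only: a simplistic latency-bound model, not how
# LLVM's scheduler or llvm-mca actually computes timings.

MODELED_LATENCY_CY = 5   # latency the summary says is incorrect
OBSERVED_LATENCY_CY = 6  # latency observed for ADD/SUB/AND/OR/XOR/INC/DEC RMW forms


def chain_cycles(num_instructions: int, latency_cy: int) -> int:
    """Cycles for a serial chain where every instruction depends on the previous one."""
    if num_instructions < 0 or latency_cy < 0:
        raise ValueError("instruction count and latency must be non-negative")
    return num_instructions * latency_cy


def underestimate(num_instructions: int) -> int:
    """Cycles by which the 5cy figure undercounts the chain relative to the 6cy figure."""
    return (chain_cycles(num_instructions, OBSERVED_LATENCY_CY)
            - chain_cycles(num_instructions, MODELED_LATENCY_CY))


if __name__ == "__main__":
    n = 100
    print(chain_cycles(n, MODELED_LATENCY_CY))   # 500
    print(chain_cycles(n, OBSERVED_LATENCY_CY))  # 600
    print(underestimate(n))                      # 100
```

Under these assumptions, the 5cy figure undercounts a chain of 100 dependent RMW instructions by 100 cycles. Real code rarely forms such a pure chain, so this is an upper bound on the effect rather than a prediction.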

The observed latency of ADC/SBB is 7-8cy, so a different write is needed to model those instructions.
This patch does not fix the latency of BTS/BTR/BTC, which are much slower than what the btver2 model currently reports.

Event Timeline

andreadb created this revision. Aug 23 2019, 3:21 AM
RKSimon accepted this revision. Aug 23 2019, 4:10 AM

LGTM - for the ADC/SBB fix I'd recommend adding a WriteADCRMW class instead of adding yet more overrides.

This revision is now accepted and ready to land. Aug 23 2019, 4:10 AM

Good idea. I'll do that.
Thanks for the review!

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Aug 23 2019, 4:33 AM