This is an archive of the discontinued LLVM Phabricator instance.

[X86][Sched] Add zero idiom sched data to the SNB model.
ClosedPublic

Authored by courbet on Sep 21 2018, 6:42 AM.

Details

Summary

On SNB, renamer-based zeroing does not work for:

  • 16 and 8-bit GPRs[1].
  • MMX [2].
  • ANDN variants [3]

[1] echo 'sub %ax, %ax' | /tmp/llvm-exegesis -mode=uops -snippets-file=-
[2] echo 'pxor %mm0, %mm0' | /tmp/llvm-exegesis -mode=uops -snippets-file=-
[3] echo 'andnps %xmm0, %xmm0' | /tmp/llvm-exegesis -mode=uops -snippets-file=-

Diff Detail

Repository
rL LLVM

Event Timeline

courbet created this revision.Sep 21 2018, 6:42 AM
RKSimon added inline comments.Sep 21 2018, 6:48 AM
lib/Target/X86/X86SchedSandyBridge.td
37 ↗(On Diff #166470)

Add PRF data as well ?

1143 ↗(On Diff #166470)

You might want to avoid "9.8" - Agner has a tendency to reorder chapter numbers....

Section "Sandy Bridge and Ivy Bridge Pipeline": "Register allocation and renaming"

might be better.

I'd also like to get you opinion on whether we should assume that these apply to, e.g. VXORPDYrr. Because SNB is the default model, I think it would make sense to do it if typical CPU support it.

For example it's the case for HSW and SKL/SKX:

echo 'vxorps %ymm0, %ymm0, %ymm1' | llvm-exegesis -mode=uops -snippets-file=-
---
mode:            uops
key:             
  instructions:    
    - 'VXORPSYrr YMM1 YMM0 YMM0'
  config:          ''
cpu_name:        haswell
llvm_triple:     x86_64-grtev4-linux-gnu
num_repetitions: 10000
measurements:    
  - { key: '3', value: 0.0008, debug_string: HWPort0 }
  - { key: '4', value: 0.0012, debug_string: HWPort1 }
  - { key: '5', value: 0.0006, debug_string: HWPort2 }
  - { key: '6', value: 0.0003, debug_string: HWPort3 }
  - { key: '7', value: 0.0002, debug_string: HWPort4 }
  - { key: '8', value: 0.0009, debug_string: HWPort5 }
  - { key: '9', value: 0.0028, debug_string: HWPort6 }
  - { key: '10', value: 0.0001, debug_string: HWPort7 }
error:           ''
info:            ''
assembled_snippet: C5FC57C8C5FC57C8C5FC57C8C5FC57C8C5FC57C8C5FC57C8C5FC57C8C5FC57C8C5FC57C8C5FC57C8C5FC57C8C5FC57C8C5FC57C8C5FC57C8C5FC57C8C5FC57C8C3
...
courbet marked an inline comment as done.Sep 21 2018, 6:57 AM
courbet added inline comments.
lib/Target/X86/X86SchedSandyBridge.td
37 ↗(On Diff #166470)

Sure, I'll do it in a separate change if you don't mind.

courbet updated this revision to Diff 166473.Sep 21 2018, 6:57 AM

Better reference to agner.

andreadb accepted this revision.Sep 21 2018, 7:02 AM

Nice!

On SNB, renamer-based zeroing does not work for:

    16 and 8-bit GPRs[1].
    MMX [2].
    ANDN variants [3]

Could you add that info as a small comment in test llvm-mca/X86/SandyBridge/zero-idioms.s ? So that people don't get confused when they find that some instructions from that test have a non-zero latency.

From a tablegen point of view, the patch looks good to me.
I don't have a sandybridge, so I cannot test if those numbers are valid. However, I trust you that you have verified those with exegesis.

P.s.: are you planning to look at this for other models?

Thanks a lot Clement!

This revision is now accepted and ready to land.Sep 21 2018, 7:02 AM

Thanks Andrea !

Could you add that info as a small comment in test llvm-mca/X86/SandyBridge/zero-idioms.s ? So that people don't get confused when they find that some instructions from that test have a non-zero latency.

Good idea, done.

P.s.: are you planning to look at this for other models?

Yes, I'm planning to do HSW,BDW,SKL,SKX.

courbet updated this revision to Diff 166476.Sep 21 2018, 7:07 AM

Add comments in tests.

This revision was automatically updated to reflect the committed changes.

I'd also like to get you opinion on whether we should assume that these apply to, e.g. VXORPDYrr. Because SNB is the default model, I think it would make sense to do it if typical CPU support it.

The model has to match that cpu architecture - "supporting" additional instructions is one thing, but otherwise I'd say we need to match what Agner/Intel say.

Oddly, the Intel AoM mentions VXORPS/PD xmm and ymm and but not VANDNPS/PD as a dependency breaking zero idiom @craig.topper Any thoughts?