[X86][Costmodel] Load/store i8 Stride=4 VF=32 interleaving costs

Authored by lebedev.ri on Oct 2 2021, 3:40 AM.

Description

While we already model this tuple, the load cost is divergent from reality, so fix it.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/zWMhhnPYa - for Intel CPUs, Block RThroughput: =56.0; for Ryzens, Block RThroughput: <=24.0
So pick the worst case, a cost of 56.

For store we have:
https://godbolt.org/z/vnqqjWx51 - for Intel CPUs, Block RThroughput: =12.0; for Ryzens, Block RThroughput: <=4.0
So pick the worst case, a cost of 12.
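
The rule applied to both numbers above (take the largest Block RThroughput
among the sched models considered) can be sketched as below. This is only an
illustration: pickInterleaveCost is a hypothetical helper, not an LLVM API,
and rounding up to an integer is an assumption, since the figures quoted here
are already whole numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of LLVM): given the "Block RThroughput"
// figure reported for each sched model, return the worst (largest) one,
// rounded up to an integer cost.
inline unsigned pickInterleaveCost(const std::vector<double> &RThroughputs) {
  assert(!RThroughputs.empty() && "need at least one measurement");
  double Worst = *std::max_element(RThroughputs.begin(), RThroughputs.end());
  return static_cast<unsigned>(std::ceil(Worst));
}
```

With the figures quoted above, pickInterleaveCost({56.0, 24.0}) gives 56 for
the load and pickInterleaveCost({12.0, 4.0}) gives 12 for the store. The Ryzen
inputs are the upper bounds from the text, so they cannot change the result.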

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110971