This is an archive of the discontinued LLVM Phabricator instance.

[X86][Costmodel] Load/store i16 Stride=3 VF=32 interleaving costs

Authored by lebedev.ri on Oct 3 2021, 5:47 AM.



The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have: for Intel CPUs, Block RThroughput: =56.0; for Ryzen CPUs, Block RThroughput: <=17.8
So pick a cost of 56.

For store we have: for Intel CPUs, Block RThroughput: <=54.0; for Ryzen CPUs, Block RThroughput: <=15.0
So pick a cost of 54.
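
The selection above appears to amount to taking the worst case across the measured CPU groups. A minimal sketch of that rule follows, assuming the cost is simply the largest reported Block RThroughput rounded up to an integer. The helper names (`pickWorstCaseCost`, `loadCost`, `storeCost`) are hypothetical and are not part of LLVM or of this patch; the numbers are the bounds quoted in this summary.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper, not an LLVM API: reduce the per-CPU
// "Block RThroughput" values to one cost by taking the largest
// value and rounding it up to a whole number.
inline int pickWorstCaseCost(const std::vector<double> &RThroughputs) {
  assert(!RThroughputs.empty() && "need at least one measurement");
  double Worst = RThroughputs.front();
  for (double R : RThroughputs)
    Worst = std::max(Worst, R);
  return static_cast<int>(std::ceil(Worst));
}

// Bounds quoted in the summary: Intel models first, then Ryzen models.
inline int loadCost() { return pickWorstCaseCost({56.0, 17.8}); }
inline int storeCost() { return pickWorstCaseCost({54.0, 15.0}); }
```

With the quoted bounds, `loadCost()` yields 56 and `storeCost()` yields 54, matching the costs picked above. Taking the maximum rather than an average is a conservative choice: it avoids underestimating the cost on the slowest of the modelled CPUs.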

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.


Event Timeline

lebedev.ri created this revision. Oct 3 2021, 5:47 AM
RKSimon accepted this revision. Oct 3 2021, 11:15 AM


This revision is now accepted and ready to land. Oct 3 2021, 11:15 AM


Thank you for the reviews!

lebedev.ri updated this revision to Diff 376778. Oct 3 2021, 1:36 PM

Hmm, I did it again.
All of the analysis is right, but I have uploaded the wrong diff.