This is an archive of the discontinued LLVM Phabricator instance.

[X86][Costmodel] Load/store i16 Stride=4 VF=32 interleaving costs
ClosedPublic

Authored by lebedev.ri on Sep 27 2021, 6:23 AM.

Details

Summary

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For this tuple, measuring becomes problematic since there is a lot of spilling going on,
but apparently all these memory ops do not affect the worst-case estimate at all here.
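
To make the tuple concrete, here is a minimal Python model of an i16, stride=4, VF=32 access. It is only an illustration of the data movement, not LLVM code; the names `deinterleave`, `interleave` and `I16_PER_YMM` are made up for this sketch.

```python
# Hypothetical model of the access pattern being costed; not LLVM code.
STRIDE = 4                 # interleave factor (fields per group)
VF = 32                    # vectorization factor (groups per vector iteration)
I16_PER_YMM = 256 // 16    # 16 i16 lanes fit in one 256-bit AVX2 register


def deinterleave(wide, stride=STRIDE):
    """Split one wide load into `stride` vectors, one per field."""
    return [wide[i::stride] for i in range(stride)]


def interleave(vectors):
    """Inverse operation, as needed before an interleaved store."""
    return [x for group in zip(*vectors) for x in group]


wide = list(range(STRIDE * VF))               # 128 contiguous i16 elements
fields = deinterleave(wide)                   # 4 vectors of 32 elements each
ymm_for_wide_load = len(wide) // I16_PER_YMM  # registers needed for the raw data
```

The raw data alone fills 8 of the 16 YMM registers available with AVX2, and the 4 result vectors need another 8, which is presumably why the generated code spills so much.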

For load we have:
https://godbolt.org/z/zP4hd8MT6 - for intels Block RThroughput: =150.0; for ryzens, Block RThroughput: <=59
So pick cost of 150.
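
The choice above reads as taking the worst case across the measured models. A small sketch of that selection follows, assuming the cost is simply the largest reported Block RThroughput; `pick_cost` and the dictionary keys are hypothetical names, and the Ryzen value is entered as its reported upper bound.

```python
# Block RThroughput values for the load case, taken from the summary above.
# The Ryzen figure is reported only as an upper bound (<= 59).
load_rthroughput = {
    "intel": 150.0,
    "ryzen": 59.0,
}


def pick_cost(rthroughput_by_model):
    """Return the worst-case (largest) reciprocal throughput as an integer cost."""
    return int(max(rthroughput_by_model.values()))


load_cost = pick_cost(load_rthroughput)
```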

For store we have:
https://godbolt.org/z/vKb8zTK8E - for intels Block RThroughput: =64.0; for ryzens, Block RThroughput: <=24.0
So pick cost of 64.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Event Timeline

lebedev.ri created this revision. Sep 27 2021, 6:23 AM
This revision is now accepted and ready to land. Sep 27 2021, 10:01 AM

LGTM

Thank you for the reviews!
I will post stride=6 next, and afterwards stride=2 for i8.

This revision was landed with ongoing or failed builds. Sep 27 2021, 12:20 PM
This revision was automatically updated to reflect the committed changes.