This is an archive of the discontinued LLVM Phabricator instance.

[X86][Costmodel] Load/store i32/f32 Stride=2 VF=2 interleaving costs
ClosedPublic

Authored by lebedev.ri on Sep 29 2021, 12:26 PM.

Details

Summary

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/4rY96hnGT - for Intel CPUs, Block RThroughput: =2.0; for Ryzen CPUs, Block RThroughput: =1.0
So pick a cost of 2.

For store we have:
https://godbolt.org/z/vbo37Y3r9 - for Intel CPUs, Block RThroughput: =1.0; for Ryzen CPUs, Block RThroughput: =0.5
So pick a cost of 1.
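
The selection above can be sketched as a small calculation. This is only an illustration of the apparent rule (take the worst reported Block RThroughput across the scheduling models and round up to an integer cost); the summary does not state the rule explicitly, and the names `RTHROUGHPUT` and `pick_cost` are hypothetical, not part of LLVM.

```python
import math

# Block RThroughput values as quoted in the summary for the two godbolt links.
RTHROUGHPUT = {
    "load": {"intel": 2.0, "ryzen": 1.0},
    "store": {"intel": 1.0, "ryzen": 0.5},
}


def pick_cost(per_cpu):
    """Assumed rule: use the largest (worst) value, rounded up to an integer."""
    return math.ceil(max(per_cpu.values()))


# One cost per operation, covering every CPU family listed above.
costs = {op: pick_cost(values) for op, values in RTHROUGHPUT.items()}
print(costs)  # {'load': 2, 'store': 1}
```

Under that assumption the Intel numbers dominate in both cases, which matches the costs of 2 and 1 chosen in this revision.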

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
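
For reference, the access pattern named in the title (Stride=2, VF=2) can be modelled as follows. This is a plain-Python sketch of the semantics only, not the asm that was measured; `deinterleave` and `interleave` are hypothetical helper names.

```python
def deinterleave(wide, stride):
    """Interleaved load: split one wide vector into `stride` narrower vectors."""
    return [wide[i::stride] for i in range(stride)]


def interleave(lanes):
    """Interleaved store: merge the narrower vectors back into one wide vector."""
    return [x for group in zip(*lanes) for x in group]


# Stride=2, VF=2: four contiguous i32 elements laid out as a0, b0, a1, b1.
wide = [10, 20, 11, 21]
a, b = deinterleave(wide, 2)
print(a, b)                # [10, 11] [20, 21]
print(interleave([a, b]))  # [10, 20, 11, 21]
```

The shuffles that perform this split on load and the merge on store are what the cost entries in this revision account for.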

Event Timeline

lebedev.ri created this revision. Sep 29 2021, 12:26 PM
lebedev.ri requested review of this revision. Sep 29 2021, 12:26 PM

The title says i32/f32 but this just covers i32?

We get that for free: rG97e04d41e646aa13b0cc5ff3812bfb7305fa4756.

awesome - but we should probably add test coverage

I don't really see the point, but done.

I suspect these costs can also be used for AVX1 targets

RKSimon accepted this revision. Oct 1 2021, 6:07 AM

LGTM

This revision is now accepted and ready to land. Oct 1 2021, 6:07 AM
This revision was landed with ongoing or failed builds. Oct 1 2021, 7:49 AM
This revision was automatically updated to reflect the committed changes.