This is an archive of the discontinued LLVM Phabricator instance.

[X86][Costmodel] Load/store i64 Stride=4 VF=16 interleaving costs
Closed, Public

Authored by lebedev.ri on Oct 16 2021, 10:49 AM.

Details

Summary

A few more tuples are being queried after D111546. It might be good to model them.
They all require a lot of manual assembly surgery.
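
For context, the tuple in the title (i64, Stride=4, VF=16) describes an interleaved access that touches 64 consecutive i64 elements and splits them into 4 vectors of 16 elements each. The following is only a scalar reference sketch of the load side; the names `Stride`, `VF`, `Lane` and `deinterleaveLoad` are hypothetical and are not taken from the patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t Stride = 4; // interleave factor (assumed from the title)
constexpr std::size_t VF = 16;    // elements per de-interleaved vector

// One de-interleaved result vector of VF i64 elements.
using Lane = std::array<std::int64_t, VF>;

// Scalar reference: read Stride * VF consecutive i64 values from Src and
// route element (I * Stride + M) to position I of member M.
inline std::array<Lane, Stride> deinterleaveLoad(const std::int64_t *Src) {
  std::array<Lane, Stride> Out{};
  for (std::size_t I = 0; I < VF; ++I)
    for (std::size_t M = 0; M < Stride; ++M)
      Out[M][I] = Src[I * Stride + M];
  return Out;
}
```

The vectorized form replaces these scalar moves with wide loads plus shuffles, and the cost of that shuffle sequence is what the cost model has to estimate.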

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/9bnKrefcG - for Intel CPUs, Block RThroughput: =40.0; for Ryzen CPUs, Block RThroughput: =16.0
So we could pick a cost of 40.

For store we have:
https://godbolt.org/z/5s3s14dEY - for Intel CPUs, Block RThroughput: =40.0; for Ryzen CPUs, Block RThroughput: =16.0
So we could pick a cost of 40.
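
The choice of 40 in both cases is consistent with taking the worst (largest) reciprocal throughput across the measured sched models. A minimal sketch of that selection, assuming this is the intended rule; `McaResult` and `pickWorstCaseCost` are hypothetical helpers, not part of LLVM:

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical record: one llvm-mca measurement for one scheduling model.
struct McaResult {
  std::string SchedModel;  // e.g. "haswell"
  double BlockRThroughput; // the "Block RThroughput" value reported by llvm-mca
};

// Return the largest reciprocal throughput, rounded up to an integer cost.
// Returns 0 for an empty input.
inline unsigned pickWorstCaseCost(const std::vector<McaResult> &Results) {
  double Worst = 0.0;
  for (const McaResult &R : Results)
    Worst = std::max(Worst, R.BlockRThroughput);
  return static_cast<unsigned>(std::ceil(Worst));
}
```

With one Intel measurement of 40.0 and one Ryzen measurement of 16.0, this yields 40, matching the cost proposed above.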

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Event Timeline

lebedev.ri created this revision. Oct 16 2021, 10:49 AM
This revision is now accepted and ready to land. Oct 16 2021, 12:46 PM
This revision was landed with ongoing or failed builds. Oct 17 2021, 7:40 AM
This revision was automatically updated to reflect the committed changes.