This is an archive of the discontinued LLVM Phabricator instance.

[X86][Costmodel] Load/store i64/f64 Stride=2 VF=16 interleaving costs
ClosedPublic

Authored by lebedev.ri on Sep 30 2021, 8:09 AM.

Details

Summary

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1WMTojvfW - on Intel, Block RThroughput = 16.0; on Ryzen, Block RThroughput <= 8.0
So pick cost of 16.

For store we have:
https://godbolt.org/z/1WMTojvfW - on Intel, Block RThroughput = 16.0; on Ryzen, Block RThroughput <= 16.0
So pick cost of 16.
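
One plausible reading of the two "pick cost of 16" decisions above is "take the worst reciprocal throughput reported across the relevant sched models". A minimal Python sketch of that arithmetic follows. The helper name `pick_cost` and the dictionary keys are hypothetical, the Ryzen figures are the upper bounds quoted above rather than exact measurements, and this is not the code from the patch itself.

```python
import math

# "Block RThroughput" figures quoted in the summary.
# The Ryzen entries are upper bounds ("<="), not exact values.
LOAD_RTHROUGHPUT = {"intel": 16.0, "ryzen_upper_bound": 8.0}
STORE_RTHROUGHPUT = {"intel": 16.0, "ryzen_upper_bound": 16.0}


def pick_cost(rthroughput_by_family):
    """Hypothetical helper: worst (largest) figure, rounded up to an integer cost."""
    return math.ceil(max(rthroughput_by_family.values()))


load_cost = pick_cost(LOAD_RTHROUGHPUT)
store_cost = pick_cost(STORE_RTHROUGHPUT)
print(f"load cost = {load_cost}, store cost = {store_cost}")
```

Under this reading the Intel figure dominates for loads, while Intel and the Ryzen bound tie for stores, so both costs come out the same.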

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
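
For anyone collecting these figures outside of godbolt, the number of interest is the "Block RThroughput" line of the throughput report. A rough Python sketch for pulling it out of saved report text is below. The function name `block_rthroughput` is made up for illustration, and the sample text is just the single summary line with the Intel value quoted above, not a full report.

```python
import re

# Matches a summary line such as "Block RThroughput: 16.0".
_RTHROUGHPUT_RE = re.compile(
    r"^Block RThroughput:\s*([0-9]+(?:\.[0-9]+)?)", re.MULTILINE
)


def block_rthroughput(report_text: str) -> float:
    """Hypothetical helper: return the Block RThroughput value from report text."""
    match = _RTHROUGHPUT_RE.search(report_text)
    if match is None:
        raise ValueError("no 'Block RThroughput' line found")
    return float(match.group(1))


# Assumed sample: only the one line of interest, using the Intel figure above.
sample_report = "Block RThroughput: 16.0\n"
intel_value = block_rthroughput(sample_report)
```

Since the asm is used as llc emitted it, any value read this way carries the same caveat about sequential execution.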

Event Timeline

lebedev.ri created this revision. Sep 30 2021, 8:09 AM
RKSimon accepted this revision. Oct 1 2021, 5:45 AM

LGTM (tbh I'd be tempted to accept these vXi64/f64 costs for AVX1+ targets as well - they'd still be a lot closer than current default estimates).

This revision is now accepted and ready to land. Oct 1 2021, 5:45 AM

LGTM

Thank you for the reviews!
I'll see which tuple I'll deal with next, but this covers my most immediate interest.

(tbh I'd be tempted to accept these vXi64/f64 costs for AVX1+ targets as well - they'd still be a lot closer than current default estimates).

Yeah maybe.

This revision was landed with ongoing or failed builds. Oct 1 2021, 7:49 AM
This revision was automatically updated to reflect the committed changes.