
[X86][TTI] update costs of interleaved load\store of i64\double

Authored by magabari on Nov 14 2017, 12:32 AM.



This patch contains more accurate costs for interleaved load\store of stride 2 for the types int64\double on AVX2.

Event Timeline

magabari created this revision. Nov 14 2017, 12:32 AM
magabari retitled this revision from [X86][TTI] update costs of interleaved load to [X86][TTI] update costs of interleaved load\store of i64\double. Nov 14 2017, 12:34 AM
magabari edited the summary of this revision.
magabari added a subscriber: llvm-commits.
dorit edited edge metadata. Nov 15 2017, 10:27 AM

I think it would be nice to make the test cases smaller. Right now you have something like this:
for (…) {
  Dst[2*i] = Dst[2*i] + Src[2*i] * k
  Dst[2*i+1] = Dst[2*i+1] + Src[2*i+1] * k
}
...which actually tests both strided loads and strided stores.
So you could either use one test to check both store and load costs (and even then you probably don't need both a mul and an add just to check memop costs),
or, if you want to separate the load and store cases, the load test could be something like:
for (…) {
  s += Src[2*i]
  s += Src[2*i+1]
}
and the store test could be something like:
for (…) {
  Dst[2*i] = k1;
  Dst[2*i+1] = k2;
}
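
For reference, the two reduced kernels suggested above could look roughly like the following in C. This is only a sketch: the function names, the `n` parameter, and the choice of `int64_t` are assumptions made for illustration, not taken from the patch's actual test files. The `double` variant would be analogous, though a floating-point reduction may need relaxed FP flags before the loop vectorizer considers it.

```c
#include <stddef.h>
#include <stdint.h>

/* Load-only kernel (hypothetical name): the even and odd elements of src
   are both read, so the memory accesses form a stride-2 interleaved load
   group. The only result is a scalar reduction; no strided store occurs. */
int64_t sum_interleaved(const int64_t *src, size_t n) {
    int64_t s = 0;
    for (size_t i = 0; i < n; ++i) {
        s += src[2 * i];
        s += src[2 * i + 1];
    }
    return s;
}

/* Store-only kernel (hypothetical name): two loop-invariant values are
   written to the even and odd elements of dst, forming a stride-2
   interleaved store group with no loads from memory in the loop body. */
void store_interleaved(int64_t *dst, size_t n, int64_t k1, int64_t k2) {
    for (size_t i = 0; i < n; ++i) {
        dst[2 * i] = k1;
        dst[2 * i + 1] = k2;
    }
}
```

Keeping the two kernels separate means a change in the reported cost can be traced to either the load or the store entry, rather than to a mix of both plus arithmetic.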


2 ↗(On Diff #122791)

I see some of the interleave tests in this directory use -mcpu=core_avx2 and some use -mcpu=skylake. I wonder which one we want to use?

magabari updated this revision to Diff 123134. Nov 15 2017, 11:43 PM
magabari marked an inline comment as done.

Addressed dorit's notes.

dorit accepted this revision. Nov 15 2017, 11:55 PM

You missed just one -mcpu=skylake :)
LGTM with this change.

This revision is now accepted and ready to land. Nov 15 2017, 11:55 PM
This revision was automatically updated to reflect the committed changes.