
[X86][Costmodel] Load/store i16 Stride=6 VF=16 interleaving costs
Closed, Public

Authored by lebedev.ri on Sep 27 2021, 2:39 PM.

Details

Summary

The only scheduling models for CPUs that support AVX2
but not AVX-512 are: Haswell, Broadwell, Skylake, and Zen 1-3.

For this tuple, measuring becomes problematic because there is a lot of spilling,
but these memory ops apparently do not affect the worst-case estimate at all here.
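For context on what "this tuple" means: an interleaved group with Stride=6 and VF=16 over i16 covers 6 * 16 = 96 contiguous i16 elements, and the shuffles being costed split them into six 16-lane vectors (load) or merge six vectors back (store). The following is a minimal scalar sketch of those semantics only; the names (`deinterleaveLoad`, `interleaveStore`, `Members`, `Block`) are hypothetical and this is not LLVM code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Interleave factor (stride) and vectorization factor of the tuple in this review.
constexpr std::size_t kStride = 6;
constexpr std::size_t kVF = 16;

// Six member vectors, each holding 16 x i16 lanes.
using Members = std::array<std::array<std::int16_t, kVF>, kStride>;
// The contiguous memory the group spans: 6 * 16 = 96 x i16.
using Block = std::array<std::int16_t, kStride * kVF>;

// Interleaved load: memory is laid out as a0 b0 c0 d0 e0 f0 a1 b1 ...,
// so lane i of member m comes from element i * kStride + m.
Members deinterleaveLoad(const Block &mem) {
  Members out{};
  for (std::size_t i = 0; i < kVF; ++i)
    for (std::size_t m = 0; m < kStride; ++m)
      out[m][i] = mem[i * kStride + m];
  return out;
}

// Interleaved store: the inverse, writing lane i of member m to i * kStride + m.
Block interleaveStore(const Members &in) {
  Block mem{};
  for (std::size_t i = 0; i < kVF; ++i)
    for (std::size_t m = 0; m < kStride; ++m)
      mem[i * kStride + m] = in[m][i];
  return mem;
}
```

On x86 the vector code has to realize this strided permutation with sequences of shuffles (plus spills at this size), which is where the measured throughput comes from.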

For loads we have:
https://godbolt.org/z/5qGb9odP6: for Intel, Block RThroughput: <=106.0; for Ryzen, Block RThroughput: <=34.8
So pick a cost of 106 (the worst case).
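The selection rule used above is "take the worst Block RThroughput across the relevant scheduling models". A minimal sketch of that rule, assuming the per-model numbers have already been read off the llvm-mca output; `pickWorstCaseCost` is a hypothetical helper, and the round-up for fractional values is an assumption (both values picked in this review are already whole numbers).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: given the llvm-mca "Block RThroughput" measured for the
// same shuffle sequence on each scheduling model, return the worst case.
// Assumption: fractional throughputs are rounded up to a whole cost unit.
int pickWorstCaseCost(const std::vector<double> &blockRThroughputs) {
  assert(!blockRThroughputs.empty() && "need at least one measurement");
  double worst = *std::max_element(blockRThroughputs.begin(),
                                   blockRThroughputs.end());
  return static_cast<int>(std::ceil(worst));
}
```

For the load case, `pickWorstCaseCost({106.0, 34.8})` yields 106: the Intel models dominate, so the Ryzen figure does not influence the chosen cost.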

For stores we have:
https://godbolt.org/z/KrWcv4Ph7: for Intel, Block RThroughput: =58.0; for Ryzen, Block RThroughput: <=20.5
So pick a cost of 58 (the worst case).
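The two picked numbers are per-(stride, VF, element type, load/store) entries. As an illustration only, here is a hypothetical standalone lookup table holding just these two costs; the struct and function names are invented for this sketch and do not reflect LLVM's actual cost-table types in the X86 cost model.

```cpp
#include <cassert>
#include <optional>

// Hypothetical key for an interleaved-access cost entry (illustrative fields).
struct InterleaveKey {
  unsigned stride;  // interleave factor
  unsigned vf;      // vectorization factor
  unsigned eltBits; // element width in bits
  bool isStore;     // false = interleaved load, true = interleaved store
};

struct InterleaveCost {
  InterleaveKey key;
  int cost;
};

// Only the two entries discussed in this review (i16, Stride=6, VF=16).
constexpr InterleaveCost kAVX2InterleaveCosts[] = {
    {{6, 16, 16, false}, 106}, // load: worst of Intel 106.0 / Ryzen 34.8
    {{6, 16, 16, true}, 58},   // store: worst of Intel 58.0 / Ryzen 20.5
};

// Returns the table cost for an exact key match, or nothing if no entry exists
// (a real cost model would then need some fallback estimate, not shown here).
std::optional<int> lookupInterleaveCost(const InterleaveKey &k) {
  for (const InterleaveCost &e : kAVX2InterleaveCosts)
    if (e.key.stride == k.stride && e.key.vf == k.vf &&
        e.key.eltBits == k.eltBits && e.key.isStore == k.isStore)
      return e.cost;
  return std::nullopt;
}
```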

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.


Event Timeline

lebedev.ri created this revision. Sep 27 2021, 2:39 PM
RKSimon accepted this revision. Sep 28 2021, 8:47 AM

LGTM

This revision is now accepted and ready to land. Sep 28 2021, 8:47 AM

@RKSimon thank you for the speedy reviews!

Once these are in, I have a WIP patch to add SSE2/AVX1/AVX512BW test coverage (with their terrible costs).

@RKSimon this is pretty embarrassing, but it looks like I uploaded the patch for VF=8 here.
The analysis and patch description are indeed for VF=16, but the diff isn't.
I'm going to guess this is fine, but I'll try to be more careful.
Sorry!

This revision was landed with ongoing or failed builds. Sep 28 2021, 9:16 AM
This revision was automatically updated to reflect the committed changes.