This is an archive of the discontinued LLVM Phabricator instance.

[X86][Costmodel] Load/store i8 Stride=2 VF=4 interleaving costs
ClosedPublic

Authored by lebedev.ri on Sep 29 2021, 5:52 AM.

Details

Summary

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

Identical to VF=2.
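
For reference, the access pattern being costed (i8 elements, stride 2, VF=4) can be sketched in scalar C++. This is only an illustration of what an interleaved load/store group with factor 2 does; the names `Vec4`, `Deinterleaved`, `loadStride2` and `storeStride2` are hypothetical and do not appear in LLVM or in this patch.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a <4 x i8> vector.
using Vec4 = std::array<std::uint8_t, 4>;

struct Deinterleaved {
  Vec4 Even; // elements 0, 2, 4, 6 of the source
  Vec4 Odd;  // elements 1, 3, 5, 7 of the source
};

// Interleaved load: read 8 consecutive bytes, split them into two
// 4-element vectors by taking every second element.
inline Deinterleaved loadStride2(const std::uint8_t *Src) {
  Deinterleaved R{};
  for (int I = 0; I < 4; ++I) {
    R.Even[I] = Src[2 * I];
    R.Odd[I] = Src[2 * I + 1];
  }
  return R;
}

// Interleaved store: the inverse, merging two 4-element vectors
// back into 8 consecutive bytes.
inline void storeStride2(const Vec4 &Even, const Vec4 &Odd,
                         std::uint8_t *Dst) {
  for (int I = 0; I < 4; ++I) {
    Dst[2 * I] = Even[I];
    Dst[2 * I + 1] = Odd[I];
  }
}
```

In vectorized code the two loops become one wide load (or store) plus shuffles, and the cost of those shuffles is what the numbers below estimate.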

For load we have:
https://godbolt.org/z/sGE41GYo7 - for Intel CPUs, Block RThroughput = 2.0; for Ryzen CPUs, Block RThroughput <= 1.0
So pick cost of 2.

For store we have:
https://godbolt.org/z/ba5r3s9xa - for Intel CPUs, Block RThroughput = 1.0; for Ryzen CPUs, Block RThroughput <= 0.5
So pick cost of 1.
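
The selection rule used above can be sketched as "take the worst reciprocal throughput across the sched models and round up". This is a minimal sketch of that reasoning, not code from the patch; `pickCost` is a hypothetical helper, and the Ryzen inputs are the upper bounds quoted above rather than exact per-model values.

```cpp
#include <algorithm>
#include <cmath>
#include <initializer_list>

// Hypothetical helper (not an LLVM API): given the Block RThroughput
// reported for each sched model, return the largest one rounded up
// to an integer cost.
inline int pickCost(std::initializer_list<double> RThroughputs) {
  double Worst = std::max(RThroughputs);
  return static_cast<int>(std::ceil(Worst));
}

// Load:  Intel = 2.0, Ryzen <= 1.0  ->  pickCost({2.0, 1.0})
// Store: Intel = 1.0, Ryzen <= 0.5  ->  pickCost({1.0, 0.5})
```

With the numbers from the two godbolt links this gives 2 for the load and 1 for the store, matching the costs picked here.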

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Event Timeline

lebedev.ri created this revision. Sep 29 2021, 5:52 AM

Before we go any further - do we gain anything by updating BaseT::getInterleavedMemoryOpCost to use getScalarizationOverhead?

Before we go any further - do we gain anything by updating BaseT::getInterleavedMemoryOpCost to use getScalarizationOverhead?

It depends on your definition of "anything". It basically doesn't help,
because the only improvement that gives us is not overestimating XMM subreg insertion/extraction from YMM.
The costs are still bogusly high.

lebedev.ri added a comment. Edited Sep 29 2021, 6:57 AM

(I've posted proof of concept: D110713)

FWIW i agree that this is not really great, but what we need is D100486,
and afterwards we should materialize shuffle masks and query their costs.
I was planning on doing that, as i have noted previously in some review.
But even that will overestimate the cost.

RKSimon accepted this revision. Sep 29 2021, 9:15 AM

LGTM

This revision is now accepted and ready to land. Sep 29 2021, 9:15 AM
This revision was landed with ongoing or failed builds. Sep 29 2021, 11:53 AM
This revision was automatically updated to reflect the committed changes.