This is an archive of the discontinued LLVM Phabricator instance.

[X86][Costmodel] Load/store i32/f32 Stride=4 VF=16 interleaving costs
Closed, Public

Authored by lebedev.ri on Oct 4 2021, 8:27 AM.

Details

Summary

This one required quite a bit of assembly surgery, but the trend continues, so I think this is right.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, and zen1-3.

For load we have:
https://godbolt.org/z/EKWdj8cKT - for Intel, Block RThroughput: <=32.0; for Ryzen, Block RThroughput: <=24.0
So we could pick a cost of 32.

For store we have:
https://godbolt.org/z/zj4bb9P75 - for Intel, Block RThroughput: =32.0; for Ryzen, Block RThroughput: <=16.0
So we could pick a cost of 32.
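The selection rule applied to both the load and store numbers (take the worst-case Block RThroughput across the AVX2-only sched models, so that no CPU family is under-costed) can be sketched as follows. This is a hypothetical illustration, not LLVM's X86 cost model code: `pick_cost` is an illustrative name, and the dictionaries only hold the upper bounds quoted in the summary, grouped as Intel (haswell, broadwell, skylake) and Ryzen (zen1-3).

```python
import math

# Block RThroughput upper bounds quoted in the summary (cycles).
# Hypothetical grouping: "intel" = haswell/broadwell/skylake, "ryzen" = zen1-3.
LOAD_RTHROUGHPUT = {"intel": 32.0, "ryzen": 24.0}
STORE_RTHROUGHPUT = {"intel": 32.0, "ryzen": 16.0}


def pick_cost(rthroughput_by_cpu):
    """Hypothetical helper: return a conservative integer cost, i.e. the
    largest reciprocal throughput across CPU families, rounded up."""
    return math.ceil(max(rthroughput_by_cpu.values()))


load_cost = pick_cost(LOAD_RTHROUGHPUT)
store_cost = pick_cost(STORE_RTHROUGHPUT)
print(f"load cost: {load_cost}, store cost: {store_cost}")
```

Taking the maximum rather than an average keeps the cost honest for the slowest model (the Intel ones here), at the price of slightly over-costing the Ryzen models.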

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.


Event Timeline

lebedev.ri created this revision. Oct 4 2021, 8:27 AM
lebedev.ri updated this revision to Diff 376907. Oct 4 2021, 8:30 AM
lebedev.ri edited the summary of this revision.

Rebased, NFC.

RKSimon accepted this revision. Oct 4 2021, 8:57 AM

LGTM

This revision is now accepted and ready to land. Oct 4 2021, 8:57 AM


Thank you for the reviews!