This is an archive of the discontinued LLVM Phabricator instance.

[X86][Costmodel] Load/store i8 Stride=2 VF=32 interleaving costs
ClosedPublic

Authored by lebedev.ri on Sep 29 2021, 6:07 AM.

Details

Summary

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/xz6x7c35P - for Intel CPUs, Block RThroughput: =6.0; for Ryzen CPUs, Block RThroughput: <=2.5
So pick cost of 6.

For store we have:
https://godbolt.org/z/xz6x7c35P - for Intel CPUs, Block RThroughput: =4.0; for Ryzen CPUs, Block RThroughput: <=2.0
So pick cost of 4.
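
The choice of 6 and 4 above is consistent with taking the worst case across the listed sched models. A minimal sketch of that rule follows, assuming the cost is the largest Block RThroughput rounded up to an integer; the rule itself, the helper name `pick_cost`, and the table layout are illustrative assumptions, not code from the patch. The Ryzen figures are the upper bounds quoted above, not exact per-model values.

```python
import math

# Block RThroughput figures quoted in the summary above.
# The Ryzen entries are upper bounds ("<=" in the summary), not exact values.
RTHROUGHPUT = {
    "load": {"intel": 6.0, "ryzen_upper_bound": 2.5},
    "store": {"intel": 4.0, "ryzen_upper_bound": 2.0},
}


def pick_cost(per_cpu_rthroughput):
    """Hypothetical helper: take the largest reciprocal throughput across
    the sched models and round it up to an integer cost."""
    return math.ceil(max(per_cpu_rthroughput.values()))


# Hypothetical table keyed by (operation, stride, element bits, VF).
COSTS = {
    (op, 2, 8, 32): pick_cost(values) for op, values in RTHROUGHPUT.items()
}

for key, cost in sorted(COSTS.items()):
    print(key, cost)
```

Under this assumption the Intel models dominate both entries, so the Ryzen upper bounds do not affect the result.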

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Diff Detail

Event Timeline

lebedev.ri created this revision. Sep 29 2021, 6:07 AM
lebedev.ri retitled this revision from Edit Revision D110708: [X86][Costmodel] Load/store i8 Stride=2 VF=32 interleaving costs to [X86][Costmodel] Load/store i8 Stride=2 VF=32 interleaving costs. Sep 29 2021, 6:09 AM
RKSimon accepted this revision. Sep 29 2021, 9:11 AM

LGTM

This revision is now accepted and ready to land. Sep 29 2021, 9:11 AM

Thank you for the reviews!
Are there issues with VF=16 (D110708)?

After this, I will need to double check, but I believe I will look into stride=2 for i32 and i64.
(I'm not just mass-adding these, at least so far, but only those that are queried in real code and are missing.)

Sorry I missed that one!

Improving i32/i64 + f32/f64 test + costs coverage would be great.

Do you happen to know the relationship between these cost files and the interleaved test files in codegen\x86? They also have poor target coverage.

Define relationship? I just add new consistent test coverage (both codegen and costmodel) following a consistent naming model,
so if we already have that coverage elsewhere, it should be removed in favor of the new coverage.
As for codegen tests, yes, there I only test AVX2 currently. We could extend it to test other ISAs.

This revision was landed with ongoing or failed builds. Sep 29 2021, 11:53 AM
This revision was automatically updated to reflect the committed changes.