This is an archive of the discontinued LLVM Phabricator instance.

[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 8 bit when have AVX512BW+AVX512VBMI
ClosedPublic

Authored by lebedev.ri on Nov 17 2021, 1:27 AM.

Details

Summary

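If, in addition to AVX512BW (which provides {k}<->{i8,i16} casts and i16 shuffles),
we also have AVX512VBMI, which provides i8 shuffles, we are in an optimal situation:
the 1-bit mask elements can be promoted to i8, replicated with i8 shuffles, and converted back to a mask.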
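To illustrate the idea in the summary, here is a rough scalar C++ model of the instruction sequence this promotion makes available. It assumes a single 512-bit register and the sequence VPMOVM2B (AVX512BW, widens each mask bit to a 0x00/0xFF byte), then VPERMB (AVX512VBMI, performs the replication as one byte shuffle), then VPMOVB2M (AVX512BW, narrows back to a mask). All helper names are hypothetical. This is not the code in X86TargetTransformInfo.cpp, and it says nothing about the costs that code assigns.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the i1 replication shuffle lowering that
// AVX512BW + AVX512VBMI allow, for a single 512-bit ZMM register.
// Names are illustrative only; this is not LLVM code.
constexpr unsigned kLanes = 64; // 512 bits / 8-bit elements

using Bytes = std::array<int8_t, kLanes>;
using Indices = std::array<uint8_t, kLanes>;

// Models VPMOVM2B (AVX512BW): each mask bit becomes a 0x00 or 0xFF byte.
Bytes maskToBytes(uint64_t K) {
  Bytes V{};
  for (unsigned I = 0; I < kLanes; ++I)
    V[I] = ((K >> I) & 1) ? int8_t(-1) : int8_t(0);
  return V;
}

// Models VPERMB (AVX512VBMI): Out[I] = Src[Idx[I] mod 64].
Bytes permuteBytes(const Bytes &Src, const Indices &Idx) {
  Bytes Out{};
  for (unsigned I = 0; I < kLanes; ++I)
    Out[I] = Src[Idx[I] % kLanes];
  return Out;
}

// Models VPMOVB2M (AVX512BW): mask bit I is the sign bit of byte I.
uint64_t bytesToMask(const Bytes &V) {
  uint64_t K = 0;
  for (unsigned I = 0; I < kLanes; ++I)
    if (V[I] < 0)
      K |= uint64_t(1) << I;
  return K;
}

// Replication indices: source element I is repeated RF times,
// e.g. VF = 4, RF = 2 gives <0,0,1,1,2,2,3,3>. Unused lanes stay 0.
Indices replicationIndices(unsigned VF, unsigned RF) {
  assert(VF * RF <= kLanes && "must fit in one ZMM register");
  Indices Idx{};
  for (unsigned I = 0; I < VF * RF; ++I)
    Idx[I] = uint8_t(I / RF);
  return Idx;
}

// Promote to i8, shuffle once, demote; keep only the VF * RF result bits.
uint64_t replicateMask(uint64_t K, unsigned VF, unsigned RF) {
  Bytes Shuffled = permuteBytes(maskToBytes(K), replicationIndices(VF, RF));
  uint64_t R = bytesToMask(Shuffled);
  unsigned N = VF * RF;
  return N == kLanes ? R : R & ((uint64_t(1) << N) - 1);
}

// Reference: replicate the bits directly, for comparison.
uint64_t replicateMaskRef(uint64_t K, unsigned VF, unsigned RF) {
  uint64_t R = 0;
  for (unsigned I = 0; I < VF * RF; ++I)
    if ((K >> (I / RF)) & 1)
      R |= uint64_t(1) << I;
  return R;
}
```

For example, `replicateMask(0b1010, 4, 2)` yields `0xcc`: each of the four mask bits is repeated twice. Without VBMI, the shuffle step would have to operate on wider elements (for example i16 through AVX512BW's word shuffles). That is why this sketch treats byte-sized promotion as the BW+VBMI case only.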

Event Timeline

lebedev.ri created this revision. Nov 17 2021, 1:27 AM
RKSimon added inline comments. Nov 19 2021, 1:59 AM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp:3677

AVX512F will use (pretty awful) vXi32 shuffles: https://simd.godbolt.org/z/YYzjaf7Wh

lebedev.ri added inline comments. Nov 19 2021, 2:01 AM
llvm/lib/Target/X86/X86TargetTransformInfo.cpp:3677

Yes, those are pretty awful. I'm not sure there's much hope for plain AVX512F;
I'd say we need AVX512BW or AVX512DQ for this.
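The feature reasoning in this exchange can be condensed into a hypothetical decision sketch. With BW and VBMI, the i1 elements go to i8. With BW alone, they go to i16. With DQ, which adds k<->i32/i64 casts, they go to i32. Plain AVX512F falls back to the scalarization bailout mentioned in the review. The struct, the function names, and the DQ branch are illustrative assumptions, not the logic in X86TargetTransformInfo.cpp.

```cpp
#include <cassert>

// Hypothetical feature flags; not LLVM's X86Subtarget interface.
struct Avx512Features {
  bool F = false;
  bool BW = false;   // k <-> i8/i16 casts, i16 shuffles
  bool DQ = false;   // k <-> i32/i64 casts
  bool VBMI = false; // i8 shuffles
};

// Returns the element width (in bits) that an i1 replication shuffle could
// be promoted to, or 0 to signal falling back to the scalarization cost.
unsigned promotedElementBits(const Avx512Features &Feat) {
  if (Feat.BW && Feat.VBMI)
    return 8;  // widen to bytes, byte shuffle, narrow back
  if (Feat.BW)
    return 16; // widen to words, word shuffle, narrow back
  if (Feat.DQ)
    return 32; // widen to dwords, dword shuffle, narrow back (assumed)
  return 0;    // plain AVX512F: scalarization bailout
}

// Number of 512-bit registers needed to hold VF * RF promoted elements.
unsigned zmmRegsNeeded(unsigned VF, unsigned RF, unsigned ElemBits) {
  assert(ElemBits != 0 && 512 % ElemBits == 0);
  unsigned LanesPerReg = 512 / ElemBits;
  return (VF * RF + LanesPerReg - 1) / LanesPerReg;
}
```

`zmmRegsNeeded` shows why narrower elements help. Replicating a 16-element mask by 4 (64 result elements) fits in one ZMM register as i8, but needs two registers as i16 and four as i32, so wider promotions imply more shuffle work.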

RKSimon accepted this revision. Nov 19 2021, 2:35 AM

LGTM (VBMI)

llvm/lib/Target/X86/X86TargetTransformInfo.cpp:3677

It's probably worth adding them instead of the scalarization bailout, though.

This revision is now accepted and ready to land. Nov 19 2021, 2:35 AM

LGTM (VBMI)

Thank you for the review!

llvm/lib/Target/X86/X86TargetTransformInfo.cpp:3677

I mean, yes, it is just not obvious to me how to do that without hardcoding them.