[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 16 bit when have AVX512BW
ClosedPublic

Authored by lebedev.ri on Mon, Nov 15, 10:04 AM.

Details

Summary

We get pretty lucky here. AVX512F does not provide any instructions
to convert between a k vector mask and a vector,
but AVX512BW adds {k} <-> nX{i8,i16} conversions,
and, as it happens, AVX512BW also gives us an i16 shuffle.
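
To make the promotion idea concrete, here is a minimal scalar sketch of what replicating an i1 mask through i16 lanes amounts to. It is not the code from this patch and not how LLVM lowers it; the helper names (`maskToI16`, `replicateI16`, `i16ToMask`, `replicateMask`) are hypothetical, and the instruction names in the comments (VPMOVM2W, VPMOVW2M) are only the AVX512BW conversions this model is loosely based on.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a k-mask -> vector conversion (in the spirit of
// AVX512BW VPMOVM2W): each 1-bit element becomes an all-ones or all-zeros
// 16-bit lane.
std::vector<int16_t> maskToI16(const std::vector<bool> &Mask) {
  std::vector<int16_t> Wide;
  Wide.reserve(Mask.size());
  for (bool Bit : Mask)
    Wide.push_back(Bit ? int16_t(-1) : int16_t(0));
  return Wide;
}

// Replication shuffle on i16 lanes: every source lane is repeated
// Factor times, in order.
std::vector<int16_t> replicateI16(const std::vector<int16_t> &Src,
                                  unsigned Factor) {
  assert(Factor > 0 && "replication factor must be positive");
  std::vector<int16_t> Dst;
  Dst.reserve(Src.size() * Factor);
  for (int16_t Lane : Src)
    Dst.insert(Dst.end(), Factor, Lane);
  return Dst;
}

// Hypothetical model of a vector -> k-mask conversion (in the spirit of
// AVX512BW VPMOVW2M): the mask bit is the sign bit of each 16-bit lane.
std::vector<bool> i16ToMask(const std::vector<int16_t> &Wide) {
  std::vector<bool> Mask;
  Mask.reserve(Wide.size());
  for (int16_t Lane : Wide)
    Mask.push_back(Lane < 0);
  return Mask;
}

// Promote i1 -> i16, replicate, then demote back to i1.
std::vector<bool> replicateMask(const std::vector<bool> &Mask,
                                unsigned Factor) {
  return i16ToMask(replicateI16(maskToI16(Mask), Factor));
}
```

Under this model, `replicateMask({1, 0, 1}, 2)` yields the mask `1 1 0 0 1 1`. The cost being modelled is then roughly the two conversions plus the cost of the i16 replication shuffle, which is why the promotion needs AVX512BW and is not available with AVX512F alone.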

Event Timeline

lebedev.ri created this revision. Mon, Nov 15, 10:04 AM

Note that this is basically the final batch. The only things left are
enabling i1->i8 promotion when +VBMI and i1->i32 when +DQ,
and then changing getInterleavedMemoryOpCostAVX512()
to query the cost of replicating the i1 mask, after which I believe D111460 becomes unblocked.

lebedev.ri retitled this revision from X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 16 bit when have AVX512BW to [X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 16 bit when have AVX512BW. Tue, Nov 16, 12:12 PM
RKSimon accepted this revision. Fri, Nov 19, 2:36 AM

LGTM - BW + AVX512F

This revision is now accepted and ready to land. Fri, Nov 19, 2:36 AM

Thank you for the review!