Here we get pretty lucky. AVX512F does not provide any instructions
to convert between a k vector mask and a vector,
but AVX512BW adds {k}<->nX{i8,i16} conversions,
and, as it happens, with AVX512BW we also have an i16 shuffle.
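
For illustration only, here is a rough scalar model of the sequence this relies on: widen the k mask to i16 lanes, shuffle the lanes, and narrow back to a mask. This is a sketch, not code from the patch. The function names (`maskToLanes`, `shuffleLanes`, `lanesToMask`, `replicateMask`) are hypothetical, and the comments naming `vpmovm2w`, `vpermw` and `vpmovw2m` are my assumption about which AVX512BW instructions are meant. The model also ignores register width limits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models a mask-to-vector conversion (assumed: vpmovm2w).
// Each mask bit becomes an all-ones or all-zeros i16 lane.
std::vector<int16_t> maskToLanes(const std::vector<bool>& mask) {
    std::vector<int16_t> lanes;
    lanes.reserve(mask.size());
    for (bool bit : mask)
        lanes.push_back(bit ? int16_t(-1) : int16_t(0));
    return lanes;
}

// Models an i16 lane shuffle (assumed: vpermw): out[i] = in[idx[i]].
std::vector<int16_t> shuffleLanes(const std::vector<int16_t>& in,
                                  const std::vector<std::size_t>& idx) {
    std::vector<int16_t> out;
    out.reserve(idx.size());
    for (std::size_t i : idx) {
        assert(i < in.size() && "shuffle index out of range");
        out.push_back(in[i]);
    }
    return out;
}

// Models a vector-to-mask conversion (assumed: vpmovw2m).
// The mask bit is the sign bit of each i16 lane.
std::vector<bool> lanesToMask(const std::vector<int16_t>& lanes) {
    std::vector<bool> mask;
    mask.reserve(lanes.size());
    for (int16_t lane : lanes)
        mask.push_back(lane < 0);
    return mask;
}

// Replicates every mask bit `factor` times, e.g. with factor 3:
// [m0, m1] -> [m0, m0, m0, m1, m1, m1].
std::vector<bool> replicateMask(const std::vector<bool>& mask,
                                std::size_t factor) {
    std::vector<std::size_t> idx;
    idx.reserve(mask.size() * factor);
    for (std::size_t i = 0; i < mask.size() * factor; ++i)
        idx.push_back(i / factor);
    return lanesToMask(shuffleLanes(maskToLanes(mask), idx));
}
```

For example, `replicateMask({true, false, true}, 2)` goes through the three steps and yields a six-element mask in which each input bit appears twice in a row.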
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Note that this is basically the final batch. The only things left are
enabling i1->i8 promotion when +VBMI and i1->i32 promotion when +DQ,
and then changing getInterleavedMemoryOpCostAVX512()
to query the cost of replicating the i1 mask, after which I believe D111460 becomes unblocked.