The basic idea is simple: if we don't have a native shuffle for this element type,
then we must have a native shuffle for a wider element type,
so promote, replicate, demote.
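To show why promote/replicate/demote is a valid substitute, here is a scalar sketch (not the cost-model code itself) that replicates i8 elements directly and via a round trip through i32. The helper names `replicate` and `replicateViaPromotion` are hypothetical. Because zext followed by trunc is lossless, both paths produce the same result.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper modelling a replication shuffle on scalars:
// {a, b} with factor 3 -> {a, a, a, b, b, b}.
template <typename T>
std::vector<T> replicate(const std::vector<T>& v, unsigned factor) {
  std::vector<T> out;
  out.reserve(v.size() * factor);
  for (const T& x : v)
    for (unsigned i = 0; i < factor; ++i)
      out.push_back(x);
  return out;
}

// Hypothetical: promote i8 -> i32 (zext), replicate in the wider type,
// then demote back to i8 (trunc).
std::vector<uint8_t> replicateViaPromotion(const std::vector<uint8_t>& v,
                                           unsigned factor) {
  std::vector<uint32_t> wide(v.begin(), v.end());           // promote (zext)
  std::vector<uint32_t> wideRep = replicate(wide, factor);  // shuffle in wide type
  std::vector<uint8_t> out;
  out.reserve(wideRep.size());
  for (uint32_t x : wideRep)
    out.push_back(static_cast<uint8_t>(x));                 // demote (trunc)
  return out;
}
```

In the cost model, this means the replication cost for the narrow type can be approximated as ext cost + wide shuffle cost + trunc cost.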
I believe asking getCastInstrCost(Instruction::Trunc, ...) is semantically correct.
Case in point: trunc <32 x i32> to <32 x i8>, i.e. 2 * ZMM, will naively result in
2 * XMM, which then have to be packed into 1 * YMM,
and the cost should account for that packing,
not just for the truncations.
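The register arithmetic in the trunc example can be sketched as a naive model: split the source into native registers, truncate each independently, then pack the pieces. Everything here (`TruncBreakdown`, `analyzeTrunc`, `naiveTruncCost`, and the unit costs) is hypothetical and only illustrates why the packing step adds cost; real numbers come from the target's cost tables.

```cpp
#include <cassert>

// Hypothetical breakdown of a vector trunc on a target with RegBits-wide registers.
struct TruncBreakdown {
  unsigned SrcRegs;   // full-width source registers (e.g. ZMM)
  unsigned PieceBits; // width of each per-register truncation result
  unsigned DstBits;   // total width of the destination vector
  unsigned PackOps;   // ops needed to combine the pieces into the result
};

// Assumes NumElts * SrcEltBits is a multiple of RegBits.
TruncBreakdown analyzeTrunc(unsigned NumElts, unsigned SrcEltBits,
                            unsigned DstEltBits, unsigned RegBits) {
  unsigned SrcRegs = (NumElts * SrcEltBits) / RegBits;
  unsigned EltsPerReg = RegBits / SrcEltBits;
  unsigned PieceBits = EltsPerReg * DstEltBits;
  unsigned DstBits = NumElts * DstEltBits;
  // Simplification: combining k pieces takes k - 1 pack/insert ops.
  unsigned PackOps = SrcRegs > 1 ? SrcRegs - 1 : 0;
  return {SrcRegs, PieceBits, DstBits, PackOps};
}

// Hypothetical unit costs: per-register trunc plus the packing on top.
unsigned naiveTruncCost(const TruncBreakdown& B, unsigned TruncCost,
                        unsigned PackCost) {
  return B.SrcRegs * TruncCost + B.PackOps * PackCost;
}
```

For <32 x i32> to <32 x i8> with 512-bit registers this gives 2 source registers, 128-bit (XMM) pieces, a 256-bit (YMM) result and 1 packing op; counting only the truncations would miss that last term.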
I assume 'Eff' is short for 'effective'? But we talk about it in terms of promotion; can you use a consistent term? I don't mind which.