mtvsrdd, introduced in ISA 3.0, moves two GPRs into a vector register in a single instruction. We can use it to reduce the number of instructions needed to build a vector from its elements. Take v8i16 as an example (each symbol is one 16-bit lane; u is undefined, letters are the elements):

    u u u a                  <-- original elements
    u u u b
    ...
    u u u a u u u b          <-- mtvsrdd
    u u u c u u u d
    ...
    u a u b u c u d          <-- vpkudum
    u e u f u g u h
    a b c d e f g h          <-- vpkuwum

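The lane movement in the diagram can be checked with a small model. This is a sketch, not the backend code. Vectors are modelled as 128-bit Python integers in big-endian element order. The helper functions `mtvsrdd`, `vpkudum`, `vpkuwum` and `pack_modulo` are hypothetical stand-ins that mimic the instructions' "move doublewords" and "pack, keep low half" (modulo) behaviour. The junk placed above each element in its GPR represents the undefined `u` lanes.

```python
# Lane-level model of the mtvsrdd + vpkudum + vpkuwum sequence for v8i16.
# Vectors are 128-bit Python ints in big-endian element order
# (element 0 occupies the most significant bits).

M64 = (1 << 64) - 1


def mtvsrdd(ra: int, rb: int) -> int:
    """Model of mtvsrdd: GPR ra fills doubleword 0, GPR rb fills doubleword 1."""
    return ((ra & M64) << 64) | (rb & M64)


def pack_modulo(va: int, vb: int, src_bits: int) -> int:
    """Model of the modulo pack instructions: treat va||vb as 256 bits of
    src_bits-wide elements and keep only the low half of each one."""
    concat = (va << 128) | vb
    count = 256 // src_bits
    dst_bits = src_bits // 2
    result = 0
    for i in range(count):
        elem = (concat >> (src_bits * (count - 1 - i))) & ((1 << src_bits) - 1)
        result = (result << dst_bits) | (elem & ((1 << dst_bits) - 1))
    return result


def vpkudum(va: int, vb: int) -> int:
    return pack_modulo(va, vb, 64)  # doublewords -> words


def vpkuwum(va: int, vb: int) -> int:
    return pack_modulo(va, vb, 32)  # words -> halfwords


def halfwords(v: int) -> list[int]:
    """Split a 128-bit vector into its eight 16-bit lanes, element 0 first."""
    return [(v >> (16 * (7 - i))) & 0xFFFF for i in range(8)]


def build_v8i16(elems: list[int]) -> int:
    assert len(elems) == 8 and all(0 <= e <= 0xFFFF for e in elems)
    # Each GPR holds one element in its low 16 bits; the 48 bits above it are
    # junk, standing in for the undefined "u u u" lanes in the diagram.
    gprs = [((0xDEADBEEFCAFE + i) << 16) | e for i, e in enumerate(elems)]
    # 4 x mtvsrdd: u u u a u u u b, u u u c u u u d, ...
    pairs = [mtvsrdd(gprs[i], gprs[i + 1]) for i in range(0, 8, 2)]
    # 2 x vpkudum: u a u b u c u d, u e u f u g u h
    quads = [vpkudum(pairs[0], pairs[1]), vpkudum(pairs[2], pairs[3])]
    # 1 x vpkuwum: a b c d e f g h
    return vpkuwum(quads[0], quads[1])


if __name__ == "__main__":
    elems = [0x1111 * (k + 1) for k in range(8)]
    assert halfwords(build_v8i16(elems)) == elems
```

Because every pack step keeps only the low half of each lane, whatever sits in the undefined lanes never reaches the final vector. That is why the GPRs do not need to be zero- or sign-extended first.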
In theory, this applies to all vector types from v2i64 to v16i8. However, for v4i32, rldimi+vpkudum gives better codegen.
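To see how the sequence scales across those types, here is a rough instruction count under a hypothetical cost model. The model assumes the elements already sit in GPRs, counts one instruction per operation, and ignores latency and throughput. It also assumes that v16i8 uses a third pack level, vpkuhum (halfword to byte, modulo); the text above does not name this instruction.

```python
# Hypothetical cost model: instruction counts for the mtvsrdd + pack tree,
# assuming every element is already in its own GPR.

# Pack instruction used at each level: 64->32, 32->16, 16->8 bit lanes.
PACK_OPS = ["vpkudum", "vpkuwum", "vpkuhum"]


def build_sequence(num_elems: int) -> dict[str, int]:
    """Instruction counts for building a vector of num_elems elements
    (2 = v2i64, 4 = v4i32, 8 = v8i16, 16 = v16i8)."""
    if num_elems not in (2, 4, 8, 16):
        raise ValueError("expected 2, 4, 8 or 16 elements")
    seq = {"mtvsrdd": num_elems // 2}  # one mtvsrdd per pair of GPRs
    vectors = num_elems // 2
    for op in PACK_OPS:
        if vectors == 1:
            break
        vectors //= 2  # each pack merges two vectors into one
        seq[op] = vectors
    return seq


if __name__ == "__main__":
    for name, n in [("v2i64", 2), ("v4i32", 4), ("v8i16", 8), ("v16i8", 16)]:
        seq = build_sequence(n)
        print(name, seq, "total:", sum(seq.values()))
```

Each level halves the number of vectors, so the total is always num_elems - 1 instructions. A pure count like this says nothing about latency or which execution units the instructions use. It therefore cannot, on its own, decide between this sequence and the rldimi-based one for v4i32.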