Code that deals with small vectors but cares about performance (i.e., is hardware-aware) often passes the vectors around in their small form (on some targets as scalars, thanks to type coercion), but immediately shuffles them into a native-width vector, so that the real operations happen on a type we can't reasonably do a horrible job at. Promoting small vector types leads to some really nasty code, and people actively try to avoid that now ;)
Anyway, this means a pretty common pattern is this:
(vector_shuffle (v8i8 concat_vectors (v2i8 bitcast (i16)), undef..), M)
which we'll usually end up scalarizing. Instead, we can turn it into:
(vector_shuffle (v8i8 bitcast (v4i16 scalar_to_vector (i16))), M)
which lets us deal with native-width types all the time (IMHO this should be the foremost canonicalization goal whenever dealing with vectors, but I digress).
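
To illustrate why the two forms are interchangeable, here is a small standalone C++ model (not LLVM code; the names concatForm, scalarToVectorForm, shuffle and formsAgree are made up for this sketch). It assumes bitcasts behave like a memcpy of the in-memory representation, models undef lanes as arbitrary filler bytes, and uses a single-operand shuffle. Under those assumptions, both forms put the bytes of the i16 in lanes 0 and 1, so any mask M that only reads those lanes gives the same result:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using V8i8 = std::array<std::uint8_t, 8>;

// (v8i8 concat_vectors (v2i8 bitcast (i16)), undef..)
// Undef lanes are modeled with a caller-chosen filler byte.
V8i8 concatForm(std::uint16_t x, std::uint8_t undefFill) {
    std::array<std::uint8_t, 2> lo{};
    std::memcpy(lo.data(), &x, lo.size());  // v2i8 bitcast (i16)
    V8i8 out;
    out.fill(undefFill);                    // the undef operands
    out[0] = lo[0];
    out[1] = lo[1];
    return out;
}

// (v8i8 bitcast (v4i16 scalar_to_vector (i16)))
// Only lane 0 of the v4i16 is defined; the rest gets a filler value.
V8i8 scalarToVectorForm(std::uint16_t x, std::uint16_t undefFill) {
    std::array<std::uint16_t, 4> wide;
    wide.fill(undefFill);
    wide[0] = x;                            // scalar_to_vector
    V8i8 out;
    std::memcpy(out.data(), wide.data(), out.size());  // v8i8 bitcast
    return out;
}

// Single-operand shuffle: mask[i] selects a source lane, -1 means undef
// (modeled as 0 here).
V8i8 shuffle(const V8i8& v, const std::array<int, 8>& mask) {
    V8i8 out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = mask[i] < 0 ? std::uint8_t{0} : v[static_cast<std::size_t>(mask[i])];
    return out;
}

// True if both forms give the same shuffle result, using deliberately
// different fillers for the undef lanes.
bool formsAgree(std::uint16_t x, const std::array<int, 8>& mask) {
    return shuffle(concatForm(x, 0xAA), mask) ==
           shuffle(scalarToVectorForm(x, 0x5555), mask);
}
```

The fillers differ on purpose: the two forms only disagree in lanes that were undef to begin with, which is exactly what the model is meant to show.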