Continuing on from D8883 and D8884: now that (if?) we have decided on the (vector_shuffle (bitcast (scalar_to_vector))) form, we can go further and try to shuffle in the wider element type that was the input of the scalar_to_vector, if the shuffle mask permits it.
In practice this lets us recognize special patterns (see the DUP testcase) without having to teach everything to deal with "this might be an i32 DUP shuffle, but it is expressed in terms of i8 <0,1,2,3> sequences".
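
To illustrate what "if the shuffle mask permits it" means, here is a minimal standalone sketch of the mask check. It is not the code from this patch and does not use LLVM's APIs: the name `widenShuffleMask`, the use of `std::vector<int>` for the mask, and the convention that a negative index means undef are all assumptions made for the example. A group of `scale` narrow indices can become one wide index only if the group is an aligned, consecutive run (or is entirely undef).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper, for illustration only (not an LLVM function).
// Tries to rewrite a shuffle mask over narrow elements as a mask over
// elements that are `scale` times wider. Negative indices mean "undef".
std::optional<std::vector<int>>
widenShuffleMask(const std::vector<int> &mask, int scale) {
  assert(scale > 0 && "scale must be positive");
  const std::size_t step = static_cast<std::size_t>(scale);
  if (mask.size() % step != 0)
    return std::nullopt;

  std::vector<int> wide;
  for (std::size_t i = 0; i < mask.size(); i += step) {
    const int first = mask[i];
    if (first < 0) {
      // Conservative choice: only accept a group that is undef throughout.
      for (std::size_t j = 1; j < step; ++j)
        if (mask[i + j] >= 0)
          return std::nullopt;
      wide.push_back(-1);
      continue;
    }
    // The group must start on a wide-element boundary...
    if (first % scale != 0)
      return std::nullopt;
    // ...and cover that wide element with consecutive narrow indices.
    for (std::size_t j = 1; j < step; ++j)
      if (mask[i + j] != first + static_cast<int>(j))
        return std::nullopt;
    wide.push_back(first / scale);
  }
  return wide;
}
```

With this sketch, an i8 mask made of repeated <0,1,2,3> groups and a scale of 4 widens to an all-zero i32 mask, which is the DUP-like shape; a mask whose groups are misaligned (for example one starting at index 1) is rejected and would stay in the narrow element type.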