For <2 x s32>, we can still use G_DUPLANE32, but it needs a <4 x s32> source. To make that work, we just widen the original <2 x s32> source to <4 x s32> with a concat_vectors.
Doing this allows <2 x float> indexed fmul instruction selection patterns to fire, which gives a nice 0.3% code size saving on Bullet with -Os.
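To illustrate why widening the source is safe, here is a minimal Python model of the lane semantics. It is not LLVM code. Vectors are plain lists and `None` marks an undef lane. The function names are hypothetical and only mirror what the opcodes do as described above. Padding the upper half with undef is an assumption of this sketch. Because a lane index into a <2 x s32> source is always 0 or 1, the broadcast only ever reads the original low half, so the padding lanes never reach the result.

```python
from typing import List, Optional

# Hypothetical model: a vector is a list of lane values; None marks an undef lane.
Vector = List[Optional[float]]


def concat_vectors(a: Vector, b: Vector) -> Vector:
    """Model of concat_vectors: lanes of a followed by lanes of b."""
    return a + b


def duplane32(src: Vector, lane: int, num_elts: int) -> Vector:
    """Model of a lane broadcast that, like G_DUPLANE32 here, takes a 4-lane source."""
    if len(src) != 4:
        raise ValueError("this DUPLANE32 model expects a <4 x s32> source")
    if not 0 <= lane < len(src):
        raise IndexError("lane out of range")
    return [src[lane]] * num_elts


def splat_lane_v2s32(src: Vector, lane: int) -> Vector:
    """Broadcast one lane of a <2 x s32> vector by widening it to <4 x s32> first."""
    if len(src) != 2:
        raise ValueError("expected a <2 x s32> source")
    if not 0 <= lane < 2:
        raise IndexError("lane out of range for a 2-lane source")
    # Assumed padding: the upper half is undef and is never selected below.
    wide = concat_vectors(src, [None, None])
    # lane < 2, so the broadcast always reads from the original low half.
    return duplane32(wide, lane, num_elts=2)


if __name__ == "__main__":
    print(splat_lane_v2s32([1.5, 2.5], 1))  # [2.5, 2.5]
```

Running the model, splatting lane 1 of `[1.5, 2.5]` yields `[2.5, 2.5]`, the same result as broadcasting directly from the 2-lane vector.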