This is a tiny patch with a big pile of test changes.
My motivating case looks like this:
- vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
- vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
- vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
+ vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
And this happens several times in the diffs. I think the instruction count and size reduction overcomes any potential domain-crossing penalty due to using an FP op in a sequence of int ops, but let me know if you see problem cases.
I think these are all improvements except one test in vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate zero elements and one test in combine-sra.ll where I'm not sure what is happening yet.