This patch attempts to optimize shuffles of 'scalar source' inputs (BUILD_VECTOR and SCALAR_TO_VECTOR nodes) by combining the inputs directly. This folds away many unnecessary shuffle nodes and enables constant folding that was previously being missed.
At the moment the inputs are combined only if they have a single use. I'm interested in extending this so that constant inputs are always combined: that would create more constant data, but would remove more shuffles (which may be introducing their own constant data for masks anyhow). Comments please.
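
For reviewers who want a concrete picture of the fold, here is a minimal standalone sketch. It is not the patch's code and does not use the SelectionDAG API: `Scalar`, `BuildVector`, `NumUses` and `foldShuffleOfBuildVectors` are hypothetical names invented for illustration. It assumes the usual two-input mask convention (indices 0..N-1 select from the first input, N..2N-1 from the second, negative means undef) and assumes the single-use restriction applies to both inputs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one scalar operand of a BUILD_VECTOR.
struct Scalar {
  bool IsUndef;
  int Value;
};

// Hypothetical stand-in for a BUILD_VECTOR node together with its use count.
struct BuildVector {
  std::vector<Scalar> Elts;
  unsigned NumUses;
};

// Tries to fold shuffle(A, B, Mask) into a single list of scalars.
// Returns false and leaves Out untouched if either input has other users
// (assumption: the one-use restriction covers both inputs).
bool foldShuffleOfBuildVectors(const BuildVector &A, const BuildVector &B,
                               const std::vector<int> &Mask,
                               std::vector<Scalar> &Out) {
  assert(A.Elts.size() == B.Elts.size() && "inputs must have the same width");
  if (A.NumUses != 1 || B.NumUses != 1)
    return false;

  const int N = static_cast<int>(A.Elts.size());
  std::vector<Scalar> Result;
  Result.reserve(Mask.size());
  for (int M : Mask) {
    if (M < 0) {
      // A negative mask index produces an undef element.
      Result.push_back(Scalar{true, 0});
      continue;
    }
    assert(M < 2 * N && "mask index out of range");
    // Pick the scalar straight from whichever input the mask selects.
    Result.push_back(M < N ? A.Elts[M] : B.Elts[M - N]);
  }
  Out = Result;
  return true;
}
```

Once the result is a plain list of scalars, no shuffle node remains, and if every selected scalar is a constant the whole vector becomes a constant too. Always combining constant inputs would amount to relaxing the `NumUses` check in that case.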
Also removed an x86 insertps test that was testing the old shuffle lowering system.