Given a shuffle feeding a reduction, the lane ordering of the shuffle will not alter the result. This is also true if there are a number of operations between the shuffle and the reduction, provided they only operate lane-wise. This patch searches for such cases in VectorCombine, allowing us to compare the cost of the existing shuffle against an in-order identity shuffle and replace the ordering if possible. This only handles a single shuffle at the moment to keep things simple, and is able to ignore splats, where every lane of the result holds the same value.
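As a rough illustration of the property described above, here is a small Python model. It is not LLVM code: `shuffle`, `lanewise_abs` and `reduce_add` are hypothetical helpers made up for this sketch, and sorting the mask is only an assumed stand-in for "an in-order shuffle".

```python
def shuffle(vec, mask):
    """Model a single-source shuffle: output lane i takes vec[mask[i]]."""
    return [vec[i] for i in mask]

def lanewise_abs(vec):
    """A lane-wise operation: each output lane depends only on the same input lane."""
    return [abs(x) for x in vec]

def reduce_add(vec):
    """An integer add reduction over all lanes."""
    total = 0
    for x in vec:
        total += x
    return total

v = [3, -1, 4, -5]

# A permuting mask: sorting it gives the identity mask [0, 1, 2, 3].
mask = [2, 0, 3, 1]
original = reduce_add(lanewise_abs(shuffle(v, mask)))
in_order = reduce_add(lanewise_abs(shuffle(v, sorted(mask))))

# A mask that repeats lanes: sorting it gives [0, 0, 2, 2], which is not
# the identity but still selects the same multiset of lanes.
dup_mask = [2, 0, 2, 0]
dup_original = reduce_add(lanewise_abs(shuffle(v, dup_mask)))
dup_in_order = reduce_add(lanewise_abs(shuffle(v, sorted(dup_mask))))

print(original, in_order)          # 13 13
print(dup_original, dup_in_order)  # 14 14
```

The reduction sees the same multiset of lanes in both cases, so only the cost of the shuffle changes, not the result.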
This is a more powerful version of a combine that already happens in InstCombine, capable of optimizing more cases by looking through more instructions and by being able to cost the shuffle.
Does this also need to check if the binary operation is commutative? It may be a good idea to add a test with a non-commutative reduction and a qualifying shuffle, to make sure this function's behaviour isn't broken in the future.
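To make the concern concrete, here is a minimal Python model of why the ordering would matter for a non-commutative operation. This is only a sketch: `shuffle` and `fold_sub` are hypothetical helpers, and whether the patch can actually encounter such a reduction is an open question for the author.

```python
from functools import reduce
import operator

def shuffle(vec, mask):
    """Model a single-source shuffle: output lane i takes vec[mask[i]]."""
    return [vec[i] for i in mask]

def fold_sub(vec):
    """A left-to-right fold with subtraction, which is not commutative."""
    return reduce(operator.sub, vec)

v = [8, 3, 2, 1]
mask = [3, 1, 2, 0]

shuffled = fold_sub(shuffle(v, mask))          # ((1 - 3) - 2) - 8
in_order = fold_sub(shuffle(v, sorted(mask)))  # ((8 - 3) - 2) - 1

print(shuffled, in_order)  # -12 2
```

Here replacing the mask with its sorted form changes the result, which is the behaviour a negative test would guard against.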