As a result of recent work on the reassociation pass, I noticed that the ordering of operands affects the behavior of the SLP vectorizer. The ordering can change how the expression tree is built and, in turn, the cost of the tree. In the provided test case, derived from spec2k/mesa, the ordering of the load instructions (in the expression tree) is reversed, so the loads are gathered rather than vectorized.
This patch attempts to resolve this issue by canonicalizing the operands of commutative instructions based on source order. The result is an expression tree that more closely mirrors the source order of the instructions. Currently, the canonicalization is applied to both the instructions themselves and the expression tree. However, only the latter is necessary to address this issue; I have no objection to leaving the instruction operands as is.
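To make the idea concrete, here is a minimal sketch of source-order operand canonicalization. It is not the code in this patch and does not use the LLVM API: `Node`, its `order` field (standing in for an instruction's position in its basic block), and `canonicalizeOperands` are hypothetical, and the sketch assumes every operand has a known position (real IR also has constants and arguments, which would need their own rule).

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified IR node for illustration only (NOT llvm::Instruction).
// `order` stands in for the node's position in its block: smaller = earlier.
struct Node {
  int order;
  bool isCommutative;      // e.g. add or mul
  std::vector<Node *> ops; // operands; only binary nodes are reordered here
};

// True if `a` is defined earlier in source order than `b`.
static bool comesBefore(const Node *a, const Node *b) {
  return a->order < b->order;
}

// For a commutative binary node, swap the operands so the one defined earlier
// in source order comes first. Returns true if the operands were swapped.
bool canonicalizeOperands(Node &n) {
  if (!n.isCommutative || n.ops.size() != 2)
    return false;
  if (comesBefore(n.ops[1], n.ops[0])) {
    std::swap(n.ops[0], n.ops[1]);
    return true;
  }
  return false;
}
```

With two loads `L0` (position 0) and `L1` (position 1), an `add` written as `L1 + L0` is rewritten to `L0 + L1`, while a non-commutative `sub` is left untouched. Because the rule depends only on source positions, every lane of a candidate bundle gets the same ordering, which is what lets the tree builder see the loads in their original order.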
I'm sure a more robust solution exists, but I decided to begin with the simplest one because I think it gets a fairly good bang for the buck, is safe, and is maintainable.
Correctness runs look good. Performance runs (AArch64/A53) also look good; no regressions and a few minor improvements.
Please take a look!