I tried to produce a reduced test case that shows the operand reordering issue, but was only able to reduce it to an 8-wide vectorization. In the added test, vectorization succeeds once the following IR change is made, which only swaps the operands of a commutative add. The question is whether SLP can be made to recognize this and proceed on its own.
Change:
%add = add nuw nsw i32 %conv10, %conv7
To:
%add = add nuw nsw i32 %conv7, %conv10