Care is taken to order operands from least hoistable to most hoistable and to process subexpressions in the same
order.
This allows exposing more oppportunities for licm, cse and strength reduction.
Such a step should typically be applied while we still have loops in the IR and just before lowering affine ops to arith.
This is because the affine.apply canonicalization currently tries to maximally compose chains of affine.apply operations
and could undo the effects of these decompositions.
Depends on: D145784
Nit: I think to_vector no longer needs the number of stack elements.