Tensor pack operations are optimistically lowered to pad + insert_slice
when the pack operation only pads the input tensor. The existing
lowering can emit insert_slice operations that violate the
rank-reducibility requirements of insert_slice.
This change updates the logic in linalg::lowerPack to first check the
rank-reducibility requirement. When the requirement is not met, the
lowering emits the full sequence of pad + expand + transpose instead.
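
For illustration only, the shape condition behind the check can be sketched
in standalone C++. This is not the code in linalg::lowerPack and it does not
use any MLIR API: the helper name rankReducesTo and the use of plain shape
vectors are assumptions made for this example. The idea is that the padded
shape must be obtainable from the packed shape by dropping unit dimensions
only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of MLIR): returns true when `reduced` can be
// obtained from `original` by dropping only unit (size-1) dimensions while
// keeping the remaining dimensions in their original order.
bool rankReducesTo(const std::vector<int64_t> &original,
                   const std::vector<int64_t> &reduced) {
  std::size_t j = 0; // index of the next dimension of `reduced` to match
  for (int64_t dim : original) {
    if (j < reduced.size() && dim == reduced[j]) {
      ++j; // this dimension is kept
      continue;
    }
    if (dim != 1)
      return false; // only unit dimensions may be dropped
  }
  return j == reduced.size(); // every dimension of `reduced` was matched
}
```

With illustrative shapes, a packed shape of 1x1x8x1 rank-reduces to a padded
shape of 8x1, so pad + insert_slice would be legal. A packed shape of
1x1x16x8 does not rank-reduce to 8x16, which corresponds to the case where
the lowering emits pad + expand + transpose.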
nit: tot -> to