This revision proposes a different implementation of the SplitReduction transformation that does
not rely on tensor::ExpandShapeOp.
Previously, a dimension [k] would be split into [k][kk] via an ExpandShapeOp.
Instead, this revision proposes to rewrite [k] into [factor * k + kk].
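To make the index rewrite concrete, here is a small Python sketch. It is not MLIR code, and `split_index` and `split_reduce` are hypothetical helpers. It assumes the reduction size K is divisible by `factor`, so that k ranges over [0, K / factor) and kk ranges over [0, factor). It checks that [factor * k + kk] visits every original index exactly once, and that reducing over kk and then over k gives the same result as the original reduction.

```python
def split_index(factor: int, k: int, kk: int) -> int:
    """Original reduction index after the rewrite [k] -> [factor * k + kk]."""
    return factor * k + kk


def split_reduce(values: list, factor: int):
    """Sum `values` in two stages using the [factor * k + kk] rewrite.

    Assumes len(values) is divisible by `factor` (hypothetical restriction
    for this sketch): k ranges over [0, len(values) // factor) and kk over
    [0, factor).
    """
    size = len(values)
    if size % factor != 0:
        raise ValueError("reduction size must be divisible by factor")
    outer = size // factor
    # Stage 1: one partial result per k, each reducing over kk.
    partials = [
        sum(values[split_index(factor, k, kk)] for kk in range(factor))
        for k in range(outer)
    ]
    # Stage 2: combine the partial results by reducing over k.
    return sum(partials)


values = list(range(12))
covered = sorted(split_index(4, k, kk) for k in range(3) for kk in range(4))
print(covered == list(range(12)))                  # each index hit exactly once
print(split_reduce(values, 4) == sum(values))      # same result as one-stage sum
```

Integers are used here on purpose: with floating-point data the two-stage order changes rounding, so results may differ in the last bits.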
There are different tradeoffs involved, but the proposed implementation is more general because
the affine rewrite is well-defined. In particular, it works naturally with dynamic (`?`) parallel
dimensions and non-trivial indexing maps.
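One way to see why non-trivial indexing maps are not a problem: the rewrite is a plain substitution of the reduction dimension inside each affine expression, so it applies even when an operand is indexed by something like d0 + d2 (where d2 is the reduction dimension), a case that does not correspond to simply splitting one tensor dimension. The sketch below models a linear affine expression as a dict of coefficients; `AffineExpr`, `substitute` and `evaluate` are hypothetical names for illustration and are not MLIR's AffineExpr API.

```python
# Hypothetical model: a linear affine expression as {dim name: coefficient},
# e.g. d0 + d2 is {"d0": 1, "d2": 1}. Not MLIR's AffineExpr class.
AffineExpr = dict


def substitute(expr: AffineExpr, dim: str, replacement: AffineExpr) -> AffineExpr:
    """Replace `dim` in `expr` by `replacement` (assumed to use fresh dim names)."""
    result: AffineExpr = {}
    for name, coeff in expr.items():
        if name == dim:
            # Distribute the coefficient of `dim` over the replacement terms.
            for rname, rcoeff in replacement.items():
                result[rname] = result.get(rname, 0) + coeff * rcoeff
        else:
            result[name] = result.get(name, 0) + coeff
    # Drop terms whose coefficients cancelled out.
    return {name: c for name, c in result.items() if c != 0}


def evaluate(expr: AffineExpr, point: dict) -> int:
    """Evaluate a linear affine expression at a point {dim name: value}."""
    return sum(c * point[name] for name, c in expr.items())


factor = 4
split = {"k": factor, "kk": 1}          # d2 -> factor * k + kk
conv_like = {"d0": 1, "d2": 1}          # operand indexed by d0 + d2
print(substitute(conv_like, "d2", split))
```

With `factor = 4`, `d0 + d2` becomes `d0 + 4 * k + kk`, and any coefficient on the reduction dimension (for example `2 * d2`) simply scales both new terms.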
A further rewrite of [factor * k + kk] + ExpandShapeOp is possible as a follow-up.
%arg5 is not used in the computation. Is this intentional?