This revision adds a minimal way to represent reductions at the level of linalg on tensors.
When a structured op has a reduction and returns tensor(s), the conventions are:
- it can only return a single tensor
- it cannot have any output buffer operand
- as a consequence of points 1 and 2, it must have exactly one output
- its last input argument must be a tensor of the same shape and with the same indexing map as its output.
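As a rough illustration of how these four conventions could be checked, here is a minimal Python sketch. It assumes a hypothetical, simplified op model (`Operand`, `StructuredOp`, `verify_tensor_reduction` are illustrative names, not MLIR's C++ or Python APIs) and compares indexing maps as plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical, simplified model of a structured op (not an MLIR API).
@dataclass
class Operand:
    shape: Tuple[int, ...]
    indexing_map: str  # affine map as a string, e.g. "(i, j, k) -> (i, j)"
    is_tensor: bool = True


@dataclass
class StructuredOp:
    inputs: List[Operand]
    results: List[Operand] = field(default_factory=list)
    output_buffers: List[Operand] = field(default_factory=list)
    has_reduction: bool = False


def verify_tensor_reduction(op: StructuredOp) -> List[str]:
    """Return the conventions the op violates (an empty list means valid)."""
    # The conventions only apply to ops that reduce and return tensors.
    if not op.has_reduction or not op.results:
        return []
    errors = []
    # Point 1: a single result tensor.
    if len(op.results) != 1:
        errors.append("must return a single tensor")
    # Point 2: no output buffer operands.
    if op.output_buffers:
        errors.append("must not have output buffer operands")
    # Point 3 follows from 1 and 2: exactly one output overall.
    if len(op.results) + len(op.output_buffers) != 1:
        errors.append("must have exactly one output")
    # Point 4: the last input is tied to the result.
    result = op.results[0]
    last = op.inputs[-1] if op.inputs else None
    if last is None or not last.is_tensor:
        errors.append("last input must be a tensor")
    else:
        if last.shape != result.shape:
            errors.append("last input must have the same shape as the result")
        if last.indexing_map != result.indexing_map:
            errors.append("last input must have the same indexing map as the result")
    return errors
```

For a matmul-like op, the inputs would be A with map `(i, j, k) -> (i, k)`, B with `(i, j, k) -> (k, j)` and the accumulator C with `(i, j, k) -> (i, j)`, and the single result would use the same map and shape as C. String comparison of maps is a simplification; a real verifier would compare affine maps structurally.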
Points 1-3 keep the complexity of the representation in check by allowing only one result tensor when reductions are present.
Point 4 is needed because SSA values cannot represent in-place updates.
Instead, linalg adopts a convention similar to the one used by e.g. vector.outerproduct: the value being reduced into is passed as an explicit argument, and a new result of the same shape is produced.
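A plain-Python analogy of this value-semantics convention (not MLIR code) is sketched below for a matmul-style reduction. `matmul_tensor` is a hypothetical name; `c` plays the role of the last input that is reduced into, and a fresh result is returned instead of updating `c` in place.

```python
from typing import List

Matrix = List[List[float]]


def matmul_tensor(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Value-semantics reduction: return c + a @ b as a new matrix.

    c is the value reduced into; it is passed explicitly and never mutated,
    mirroring SSA values, which cannot be updated in place.
    """
    m, k, n = len(a), len(b), len(b[0])
    result = [row[:] for row in c]  # start from a copy of the init value
    for i in range(m):
        for j in range(n):
            for p in range(k):  # p is the reduction dimension
                result[i][j] += a[i][p] * b[p][j]
    return result
```

For example, with `a = [[1, 2], [3, 4]]`, `b = [[5, 6], [7, 8]]` and `c = [[1, 1], [1, 1]]`, the call returns `[[20, 23], [44, 51]]` while `c` keeps its original value.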
Buffer allocation is expected to fold this last input and the result into a single output buffer, which is why linalg requires the same indexing map: the last input operand is "tied" to the result.
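The folding can be pictured with the following sketch, assuming a matmul whose tied input `c` and result both use the map `(i, j, k) -> (i, j)`. Because the maps are equal, every iteration reads and writes the same element `(i, j)`, so one buffer can hold the init value on entry and the result on exit. `matmul_buffer`, `bufferize_matmul` and the `c_still_live` flag are hypothetical; this is not MLIR's buffer allocation logic.

```python
from typing import List

Matrix = List[List[float]]


def matmul_buffer(a: Matrix, b: Matrix, out: Matrix) -> None:
    """Buffer form: 'out' holds the init value on entry and is accumulated in place."""
    m, k, n = len(a), len(b), len(b[0])
    for i in range(m):
        for j in range(n):
            for p in range(k):
                # Same (i, j) element is read and written: this is what the
                # equal indexing maps of the tied input and result guarantee.
                out[i][j] += a[i][p] * b[p][j]


def bufferize_matmul(a: Matrix, b: Matrix, c: Matrix, c_still_live: bool) -> Matrix:
    """Hypothetical lowering of the tensor form onto a single output buffer.

    If c's value is not used afterwards, its storage is reused as the output
    buffer; otherwise it is copied first (an assumption for illustration).
    """
    out = [row[:] for row in c] if c_still_live else c
    matmul_buffer(a, b, out)
    return out
```

When `c_still_live` is false, the returned buffer is `c` itself, which is exactly the "tied" input/result pair collapsing into one output buffer.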
An alternative, more complex representation would allow multiple results and arbitrary tied input/result pairs, and would relax the indexing-map equality conditions on those pairs. This is deemed unnecessarily complex for now and is left for a future discussion.