This diff enforces that the last operands correspond to the result tensors. For
reductions, the calculation needs to read the result tensors as inputs, so we
make the block/function take additional arguments to carry them.
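As a sketch of what this convention looks like (the op, maps, and types here are illustrative, and the syntax is approximated from the linalg dialect rather than taken from this diff), a sum-reduction on tensors would pass the result tensor as the last operand and receive it as an extra block argument:

```mlir
// Hypothetical example: %init is the result tensor passed as the last
// operand; the region gets a matching extra block argument (%acc) so the
// body can accumulate into the "result" value.
%res = linalg.generic {
    indexing_maps = [affine_map<(i) -> (i)>, affine_map<(i) -> ()>],
    iterator_types = ["reduction"]
  } ins(%input : tensor<?xf32>) outs(%init : tensor<f32>) {
  ^bb0(%in: f32, %acc: f32):
    %sum = addf %in, %acc : f32
    linalg.yield %sum : f32
  } -> tensor<f32>
```

Note that `%acc` reads like an in-place accumulator even though `%init` is an SSA value, which is exactly the semantic tension discussed below.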
I think this is really a stop-gap measure, and it has some semantic issues: tensors are SSA values, so this is effectively implementing in-place update semantics on tensors. I might be mistaken, but this only happens in cases where the iterator_types include "reduction". One way to get around this might be to use the approach in https://bugs.llvm.org/show_bug.cgi?id=44777
There are unsolved issues going from the buffer world to the tensor world when reductions are involved.
Changing the semantics of all the ops, including ones that don't have reductions, is not the right way to go here.
We need a proper way to tie a tensor result to a tensor operand.
We don't have a good mechanism for this at the moment, and it should be discussed with the bigger group.
If the objective is to unblock yourself, you could add an attribute to encode the convention that result p flows into operand k.
You can make this attribute whatever you want in your local work, and ensure that the lowering to buffers performs the right mapping when it sees the attribute.
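For concreteness, such an attribute might look like the following (the attribute name `result_tied_to_operand` and its encoding are placeholders I am inventing for illustration, not an existing convention):

```mlir
// Hypothetical: entry i says result #i flows into operand #(entry value),
// so the tensor-to-buffer lowering can reuse that operand's buffer for
// the result instead of allocating a fresh one.
%res = linalg.generic {
    result_tied_to_operand = [1 : i64],
    indexing_maps = [affine_map<(i) -> (i)>, affine_map<(i) -> ()>],
    iterator_types = ["reduction"]
  } ins(%input : tensor<?xf32>) outs(%init : tensor<f32>) {
  ^bb0(%in: f32, %acc: f32):
    %sum = addf %in, %acc : f32
    linalg.yield %sum : f32
  } -> tensor<f32>
```

Because the attribute is purely advisory to the lowering, the op's tensor-level semantics stay untouched, which is the point of the suggestion.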
None of this should change the semantics of the ops: that has farther-reaching consequences that we do not yet understand.
It is possible that an extra region that operates at the level of tensors is needed to encode this cleanly, along the lines of what Mahesh proposes in: https://bugs.llvm.org/show_bug.cgi?id=44777.