Rewrite tensor::ExtractSliceOp(vector::TransferWriteOp) to vector::TransferWriteOp(tensor::ExtractSliceOp) if the full slice is overwritten and inserted into another tensor. After this rewrite, the operations bufferize in-place since all of them work on the same %iter_arg slice.
For example:
mlir
%0 = vector.transfer_write %vec, %init_tensor[%c0, %c0]
: vector<8x16xf32>, tensor<8x16xf32>
%1 = tensor.extract_slice %0[0, 0] [%sz0, %sz1] [1, 1]
: tensor<8x16xf32> to tensor<?x?xf32>
%r = tensor.insert_slice %1 into %iter_arg[%iv0, %iv1] [%sz0, %sz1] [1, 1]
: tensor<?x?xf32> into tensor<27x37xf32>folds to
mlir
%0 = tensor.extract_slice %iter_arg[%iv0, %iv1] [%sz0, %sz1] [1, 1]
: tensor<27x37xf32> to tensor<?x?xf32>
%1 = vector.transfer_write %vec, %0[%c0, %c0]
: vector<8x16xf32>, tensor<?x?xf32>
%r = tensor.insert_slice %1 into %iter_arg[%iv0, %iv1] [%sz0, %sz1] [1, 1]
: tensor<?x?xf32> into tensor<27x37xf32>