Rewrite tensor::ExtractSliceOp(vector::TransferWriteOp) to vector::TransferWriteOp(tensor::ExtractSliceOp) if the transfer write overwrites the full slice and the extracted slice is then inserted into another tensor. After this rewrite, all of the operations work on the same slice of %iter_arg, so they bufferize in-place.
For example:
```mlir
%0 = vector.transfer_write %vec, %init_tensor[%c0, %c0]
     : vector<8x16xf32>, tensor<8x16xf32>
%1 = tensor.extract_slice %0[0, 0] [%sz0, %sz1] [1, 1]
     : tensor<8x16xf32> to tensor<?x?xf32>
%r = tensor.insert_slice %1 into %iter_arg[%iv0, %iv1] [%sz0, %sz1] [1, 1]
     : tensor<?x?xf32> into tensor<27x37xf32>
```
folds to
```mlir
%0 = tensor.extract_slice %iter_arg[%iv0, %iv1] [%sz0, %sz1] [1, 1]
     : tensor<27x37xf32> to tensor<?x?xf32>
%1 = vector.transfer_write %vec, %0[%c0, %c0]
     : vector<8x16xf32>, tensor<?x?xf32>
%r = tensor.insert_slice %1 into %iter_arg[%iv0, %iv1] [%sz0, %sz1] [1, 1]
     : tensor<?x?xf32> into tensor<27x37xf32>
```
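
The precondition "the full slice is overwritten and inserted into another tensor" can be illustrated with a small standalone sketch. It does not use the MLIR API: `TransferWrite`, `ExtractSlice`, `InsertSlice`, and `canSwap` are hypothetical stand-ins invented for this illustration, and the checks shown are one plausible reading of the condition. The actual pattern may test more or different properties.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified models of the three ops (not MLIR classes).
struct TransferWrite {
  std::vector<int64_t> vectorShape; // shape of the written vector, e.g. {8, 16}
  std::vector<int64_t> destShape;   // static shape of the destination tensor
  std::vector<int64_t> indices;     // constant write indices
};
struct ExtractSlice {
  std::vector<int64_t> offsets;     // constant offsets
  std::vector<std::string> sizes;   // SSA names or literals, e.g. "%sz0"
  std::vector<int64_t> strides;     // constant strides
};
struct InsertSlice {
  std::vector<std::string> sizes;
  std::vector<int64_t> strides;
};

// Returns true if every element of `values` equals `expected`.
static bool allEqual(const std::vector<int64_t> &values, int64_t expected) {
  for (int64_t v : values)
    if (v != expected)
      return false;
  return true;
}

// Assumed precondition for swapping the extract_slice and the transfer_write.
bool canSwap(const TransferWrite &write, const ExtractSlice &extract,
             const InsertSlice &insert) {
  assert(extract.offsets.size() == extract.sizes.size() &&
         extract.sizes.size() == extract.strides.size() &&
         "malformed slice description");
  // The write covers the whole destination tensor, starting at index 0,
  // so any slice taken from the result consists only of written elements.
  bool writesWholeTensor =
      write.vectorShape == write.destShape && allEqual(write.indices, 0);
  // The slice starts at the origin and is contiguous.
  bool sliceAtOrigin =
      allEqual(extract.offsets, 0) && allEqual(extract.strides, 1);
  // The slice is inserted with the same sizes and unit strides.
  bool sameSlice =
      extract.sizes == insert.sizes && allEqual(insert.strides, 1);
  return writesWholeTensor && sliceAtOrigin && sameSlice;
}
```

With the values from the example above (vector and tensor shape `8x16`, write indices `[0, 0]`, slice offsets `[0, 0]`, sizes `[%sz0, %sz1]`, strides `[1, 1]`), `canSwap` returns true. A nonzero slice offset or mismatched insert sizes makes it return false.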