Canonicalizes the pattern

```mlir
%0 = tensor.insert %scalar into %t1[...] : (scalar tensor type)
%1 = tensor.insert_slice %0 into %t2[<indices>]
```

into

```mlir
%1 = tensor.insert %scalar into %t2[<indices>]
```
This has a side effect on bufferization: prior to this canonicalization, the
IR below would result in two allocations (even with empty tensor elimination),
whereas afterwards it results in just two memref.store ops:
```mlir
func.func @func(%arg0 : f32, %arg1 : f32, %arg2 : tensor<4xf32>) -> (tensor<4xf32>) {
  %c0 = arith.constant 0 : index
  %c2 = arith.constant 2 : index
  %c3 = arith.constant 3 : index
  %e1 = tensor.empty() : tensor<1xf32>
  %e2 = tensor.empty() : tensor<f32>
  %0 = tensor.insert %arg0 into %e1[%c0] : tensor<1xf32>
  %1 = tensor.insert %arg1 into %e2[] : tensor<f32>
  %2 = tensor.insert_slice %0 into %arg2[%c2][1][1] : tensor<1xf32> into tensor<4xf32>
  %3 = tensor.insert_slice %1 into %2[%c3][1][1] : tensor<f32> into tensor<4xf32>
  return %3 : tensor<4xf32>
}
```
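For reference, a sketch of what this example might look like after the pattern fires (illustrative only; SSA value names are made up, and this assumes both `tensor.insert_slice` ops fold into direct `tensor.insert` ops on the destination, so bufferization lowers each to a single `memref.store`):

```mlir
func.func @func(%arg0 : f32, %arg1 : f32, %arg2 : tensor<4xf32>) -> (tensor<4xf32>) {
  %c2 = arith.constant 2 : index
  %c3 = arith.constant 3 : index
  // The tensor.empty / tensor.insert pairs are gone; the scalars are
  // inserted straight into the destination at the slice offsets.
  %0 = tensor.insert %arg0 into %arg2[%c2] : tensor<4xf32>
  %1 = tensor.insert %arg1 into %0[%c3] : tensor<4xf32>
  return %1 : tensor<4xf32>
}
```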
Seems to me you either need to check that the tensor.insert indices are all 0, or you need to combine them together, no?
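To illustrate the index concern with a hypothetical example (not from the patch): when the `tensor.insert` index is nonzero, the slice offset alone is not the right destination index, so the fold must add the two together:

```mlir
// Hypothetical IR, for illustration only.
%e = tensor.empty() : tensor<2xf32>
%0 = tensor.insert %scalar into %e[%c1] : tensor<2xf32>
%1 = tensor.insert_slice %0 into %t[%c4][2][1] : tensor<2xf32> into tensor<8xf32>
// The scalar lands at index %c4 + %c1 = 5 of %t, so folding this to
// `tensor.insert %scalar into %t[%c4]` would be incorrect; the slice
// offset and the insert index must be combined.
```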
Additionally, have you looked at mlir/lib/Dialect/Tensor/Transforms/FoldTensorSubsetOps.cpp?
It seems you could add a new pattern there?
Note that those patterns are not canonicalizations because they are not always desirable, so we made them opt-in.