Canonicalize the following pattern:

  %0 = tensor.insert %scalar into %t1[...] : (scalar tensor type)
  %1 = tensor.insert_slice %0 into %t2[<indices>]

into

  %1 = tensor.insert %scalar into %t2[<indices>]
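
As a concrete instance (illustrative only; the SSA names and shapes below are made up), the pattern rewrites

  %0 = tensor.insert %f into %small[%c0] : tensor<1xf32>
  %1 = tensor.insert_slice %0 into %big[%i] [1] [1] : tensor<1xf32> into tensor<8xf32>

into

  %1 = tensor.insert %f into %big[%i] : tensor<8xf32>

i.e., the intermediate single-element tensor is bypassed and the scalar is inserted directly into the destination.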

This has a side effect on bufferization: prior to this canonicalization, the IR below would result in two allocations (even with empty tensor elimination), whereas afterwards it results in just two `memref.store` ops:

  func.func @func(%arg0: f32, %arg1: f32, %arg2: tensor<4xf32>) -> (tensor<4xf32>) {
    %c0 = arith.constant 0 : index
    %c2 = arith.constant 2 : index
    %c3 = arith.constant 3 : index
    %e1 = tensor.empty() : tensor<1xf32>
    %e2 = tensor.empty() : tensor<f32>
    %0 = tensor.insert %arg0 into %e1[%c0] : tensor<1xf32>
    %1 = tensor.insert %arg1 into %e2[] : tensor<f32>
    %2 = tensor.insert_slice %0 into %arg2[%c2] [1] [1] : tensor<1xf32> into tensor<4xf32>
    %3 = tensor.insert_slice %1 into %2[%c3] [1] [1] : tensor<f32> into tensor<4xf32>
    return %3 : tensor<4xf32>
  }
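
For reference, after canonicalization and one-shot bufferization the function can be expected to lower to roughly the following sketch (the exact IR, in particular the memref types at the function boundary, depends on the bufferization options used):

  func.func @func(%arg0: f32, %arg1: f32, %arg2: memref<4xf32>) -> memref<4xf32> {
    %c2 = arith.constant 2 : index
    %c3 = arith.constant 3 : index
    // The two insert/insert_slice chains collapse into in-place stores;
    // no temporary buffers are allocated.
    memref.store %arg0, %arg2[%c2] : memref<4xf32>
    memref.store %arg1, %arg2[%c3] : memref<4xf32>
    return %arg2 : memref<4xf32>
  }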