This patch fixes a problem where the bufferization pass generates a dealloc at an inappropriate location.
Let me know if this patch needs an RFC. Thanks for your time.
However, this patch does not pass the tests yet. If the input IR is:
```mlir
#map = affine_map<(d0) -> (d0 * 5)>
func.func @ternimator_use_not_deallocated(%arg0: tensor<10x10xf32>) -> tensor<10x10xf32> {
  %0 = scf.forall (%arg1, %arg2) in (2, 2) shared_outs(%arg3 = %arg0) -> (tensor<10x10xf32>) {
    %1 = bufferization.alloc_tensor() : tensor<5x5xf32>
    %2 = affine.apply #map(%arg1)
    %3 = affine.apply #map(%arg2)
    scf.forall.in_parallel {
      tensor.parallel_insert_slice %1 into %arg3[%2, %3] [5, 5] [1, 1] : tensor<5x5xf32> into tensor<10x10xf32>
    }
  }
  return %0 : tensor<10x10xf32>
}
```
and it is bufferized with the following options:

`-one-shot-bufferize="allow-unknown-ops copy-before-write"`

then a copy is inserted before `tensor.parallel_insert_slice`, which causes the failure. Could you please point me toward how to solve this problem? Thanks.