This op canonicalizes away when the tensor operand has been bufferized.
This op is needed for partial bufferization of ops such as linalg.tiled_loop. Such ops can yield tensor values but not memref values. After bufferizing the loop (including the terminator) but not the loop body, ops in the loop body can DCE away because they no longer have any uses.
Example before bufferization (simplified):
  linalg.tiled_loop (%i) = (%c0) to (%c24) step (%c4)
      ins(%t0 : tensor<?xf32>) outs(%t1 : tensor<?xf32>) {
    ...
    %0 = tensor.insert %f into %t0[...] : tensor<?xf32>
    linalg.yield %0
  }
After bufferization of linalg.tiled_loop:
  linalg.tiled_loop (%i) = (%c0) to (%c24) step (%c4)
      ins(%m0 : memref<?xf32>) outs(%m1 : memref<?xf32>) {
    ...
    %t0 = bufferization.to_tensor %m0
    %0 = tensor.insert %f into %t0[...] : tensor<?xf32>
    linalg.yield
  }
Now the tensor.insert op can DCE away because its result has no uses. This can be avoided by inserting a bufferization.wait_for_bufferization op, which keeps the tensor value in use:
    ...
    %0 = tensor.insert %f into %t0[...] : tensor<?xf32>
    bufferization.wait_for_bufferization %0 : tensor<?xf32>
    linalg.yield
  }
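The effect can be illustrated with a toy model of dead code elimination. This is only a sketch in Python, not MLIR code: the `Op` class, the `pinned` flag, and the `dce` function are hypothetical names made up for illustration, and treating the terminator and wait_for_bufferization as ops that DCE never erases is an assumption of the model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Op:
    name: str
    operands: List[str] = field(default_factory=list)  # values this op reads
    result: Optional[str] = None                       # value this op defines
    pinned: bool = False  # assumption: pinned ops are never erased by DCE


def dce(ops):
    """Erase unpinned ops whose result is unused, until nothing changes."""
    ops = list(ops)
    changed = True
    while changed:
        changed = False
        used = {v for op in ops for v in op.operands}
        for op in list(ops):
            if not op.pinned and (op.result is None or op.result not in used):
                ops.remove(op)
                changed = True
    return [op.name for op in ops]


# Loop body after bufferizing the loop: linalg.yield no longer uses %0.
without_wait = [
    Op("bufferization.to_tensor", ["m0"], "t0"),
    Op("tensor.insert", ["f", "t0"], "0"),
    Op("linalg.yield", [], None, pinned=True),
]

# Same body with the new op holding a use of %0.
with_wait = [
    Op("bufferization.to_tensor", ["m0"], "t0"),
    Op("tensor.insert", ["f", "t0"], "0"),
    Op("bufferization.wait_for_bufferization", ["0"], None, pinned=True),
    Op("linalg.yield", [], None, pinned=True),
]

print(dce(without_wait))
print(dce(with_wait))
```

In this model, the body without the extra op collapses to just the terminator, while the body with wait_for_bufferization keeps both tensor.insert and the to_tensor op feeding it.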
Note: WaitForBufferizationOp is also needed for a subsequent commit that switches the custom IR traversal of Comprehensive Bufferize to RewritePatterns (not dialect conversion but regular rewrite patterns). In that case, tensor.insert_slice ops that have a matching tensor.extract_slice op could DCE away.
Note: WaitForBufferizationOp is purposely a new op in the bufferization dialect (as opposed to an anonymous/unnamed op) because it can survive partial bufferization. When used in Comprehensive Bufferize (One-Shot Bufferize), all WaitForBufferizationOps should have disappeared by the time bufferization is done (unless allow-unknown-ops is set). Other cases are considered a bufferization failure.
Depends On D116446