This change adds a new flag to One-Shot Bufferize: privatize-buffers-in-loops.
This flag replaces resolveUsesInRepetitiveRegions in TensorCopyInsertion, which was a workaround for certain IR that One-Shot Bufferize did not support: cases where a tensor is used as an init_arg of a loop and the same tensor is also used inside the loop. Such IR bufferizes correctly since D135049 (even without the workaround), but the init_arg operand bufferizes out-of-place, which is usually not desirable. For that reason, the workaround was kept in place even after D135049.
The workaround in TensorCopyInsertion does not take buffer aliasing into account and was therefore not comprehensive. Some IR bufferized incorrectly before D135049, and some IR bufferizes in an undesirable way after D135049. With this change, init_args of loops bufferize in-place in most cases, as long as privatize-buffers-in-loops is enabled.
Example IR:
%r = scf.for ... iter_args(%0 = %t) -> tensor<?xf32> {
  %read = tensor.extract %t[%idx]
  ...
}
TensorCopyInsertion without this change (but with the workaround in TensorCopyInsertion):
%t_copy = bufferization.alloc_tensor() copy(%t) : tensor<?xf32>
%r = scf.for ... iter_args(%0 = %t_copy) -> tensor<?xf32> {
  %read = tensor.extract %t[%idx]
  ...
}
TensorCopyInsertion with this change (and without the workaround):
%t_copy = bufferization.alloc_tensor() copy(%t) : tensor<?xf32>
%r = scf.for ... iter_args(%0 = %t) -> tensor<?xf32> {
  %read = tensor.extract %t_copy[%idx]
  ...
}
All uses of %t inside the loop are "privatized", i.e., they use a copy of %t.
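The following is a toy Python model, not MLIR and not the actual implementation, of why the uses inside the loop must be redirected once the init_arg reuses the buffer of %t. Tensors are modeled as immutable values and buffers as mutable lists. All function names are hypothetical, and the naive variant only illustrates the hazard; it is not something One-Shot Bufferize is claimed to generate.

```python
# Toy model of loop privatization. Buffers are mutable Python lists.
# Every name below is hypothetical and exists only for illustration.

def tensor_semantics(t, n):
    """Reference behavior: tensors are values, so %t is never mutated."""
    acc = list(t)                  # iter_arg %0 starts with the value of %t
    reads = []
    for i in range(n):
        reads.append(t[0])         # models: tensor.extract %t[%idx]
        acc = acc[:i] + [acc[i] + 10] + acc[i + 1:]  # functional update of %0
    return acc, reads


def naive_in_place(t_buf, n):
    """Hazard: init_arg reuses t_buf while the body still reads t_buf."""
    acc = t_buf                    # in-place: %0 aliases the buffer of %t
    reads = []
    for i in range(n):
        reads.append(t_buf[0])     # may observe writes from earlier iterations
        acc[i] += 10
    return acc, reads


def privatized(t_buf, n):
    """Privatization: in-loop uses of %t read a copy made before the loop."""
    t_copy = list(t_buf)           # models: alloc_tensor() copy(%t)
    acc = t_buf                    # init_arg still bufferizes in-place
    reads = []
    for i in range(n):
        reads.append(t_copy[0])    # models: tensor.extract %t_copy[%idx]
        acc[i] += 10
    return acc, reads
```

In this model, privatized returns the same loop result and the same reads as tensor_semantics, while naive_in_place returns reads that were clobbered by the in-place update of the iter_arg.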
Loop privatization is implemented as an extension of the BufferizableOpInterface implementation of scf.for and scf.foreach_thread (scf.while not yet implemented). No further changes to One-Shot Analysis are needed.
Depends On D135056