It was previously assumed that tensor.insert_slice should be bufferized first in a greedy fashion to avoid out-of-place bufferization of the large tensor. This heuristic does not hold upon further inspection.
This CL removes the special handling of such ops and adds a test that exhibits better behavior and appears in real use cases.
The only test adversely affected is an artificial one that results in a returned memref. Comprehensive bufferization does not allow this pattern in real scenarios anyway, so the offending test is deleted.
Less code is always good, but I'm wondering if there are any important cases that see a performance degradation (more out-of-place bufferization) because of this.