Canonicalize the iter_args of an scf::ForOp that involve a tensor_load and
for which only the last loop iteration is actually visible outside of the
loop. The canonicalization looks for a pattern such as:
  %t0 = ... : tensor_type
  %0 = scf.for ... iter_args(%bb0 = %t0) -> (tensor_type) {
    ...
    // %m is either tensor_to_memref(%bb0) or defined above the loop
    %m... : memref_type
    ... // uses of %m with potential inplace updates
    %new_tensor = tensor_load %m : memref_type
    ...
    scf.yield %new_tensor : tensor_type
  }
%bb0 may have either 0 or 1 use. If it has 1 use it must be exactly a
%m = tensor_to_memref %bb0 op that feeds into the yielded tensor_load
op.
If no aliasing write of %new_tensor occurs between tensor_load and yield
then the value %0 visible outside of the loop is the last tensor_load
produced in the loop.
For now, we approximate the absence of aliasing by only supporting the case
when the tensor_load is the operation immediately preceding the yield.
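The matching conditions above can be sketched on a toy IR. This is a minimal Python model, not the actual C++ pattern: the `Op` class, the `matches_last_tensor_load` helper, and the flat list standing in for the loop body are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Op:
    """Toy stand-in for an operation in the loop body (hypothetical)."""
    name: str
    operands: List[str] = field(default_factory=list)
    result: Optional[str] = None


def matches_last_tensor_load(body: List[Op], iter_arg: str) -> bool:
    """Check the conditions described above on a flat toy loop body."""
    if len(body) < 2 or body[-1].name != "scf.yield":
        return False
    yield_op, load = body[-1], body[-2]
    # Conservative aliasing approximation: the tensor_load must be the
    # operation immediately preceding the yield and produce the yielded value.
    if load.name != "tensor_load" or len(load.operands) != 1:
        return False
    if yield_op.operands != [load.result]:
        return False
    memref = load.operands[0]
    # One list entry per use of the iter_arg inside the body.
    uses = [op for op in body for operand in op.operands if operand == iter_arg]
    if not uses:
        # 0 uses: %m must come from above the loop, i.e. not be defined in the body.
        return all(op.result != memref for op in body)
    if len(uses) != 1:
        return False
    # 1 use: it must be the tensor_to_memref whose result feeds the tensor_load.
    user = uses[0]
    return user.name == "tensor_to_memref" and user.result == memref


matching_body = [
    Op("tensor_to_memref", ["%bb0"], "%m"),
    Op("store", ["%v", "%m"]),
    Op("tensor_load", ["%m"], "%new_tensor"),
    Op("scf.yield", ["%new_tensor"]),
]
# Here the tensor_to_memref result is not the memref that gets loaded.
unrelated_memref_body = [
    Op("tensor_to_memref", ["%bb0"], "%other"),
    Op("tensor_load", ["%m"], "%new_tensor"),
    Op("scf.yield", ["%new_tensor"]),
]
```

In this sketch `matching_body` is accepted and `unrelated_memref_body` is rejected, because the single use of `%bb0` does not feed the yielded tensor_load.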
The canonicalization rewrites the pattern as:
  // %m is either a tensor_to_memref or defined above
  %m... : memref_type
  scf.for ... { // no iter_args
    ... // uses of %m with potential inplace updates
  }
  %0 = tensor_load %m : memref_type
This revision also exposes incorrect behavior in the folding
tensor_load(tensor_to_memref(x)) -> x, which ignores aliasing.
The test is altered to pass despite the erroneous folding, but should be revisited once the folding is hardened.
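To illustrate why that folding is unsafe, here is a toy Python model. It assumes a tensor behaves like an immutable value and a memref like a mutable buffer; the two helper functions only mimic the op names and are not MLIR APIs or a statement of the ops' exact semantics.

```python
def tensor_to_memref(tensor):
    # Hypothetical model: a mutable buffer holding the tensor's contents.
    return list(tensor)


def tensor_load(memref):
    # Hypothetical model: snapshot the buffer's current contents as a value.
    return tuple(memref)


x = (1, 2, 3)
m = tensor_to_memref(x)
m[0] = 42                   # in-place write through the memref
unfolded = tensor_load(m)   # reads the buffer after the write
folded = x                  # result of folding tensor_load(tensor_to_memref(x)) -> x
```

In this model `unfolded` reflects the write while `folded` does not, so the folding is only valid when no write to the memref can occur between the two ops.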
Where are you checking that it feeds into the tensor_load?