Bubble up extract_slice above Linalg operation.
A sequence of operations
%0 = linalg.<op> ... arg0, arg1, ... %1 = tensor.extract_slice %0 ...
can be replaced with
%0 = tensor.extract_slice %arg0 %1 = tensor.extract_slice %arg1 %2 = linalg.<op> ... %0, %1, ...
This results in the reduce computation of the linalg operation.
The implementation uses the tiling utility functions. One difference
from the tiling process is that we don't need to insert the checking
code for the out-of-bound accesses. The use of the slice itself
represents that the code writer is sure about the boundary condition.
To avoid adding the boundary condtion check code, addBoundaryCheck
is introduced for the tiling utility functions.
nit: typo