We'd like to take a progressive approach towards convolution op
CodeGen, by 1) tiling it to fit the compute hierarchy first, and then
2) tiling along window dimensions with size 1 to reduce the problem
to be matmul-like. After that, we can 3) downscale high-D convolution
ops to low-D by removing the size-1 window dimensions. The final
step would be 4) vectorizing the low-D convolution op directly.
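
To make step 2) concrete, here is a minimal Python sketch (a model of the arithmetic, not MLIR) of why tiling the window dimensions with size 1 makes the problem matmul-like. It assumes the NHWC input / HWCF filter layouts of linalg.conv_2d_nhwc_hwcf, no padding, and nested Python lists as tensors. `matmul` and `conv_2d_nhwc_hwcf_tiled` are hypothetical helpers for illustration only. For each (kh, kw) tile the filter window is 1x1, so that tile's contribution is a plain (N*OH*OW x C) by (C x F) matrix product, and the convolution is the sum of those products.

```python
from itertools import product


def matmul(a, b):
    """Plain (M x K) @ (K x N) matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def conv_2d_nhwc_hwcf_tiled(inp, flt, strides=(1, 1), dilations=(1, 1)):
    """Hypothetical model: 2-D NHWC x HWCF conv with the (kh, kw) loops tiled by 1.

    Each (kh, kw) tile sees a 1x1 filter, so its contribution is
    (N*OH*OW x C) input rows @ (C x F) filter slice. Assumes no padding.
    """
    n_, h, w = len(inp), len(inp[0]), len(inp[0][0])
    kh_, kw_, f_ = len(flt), len(flt[0]), len(flt[0][0][0])
    sh, sw = strides
    dh, dw = dilations
    oh_ = (h - dh * (kh_ - 1) - 1) // sh + 1
    ow_ = (w - dw * (kw_ - 1) - 1) // sw + 1
    points = list(product(range(n_), range(oh_), range(ow_)))
    out = [[0] * f_ for _ in points]  # flattened (N*OH*OW) x F accumulator
    for kh, kw in product(range(kh_), range(kw_)):  # window loops, tile size 1
        # Gather the C-vector each output point reads at this window offset.
        lhs = [inp[n][oh * sh + kh * dh][ow * sw + kw * dw] for n, oh, ow in points]
        partial = matmul(lhs, flt[kh][kw])  # (N*OH*OW x C) @ (C x F)
        for row, prow in zip(out, partial):
            for f in range(f_):
                row[f] += prow[f]
    # Unflatten back to N x OH x OW x F (same order as `points`).
    it = iter(out)
    return [[[next(it) for _ in range(ow_)] for _ in range(oh_)] for _ in range(n_)]


inp = [[[[1], [2]], [[3], [4]]]]  # N=1, H=2, W=2, C=1
flt = [[[[1]], [[1]]], [[[1]], [[1]]]]  # KH=2, KW=2, C=1, F=1
print(conv_2d_nhwc_hwcf_tiled(inp, flt))  # [[[[10]]]]
```

Keeping the window loops outermost is what exposes the per-tile matmul; the inner channel contraction is untouched.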
We have patterns for 1), 2), and 4). This commit adds a pattern for
3), covering linalg.conv_2d_nhwc_hwcf ops as a starter. Supporting other
high-D convolution ops should be similar and mechanical.
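
As a rough model of what the step 3) rewrite computes for linalg.conv_2d_nhwc_hwcf, the Python sketch below assumes the filter height (KH) and output height (OH) are both 1. In that case every output point reads input row 0 * sh + 0 * dh = 0, so the H dimension can be dropped, leaving a 1-D NWC x WCF convolution. It works on nested lists rather than IR and assumes no padding; `conv_1d_nwc_wcf` and `downscale_conv_2d_nhwc_hwcf` are hypothetical helpers, not MLIR APIs.

```python
from itertools import product


def conv_1d_nwc_wcf(inp, flt, stride=1, dilation=1):
    """Naive 1-D conv: out[n][ow][f] = sum_{kw,c} inp[n][ow*s + kw*d][c] * flt[kw][c][f]."""
    n_, w, c_ = len(inp), len(inp[0]), len(inp[0][0])
    kw_, f_ = len(flt), len(flt[0][0])
    ow_ = (w - dilation * (kw_ - 1) - 1) // stride + 1
    out = [[[0] * f_ for _ in range(ow_)] for _ in range(n_)]
    for n, ow, f, kw, c in product(range(n_), range(ow_), range(f_), range(kw_), range(c_)):
        out[n][ow][f] += inp[n][ow * stride + kw * dilation][c] * flt[kw][c][f]
    return out


def downscale_conv_2d_nhwc_hwcf(inp, flt, strides=(1, 1), dilations=(1, 1)):
    """Hypothetical model: rewrite a 2-D NHWC x HWCF conv with KH == OH == 1 as a 1-D conv."""
    kh_, h = len(flt), len(inp[0])
    oh_ = (h - dilations[0] * (kh_ - 1) - 1) // strides[0] + 1
    if kh_ != 1 or oh_ != 1:
        raise ValueError("sketch only handles filter height 1 and output height 1")
    inp_1d = [image[0] for image in inp]  # keep input row 0 only: N x W x C
    flt_1d = flt[0]  # drop the unit KH dim: KW x C x F
    out_1d = conv_1d_nwc_wcf(inp_1d, flt_1d, strides[1], dilations[1])
    return [[row] for row in out_1d]  # re-insert the unit OH dim: N x 1 x OW x F


inp = [[[[1], [2], [3]]]]  # N=1, H=1, W=3, C=1
flt = [[[[1]], [[10]]]]  # KH=1, KW=2, C=1, F=1
print(downscale_conv_2d_nhwc_hwcf(inp, flt))  # [[[[21], [32]]]]
```

In this list model, `image[0]`, `flt[0]`, and `[row]` stand in for dropping and re-inserting the unit dimensions; in IR, those reshaping steps are where the bufferization question comes in.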
We can iterate from here to get started, but note that this has implications for bufferization.
Alternatively, we could use rank-reducing InsertSliceOp / ExtractSliceOp; this may be needed for proper in-place bufferization, but let's punt on it until we see the whole end-to-end story.