This is an archive of the discontinued LLVM Phabricator instance.

[mlir][linalg] Add a pattern to vectorize convolution ops
AbandonedPublic

Authored by antiagainst on Oct 13 2021, 8:14 AM.

Details

Summary

This commit adds a pattern to vectorize Linalg convolution
ops implementing the ConvolutionOpInterface. It is able to
handle n-D convolution with different layouts and strides.
The pattern directly lowers the convolution op into vector
transfer read/write/contract ops by effectively unrolling
convolution output window dimensions.

This is just one step in the convolution CodeGen, where we
expect to go from a high-level convolution op, convert it
to a proper linalg.conv named op, tile the linalg.conv op
a few times, and then tile the filter window dimensions by
size 1. Afterwards this pattern can kick in. So it is
assuming all size-1 filter window dimensions.

There are certainly other ways we can lower convolution
ops, bigger divergent approaches like img2col, indirect
convolution, fourier or Winograd implementations. This
approach is orthogonal to those and does not preclude them.

Though, this pattern enables a direct tiling and vectorization
path, which assumes minimal capabilities from the hardware
(just vector load/store/arithmetics able to support matmul),
so it's expected to be a good default for various targets
fitting that.

Depends On D111721

Diff Detail

Event Timeline

antiagainst created this revision.Oct 13 2021, 8:14 AM
antiagainst requested review of this revision.Oct 13 2021, 8:14 AM
mehdi_amini added inline comments.Oct 13 2021, 10:01 AM
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
51

You wrote in the description that:

There are certainly other ways we can lower convolution
ops, bigger divergent approaches like img2col, indirect
convolution, fourier or Winograd implementations.

But how would these different strategies pan out behind this API (which seems totally generically named)

asaadaldien added inline comments.Oct 13 2021, 10:36 AM
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
51

Once we tile the filter by 1 this yields a fixed (sum-of-contraction) form -- a sum of matmul when window_strides is 1 --
So maybe we rename this to populateConvolutionToSumOfVectorContractPatterns ?

asaadaldien added inline comments.Oct 13 2021, 10:42 AM
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
51

or maybe populateConvolutionWithUnitFilterDimsVectorizationPatterns ?

I am sorry but I don't think this goes in the direction we have been discussing for the past few weeks.
I sent a version that mirrors what I explained in the past here: https://reviews.llvm.org/D111894.
That impl. only does the 1-D case and is aimed for iterating before we decide what should go in the vector dialect (e.g. a vector.conv op).

I am sorry but I don't think this goes in the direction we have been discussing for the past few weeks.

You mean the implementation of the pattern, not the generated IR right? I thought we agreed that the generated IR is good for a first iteration.

I'm not particularly attached to how the pattern itself is implemented. So whatever you think is better. (As I said previously it's mostly due to me not very familiar with some of the recent Linalg metaprogramming tricks..)

I sent a version that mirrors what I explained in the past here: https://reviews.llvm.org/D111894.
That impl. only does the 1-D case and is aimed for iterating before we decide what should go in the vector dialect (e.g. a vector.conv op).

Nice, thanks!

antiagainst added inline comments.Oct 18 2021, 8:50 AM
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
51

But how would these different strategies pan out behind this API (which seems totally generically named)

This is an API in Linalg/Transforms and it only concerns Linalg convolution op vectorization. So I believe that scope properly limits it to what it is meant for: a single dedicated step for vectorization Linalg convolution ops. The other approaches with big directional differences should be handled at other levels; so they should not need to fit in here.

Once we tile the filter by 1 this yields a fixed (sum-of-contraction) form -- a sum of matmul when window_strides is 1 --

The plan is to gradually support more and more cases. So eventually it will be more than what it supports here. But also sounds good to me to limit the scope of the API here and rename to be more generic later.

nicolasvasilache resigned from this revision.Oct 29 2021, 8:10 AM

We synchronized offline with @antiagainst and this revision will be abandonned in favor of a simple conv2d with extent 1 -> conv1d rewrite at the linalg level.
This will compose with tiling by setting size 1 in the right places + the new pattern + conv1d vectorization.
We can then test that the whole stack composes and reaches peak before deciding where to go next.

Declaring review bankruptcy on this revision.

antiagainst abandoned this revision.Nov 2 2021, 7:06 AM