
[mlir] Conv ops vectorization pass
ClosedPublic

Authored by limo1996 on Aug 26 2020, 6:31 AM.

Details

Summary

This commit introduces a new way of lowering convolution ops.
The conv op vectorization pass lowers linalg convolution ops
into vector contractions. This lowering becomes possible once the conv op
has been tiled by 1 along specific dimensions, which transforms
it into a dot product between the input and kernel subview memory buffers.
The pass converts such a conv op into a vector contraction and inserts
all the vector transfers needed to make it work.

PHAB_REVIEW=D86619

Diff Detail

Event Timeline

limo1996 requested review of this revision.Aug 26 2020, 6:31 AM
rriddle requested changes to this revision.Aug 26 2020, 6:45 AM

Please run clang-format.

Please also add non-integration FileCheck tests in test/Conversion that test the expected output of the patterns you have added.

Thanks.

mlir/include/mlir/Conversion/LinalgToVector/LinalgToVector.h
16 ↗(On Diff #287956)

Is this include necessary?

mlir/include/mlir/Conversion/LinalgToVector/LinalgToVectorPass.h
16 ↗(On Diff #287956)

Is this include necessary? Can you forward declare instead?

mlir/include/mlir/Conversion/Passes.td
197 ↗(On Diff #287956)

Please declare the dependent dialects of this pass, i.e. what you will generate.

mlir/lib/Conversion/LinalgToVector/LinalgToVector.cpp
17 ↗(On Diff #287956)

nit: Use a constexpr constant instead of a define.

24 ↗(On Diff #287956)

Functions should be placed in the top-level namespace and be marked static, only classes should go in anonymous namespaces.

89 ↗(On Diff #287956)

nit: Please spell out all of these types instead of using auto.

102 ↗(On Diff #287956)

nit:

  • Use unsigned or an equivalent instead.
  • Drop trivial braces
  • Cache the end iterator of for loops.
107 ↗(On Diff #287956)

nit: Use normal variable declarations:

SmallVector<Value, 4> zeros(rank, ...);

120 ↗(On Diff #287956)

nit: Use an std::array/C array when the size is constant.

120 ↗(On Diff #287956)

nit: The names here and below should use camelCase.

This revision now requires changes to proceed.Aug 26 2020, 6:45 AM
limo1996 marked 8 inline comments as done.Aug 26 2020, 8:05 AM
limo1996 added inline comments.
mlir/include/mlir/Conversion/LinalgToVector/LinalgToVector.h
16 ↗(On Diff #287956)

Yes. Otherwise it complains about member access into incomplete type 'mlir::PatternRewriter'. Moved it to the .cpp file.

mlir/include/mlir/Conversion/LinalgToVector/LinalgToVectorPass.h
16 ↗(On Diff #287956)

Done. But I had to include "mlir/Transforms/DialectConversion.h", as done in other passes, because it was complaining about std::.

limo1996 updated this revision to Diff 287977.Aug 26 2020, 8:05 AM

comments of rriddle addressed

rriddle added inline comments.Aug 26 2020, 8:07 AM
mlir/include/mlir/Conversion/LinalgToVector/LinalgToVectorPass.h
16 ↗(On Diff #287956)

You can #include <memory> for unique_ptr, there is nothing from DialectConversion used here.

limo1996 added inline comments.Aug 26 2020, 8:26 AM
mlir/include/mlir/Conversion/LinalgToVector/LinalgToVectorPass.h
16 ↗(On Diff #287956)

ConversionTarget and OwningRewritePatternList are used in the .cpp file.

rriddle added inline comments.Aug 26 2020, 8:28 AM
mlir/include/mlir/Conversion/LinalgToVector/LinalgToVectorPass.h
16 ↗(On Diff #287956)

Yes, you should move the include to the .cpp. Header files should really only include what is actually necessary for the header, and not what is required by the implementation.

limo1996 updated this revision to Diff 287994.Aug 26 2020, 8:38 AM

more comments of rriddle addressed

limo1996 marked 4 inline comments as done.Aug 26 2020, 8:38 AM
limo1996 added inline comments.
mlir/include/mlir/Conversion/LinalgToVector/LinalgToVectorPass.h
16 ↗(On Diff #287956)

Ok. Done.

limo1996 marked an inline comment as done.Aug 26 2020, 8:39 AM
limo1996 updated this revision to Diff 288006.Aug 26 2020, 9:25 AM

added missing add_subdirectory in CMakeLists

limo1996 updated this revision to Diff 288011.Aug 26 2020, 9:29 AM

fixed add_subdirectory

limo1996 updated this revision to Diff 288869.Aug 30 2020, 1:07 PM

Conversion tests added

@rriddle I think I've resolved all of your comments. Do you have any more? Thanks.

Jakub, for my own understanding, did you coordinate with Nicolas before he left on vacation? The reason I am asking is that he had planned something very similar, and I want to know whether this is the result of those ideas, or something else. Some more background would help here.

I haven't synced with him regarding implementation details, but I think he has an overview of what my goal is and how I want to achieve it, as he is my co-host.

This is not how I was imagining convolution to be lowered in a first stab.
I was expecting: 1. taking subviews, 2. either using reshape or extending subviews to allow them to be rank-reducing, 3. rewriting as a linalg.matmul and reusing the existing vectorization patterns.

This revision wants to introduce a vectorization pass with assumptions that look fishy to me.
I have left some comments assuming we're taking the route of direct vectorization of convolutions.

In any case we should have a VC to discuss these assumptions that I have not followed and how they differ from the existing work based on promotion + static shapes + vectorization + forwarding patterns.

Also + @asaadaldien who has started looking at similar things for depthwise convolutions on mobile.

mlir/include/mlir/Conversion/LinalgToVector/LinalgToVectorPass.h
1 ↗(On Diff #288869)

It is way too premature to call this a new Linalg pass.
The type of assumptions that are hardcoded in the pass suggest that this should be a test pass in test/lib that applies the patterns.
Too many things can break to accept this as a core pass atm.

mlir/integration_test/Dialect/Linalg/Conv/test-conv-2d-call.mlir
51 ↗(On Diff #288869)

All these tests only work as expected because all the filters are 3x...x3.

mlir/lib/Conversion/LinalgToVector/LinalgToVector.cpp
30 ↗(On Diff #288869)

It should not just expect it: the pattern must fail to apply when the dimensions you expect to be 1 are not 1.
If you look at how the vectorization patterns for contraction ops are implemented, the precondition is that all shapes are static.
Then the proper vector.transfer and vector.contract are introduced with those sizes.

I find that the expectations of these conversions are adhoc and can lead to miscompiles since they are not checked.

82 ↗(On Diff #288869)

Please use UpperCamelCase for template parameters.

82 ↗(On Diff #288869)

[StartOffset, N - EndOffset) only allows encoding a contiguous set of dimensions.
In practice this seems enough, but I am wondering if it would be more idiomatic and future-proof to just pass an array of booleans at construction time.

Then we can use that array as a filter to determine which dimensions are vectorized.
In particular, I don't see why the choice was made here to vectorize only the reduction dimensions.

89 ↗(On Diff #288869)

Why is the vector size hardcoded to always be 3?
This is generally incorrect (e.g. convolutions with size 5 kernels).

Note that this is a quite fundamental issue because this hardcoded 3 also serves as a magic typecast that turns a dynamic problem into a static one in:

%22 = vector.transfer_read %19[%c0, %c0, %c0, %c0], %c0_f32  : memref<?x?x?x?xf32>, vector<3x3x3xf32>
125 ↗(On Diff #288869)

This just assumes the only dimensions that will be used are [startOffset, startOffset + numDims) and that they are reductions.
This is prone to errors.

It would be better to have something more robust and more adaptive that extracts the information from the op and that performs necessary checks / assertions.

Additionally, it is unclear to me why the decision has been taken and hardcoded to always vectorize exactly the reduction dimensions.
Vectorizing only the reduction dimensions is not expected to perform well: one can only use flat reductions and misses out on FMAs (so the perf is capped to 50% peak from the get go). Instead I would expect that you'd want to also vectorize along batch and channel dimensions when available. This gives additional parallel dimensions which have a chance of getting near peak.

Still even with mixing parallel and reduction dimensions, I expect that reducing to linalg.matmul first is a simpler path to perf.

limo1996 updated this revision to Diff 290284.Sep 7 2020, 7:25 AM

Moved pass to test/lib.
Replaced start and end offset with Mask.
Introduced preconditions for input and kernel sizes.
Reflected changes in tests.

nicolasvasilache accepted this revision.Sep 7 2020, 7:32 AM

Looks good, thanks @limo1996 !

mlir/test/lib/Transforms/TestConvVectorization.cpp
33

doc is now stale, please revisit.

90

lowerCamelCase plz; e.g. mask = msk;

111

Plz add a few words to document preconditions.

117

Remove spurious braces.

193

Everything above deserves to be in core, more precisely in Linalg/Vectorization.cpp.
Only the test pass (i.e. l192 - EOF) should be here.

I agree with @nicolasvasilache: mapping conv onto matmul is going to be the first step toward getting a faster conv.
What I found is that vectorizing the reduction over the inner dim is something that can be done automatically by LLVM loop auto-vectorization.

limo1996 updated this revision to Diff 290415.Sep 8 2020, 12:21 AM
limo1996 marked 5 inline comments as done.

All comments except for the migration to Linalg/Vectorization.cpp resolved.

limo1996 updated this revision to Diff 290425.Sep 8 2020, 1:19 AM
limo1996 marked an inline comment as done.

Moved conversion to core (kept only pass).
Removed getZero helper function.
Simplified code with vector and std intrinsics op creation.

limo1996 updated this revision to Diff 290428.Sep 8 2020, 1:32 AM

integration tests split into different commit

limo1996 retitled this revision from [mlir] Linalg to Vector pass added. to [mlir] Conv ops vectorization pass.Sep 8 2020, 1:39 AM
limo1996 edited the summary of this revision. (Show Details)
This revision was not accepted when it landed; it landed in state Needs Review.Sep 8 2020, 1:48 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
ftynse added inline comments.Sep 8 2020, 2:36 AM
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
374

uint is not a C++ type.

rriddle added inline comments.Sep 8 2020, 2:38 AM
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
573

nit: Can you put this in the private section and make this a class? Classes exposed in headers should have a clean interface.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
374

nit: Don't mark value-typed things like this as const when used in a function.

limo1996 marked an inline comment as done.Sep 8 2020, 3:00 AM
limo1996 added inline comments.
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
573

Yes, sure. Shall I create a new revision for that?

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
374

It caused a build failure, so I fixed it without review here: https://github.com/llvm/llvm-project/commit/83d82d1fb1cfac06257ebbd7c063a3d2d1af20fb

limo1996 marked 3 inline comments as done.Sep 10 2020, 12:08 AM
limo1996 added inline comments.
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
573
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
374
limo1996 marked an inline comment as done.Sep 10 2020, 12:09 AM