
[mlir][Linalg] Create a named batch_matmul op and pipe it through.
ClosedPublic

Authored by nicolasvasilache on Apr 16 2020, 1:37 PM.

Details

Summary

This revision is the first in a set of improvements that aim at allowing
more generalized named Linalg op generation from a mathematical
specification.

This revision allows creating a new op and checks that the parser,
printer and verifier are hooked up properly.

This opened up a few design points that will be addressed in the future:

  1. A named Linalg op has a static region builder instead of an explicitly parsed region. This is not currently compatible with assemblyFormat, so a custom parser and printer are needed.
  2. The convention for structured ops and tensor return values needs to evolve to allow tensor-land and buffer-land specifications to agree.
  3. ReferenceIndexingMaps and referenceIterators will need to become static to allow building attributes at parse time.
  4. Error messages will be improved once we have 3. and pretty-print in the custom form.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. Apr 16 2020, 1:37 PM
nicolasvasilache edited the summary of this revision. Apr 16 2020, 2:59 PM

Building a region from an OperationState at parse time is not straightforward because the region is not yet attached to an op, which is required to fill it with a builder. A temporary fake op is created for this purpose and the resulting region is taken from it.

Is this comment relevant to this revision? What region are you trying to "build" at parse time here? Why can't the typical addRegion() followed by parseRegionBody be used?

mlir/include/mlir/Dialect/Linalg/IR/LinalgTraits.h
123–124

of -> or
buffer -> memref

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
273

You don't need getBlocks() - region.front() will work.

mlir/test/Dialect/Linalg/roundtrip.mlir
628

For the custom format, better to drop the parentheses
linalg.batchmatmul %a3, %b3, %c3 :

FWIW, it'd be better with an underscore batch_matmul.

Building a region from an OperationState at parse time is not straightforward because the region is not yet attached to an op, which is required to fill it with a builder. A temporary fake op is created for this purpose and the resulting region is taken from it.

I now see the relevant method. But to use the builder, you only need a block or a region, and not an op(?). You could do an addRegion(), followed by pushing a new block into it, and then create the builder to insert. Am I missing something?

nicolasvasilache marked 5 inline comments as done. Apr 17 2020, 7:43 AM

I now see the relevant method. But to use the builder, you only need a block or a region, and not an op(?). You could do an addRegion(), followed by pushing a new block into it, and then create the builder to insert. Am I missing something?

This is what I started with, but along the way I realized that region.getContext() requires the region to be attached to an Operation*.
See: https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/IR/Region.h#L31

However, this was a red herring, due only to the fact that I was constructing the OpBuilder from the region.
I.e., this fails:

Region *region = result.addRegion();
OpBuilder opBuilder(region);

This works fine:

Region *region = result.addRegion();
OpBuilder opBuilder(context);
opBuilder.setInsertionPoint(&region->front(), region->front().begin());

Thanks for noting!

mlir/include/mlir/Dialect/Linalg/IR/LinalgTraits.h
123–124

First typo is corrected.
For the second, Linalg uses the terminology tensors and buffers consistently.
Memref is a particular implementation of buffers, the only one supported for now.

mlir/test/Dialect/Linalg/roundtrip.mlir
628

Fair enough.

nicolasvasilache edited the summary of this revision. Apr 17 2020, 7:45 AM

I now see the relevant method. But to use the builder, you only need a block or a region, and not an op(?). You could do an addRegion(), followed by pushing a new block into it, and then create the builder to insert. Am I missing something?

This is what I started with, but along the way I realized that region.getContext() requires the region to be attached to an Operation*.
See: https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/IR/Region.h#L31

However, this was a red herring, due only to the fact that I was constructing the OpBuilder from the region.
I.e., this fails:

Region *region = result.addRegion();
OpBuilder opBuilder(region);

This works fine:

Region *region = result.addRegion();
OpBuilder opBuilder(context);
opBuilder.setInsertionPoint(&region->front(), region->front().begin());

Thanks for noting!

That's great. In fact, the following is less verbose and should work:

auto opBuilder = OpBuilder::atBlockBegin(&region->front());
nicolasvasilache marked 2 inline comments as done. Apr 17 2020, 11:35 AM

That's great. In fact, the following is less verbose and should work:

auto opBuilder = OpBuilder::atBlockBegin(&region->front());

It doesn't, unfortunately, because of exactly the same problem: there is no context in the block or the region.
The OpBuilder must be constructed with a context and then step into the region.

Address comments.

That's great. In fact, the following is less verbose and should work:

auto opBuilder = OpBuilder::atBlockBegin(&region->front());

It doesn't, unfortunately, because of exactly the same problem: there is no context in the block or the region.
The OpBuilder must be constructed with a context and then step into the region.

I see - it won't have the context with the block-based ctors either. Nit: you could then just use setInsertionPointToStart(block) after the context-arg ctor.

bondhugula requested changes to this revision. Apr 17 2020, 12:17 PM

Mostly superficial comments...

mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOpsInterface.td
94

of -> or

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
763–764

Doc comments please.

791–794

All of this looks problematic. Do you need this on YieldOp's verifier instead of LinalgOp's verifier? (Check for its terminator there and verify?)

1086

Nit: bodyRegion is weird. Just region is good or opRegion.

1097

setInsertionPointToStart(&bodyRegion.front())

1104

Micronit: " " -> ' '; likewise below.

mlir/lib/Dialect/Linalg/Transforms/LinalgToLoops.cpp
124

Doc comment please.

This revision now requires changes to proceed. Apr 17 2020, 12:17 PM
mravishankar added inline comments. Apr 17 2020, 12:54 PM
mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOpsSpec.tc
3

What is the reference for this specification? ONNX/TF both seem to have a batch dimension for B as well. Without that, this is effectively broadcasting B.

silvas added inline comments. Apr 17 2020, 1:40 PM
mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOpsSpec.tc
3

This isn't enough to legalize e.g. tf.BatchMatMul or torch.matmul, which allow leading batch dimensions on both sides.

https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/batch-mat-mul
https://pytorch.org/docs/stable/torch.html#torch.matmul

In IREE we have a batch matmul op that handles batch on both sides:
https://github.com/google/iree/blob/f80f39c7e96c2af15741e9c774eb8b54bf38df28/iree/compiler/Dialect/VMLA/IR/VMLAOps.td#L323

I expect that in a typical lowering flow, we will legalize tf.BatchMatMul or torch.matmul by reshaping all the batch dimensions into a single dimension on both sides (possibly a dummy "1" dimension in case of no batch on one side). Then we can expand this op into generic form and fuse/clean up those reshapes, which will eliminate batch dimensions on either side.
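The flow described above can be checked numerically. Below is a plain-Python sketch (nested lists standing in for tensors; the helper names are illustrative, not from the patch or any framework API): flatten the leading batch dimensions of both operands into a single batch dimension, run a one-batch-dim batch matmul, then expand the result back.

```python
def matmul(a, b):
    # 2-D matmul on nested lists: a is (M, K), b is (K, N).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Both operands carry batch shape (2, 2) of 2x2 matrices.
A = [[[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
     [[[9, 1], [2, 3]], [[4, 5], [6, 7]]]]
B = [[[[1, 0], [0, 1]], [[2, 1], [1, 2]]],
     [[[0, 1], [1, 0]], [[3, 0], [0, 3]]]]

# Legalize: flatten the batch dims (2, 2) -> (4,) on both sides ...
flatA = [m for row in A for m in row]
flatB = [m for row in B for m in row]
# ... run a single-batch-dim batch matmul ...
flatC = [matmul(a, b) for a, b in zip(flatA, flatB)]
# ... and expand the single batch dim back to (2, 2).
C = [flatC[0:2], flatC[2:4]]

# Compare against computing each batch entry in place.
direct = [[matmul(A[i][j], B[i][j]) for j in range(2)] for i in range(2)]
assert C == direct
```

The reshape/expand pair here is exactly the shape bookkeeping the fused reshapes in the lowering would carry.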

I don't see a situation where we would create this op.

My intuition is that batch matmul with a batch dimension only on one side is not that interesting, because fundamentally it is the same as a regular matmul: you just fold the batch dimension into the free dimension of the respective operand (e.g. in the case you have here, you can reshape the two dimensions Batch, M in the LHS into a single dimension of extent Batch*M). Batch matmul is only interesting from a lowering perspective when you have a batch dimension on both sides, which introduces distinct data-reuse behavior compared to a normal matmul.
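The folding argument above can be verified with a small plain-Python example (illustrative code, not from the patch): with a batch dimension only on the LHS, batch matmul gives the same result as a plain matmul after reshaping (Batch, M, K) into (Batch*M, K).

```python
def matmul(a, b):
    # 2-D matmul on nested lists: a is (M, K), b is (K, N).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

Batch, M, K, N = 2, 3, 4, 5
A = [[[i + j + k for k in range(K)] for j in range(M)] for i in range(Batch)]
B = [[k * n + 1 for n in range(N)] for k in range(K)]

# One-sided batch matmul: C[b] = A[b] @ B.
batched = [matmul(Ab, B) for Ab in A]

# Fold Batch into the free dimension M: (Batch, M, K) -> (Batch*M, K).
flat = matmul([row for Ab in A for row in Ab], B)

# Unfold (Batch*M, N) back to (Batch, M, N) and compare.
unfolded = [flat[b * M:(b + 1) * M] for b in range(Batch)]
assert unfolded == batched
```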

So in terms of defining a set of "primitives" or lowering to library calls (e.g. https://devblogs.nvidia.com/cublas-strided-batched-matrix-multiply/), having a batch on both sides seems to be the only relevant case. So I would recommend defining this as:

def batch_matmul(A: f32(Batch, M, K), B: f32(Batch, K, N)) -> (C: f32(Batch, M, N)) {
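For reference, the semantics this signature implies, C(b, m, n) accumulating A(b, m, k) * B(b, k, n) over k, can be written out as a plain-Python sketch (illustrative only; the actual TC body is not shown in this thread):

```python
def batch_matmul(A, B):
    # C[b][m][n] = sum over k of A[b][m][k] * B[b][k][n],
    # with a batch dimension on both operands.
    Batch, M, K = len(A), len(A[0]), len(A[0][0])
    N = len(B[0][0])
    return [[[sum(A[b][m][k] * B[b][k][n] for k in range(K))
              for n in range(N)] for m in range(M)] for b in range(Batch)]

A = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # (Batch=2, M=2, K=2)
B = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]   # (Batch=2, K=2, N=2)
C = batch_matmul(A, B)
assert C[0] == [[1, 2], [3, 4]]            # batch 0: multiply by identity
assert C[1] == [[10, 12], [14, 16]]        # batch 1: multiply by 2*identity
```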
nicolasvasilache marked 11 inline comments as done.

Address review.
Hacking tablegen.

mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOpsSpec.tc
3

It's just something to get started; the semantics are variadic and will require extensions.

Once these are implemented, it will be easy to update.
If you have a strong preference for another op, let me know what you'd prefer (an op that also exercises reduction).
It can't be dot/matvec/matmul for now because those names are already taken and more work is needed to replace them.

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
791–794

This is consistent with LoopOps::YieldOp and ReturnOp; what would justify diverging from that in Linalg specifically?
If relevant, this seems like it would warrant a global change.
Note that LinalgOp is an interface, though, not an Op per se.
I'll have a follow-up NFC to rename globally.

Still, I was missing the SingleBlockImplicitTerminator<"YieldOp"> trait, so I added it where relevant and updated some tests.

nicolasvasilache marked an inline comment as done. Apr 17 2020, 9:06 PM

The tablegen story is awful atm, suggestions most welcome.

mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOpsSpec.tc
3

I went with @silvas' suggestion; we can iterate on the semantics later once we have variadic support.

Another tablegen hack at a distance.

Using @mehdi_amini's version for CMake dependencies.

Ok, the cmake story now seems as good as it is going to get in the near future.

Add missing comment.

Minor thing: please update 'batch_matmul' in revision title.

mravishankar added inline comments. Apr 18 2020, 10:16 PM
mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOpsSpec.tc
3

Thanks for the update. I am not sure I follow what "the semantics are variadic" implies, i.e. I don't see anything variadic about the op as defined here, but I might be misreading the terms.
My concern was merely whether the named ops are supposed to have implicit broadcast semantics (in theory they can, but that seems to lead to complications with things like dynamic broadcasting, etc., based on discussion on Discourse). As it was defined previously, I read B as having broadcast semantics. Anyway, it's OK now, so thanks for taking care of it.

mravishankar added inline comments. Apr 18 2020, 11:28 PM
mlir/include/mlir/Dialect/Linalg/IR/LinalgNamedStructuredOpsSpec.tc
3

Actually, strike the last comment. The spec has nothing to do with broadcasting. But the current spec is indeed preferable.

ftynse accepted this revision. Apr 20 2020, 7:11 AM
ftynse added inline comments.
mlir/include/mlir/Dialect/Linalg/IR/LinalgTraits.h
351

Nit: why plural "Traits"?

356

Nit: consider adding a short doc and/or vertical whitespace around declarations

mlir/include/mlir/Dialect/Linalg/Transforms/CMakeLists.txt
6

Nit typo: depend

nicolasvasilache retitled this revision from "[mlir][Linalg] Create a named batchmatmul op and pipe it through." to "[mlir][Linalg] Create a named batch_matmul op and pipe it through.". Apr 20 2020, 7:17 AM
bondhugula marked an inline comment as done. Apr 20 2020, 1:16 PM
bondhugula added inline comments.
mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
791–794

I actually meant to highlight the lines above as well (788-789). This change will go against unifying the loop dialect and Linalg dialect yield ops (including any others) into a single std.yield - but we don't have to worry about that now. It can easily be adjusted when/if an std.yield is added later.

This revision was not accepted when it landed; it landed in state "Needs Review". Apr 21 2020, 9:42 AM
This revision was automatically updated to reflect the committed changes.