
[mlir][Linalg] Create a tool to generate named Linalg ops from a Tensor Comprehensions-like specification.

Authored by nicolasvasilache on Mar 30 2020, 8:56 AM.


This revision adds a tool that generates the ODS and C++ implementation for "named" Linalg ops according to the RFC discussion.

While the mechanisms and language aspects are by no means set in stone, this revision allows connecting the pieces end-to-end from a mathematical-like specification.

Some implementation details and short-term decisions, taken for the purpose of bootstrapping and not set in stone, include:
1. using a "Tensor Comprehension" syntax
2. implicit and eager discovery of dims and symbols when parsing
3. using EDSC ops to specify the computation (e.g. std_addf, std_mulf, ...)

A followup revision will connect this tool to tablegen mechanisms and allow the emission of named Linalg ops that automatically lower to various loop forms and run end to end.

For the following "Tensor Comprehension-inspired" string:

def batch_matmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N)) {
  C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
}

With -gen-ods-decl=1, this emits (modulo formatting):
  def batch_matmulOp : LinalgNamedStructured_Op<"batch_matmul", [
    NamedStructuredOpTraits]> {
      let arguments = (ins Variadic<LinalgOperand>:$views);
      let results = (outs Variadic<AnyRankedTensor>:$output_tensors);
      let extraClassDeclaration = [{
        llvm::Optional<SmallVector<StringRef, 8>> referenceIterators();
        llvm::Optional<SmallVector<AffineMap, 8>> referenceIndexingMaps();
        void regionBuilder(ArrayRef<BlockArgument> args);
      }];
      let hasFolder = 1;
  }

With -gen-ods-impl, this emits (modulo formatting):
  llvm::Optional<SmallVector<StringRef, 8>> batch_matmul::referenceIterators() {
      return SmallVector<StringRef, 8>{ getParallelIteratorTypeName(),
                                        getReductionIteratorTypeName() };
  }
  llvm::Optional<SmallVector<AffineMap, 8>> batch_matmul::referenceIndexingMaps() {
    MLIRContext *context = getContext();
    AffineExpr d0, d1, d2, d3;
    bindDims(context, d0, d1, d2, d3);
    return SmallVector<AffineMap, 8>{
        AffineMap::get(4, 0, {d0, d1, d3}),
        AffineMap::get(4, 0, {d3, d2}),
        AffineMap::get(4, 0, {d0, d1, d2}) };
  }
  void batch_matmul::regionBuilder(ArrayRef<BlockArgument> args) {
    using namespace edsc;
    using namespace intrinsics;
    ValueHandle _0(args[0]), _1(args[1]), _2(args[2]);

    ValueHandle _4 = std_mulf(_0, _1);
    ValueHandle _5 = std_addf(_2, _4);
    (linalg_yield(ValueRange{ _5 }));
  }

Diff Detail

Event Timeline

ftynse added inline comments.Apr 6 2020, 4:25 AM

"proper token" is unclear as an error message


Do you actually need pointers? Can't we just store Expressions as is, eventually with appropriate move semantics to avoid extra copies?


Why does a parsing function accept an _output_ stream?


Please document what it prints


Do you actually need this? I only see DenseMap<TensorExpr *>, which should be using a generic pointer-based map implementation.


How about DenseMapInfo<StringRef>::getTombstoneKey() instead?


Same as above, AffineMap also has a DenseMapInfo if I'm not mistaken.


Nit: document how the visitation behaves if the callback mutates the visited object


Nit: would emplace_back work?


Have you considered storing tensors in an llvm::StringMap indexed by name instead of doing linear lookups every time?


Naming nit: isa is widely used for downcasting, but this is just a lookup; prefer is.


Would emplace_back work?


Could you just have a default message expected %tokenname% instead of having a similar string everywhere?


It looks like it would parse just about any id. "expected a type id" sounds a bit misleading because "type id" is not a production rule, and there's no additional check on the id somehow being a type.


Nit: add a description in the assertion. Also, are we sure this can never happen?


Ultra-nit: we tend to use single quotes rather than backticks in error messages


Nit: /*allowEmptyList=*/true




This may crash if you have fewer LHS declarations than RHS definitions.




dimCount and symbolCount make the comment look outdated, is it?


Did you check that indexings were different?


Nit: I'd use early return here


[Not for this commit]: I would rather have the parser accept the correct syntax, and have a separate check that implements "semantic" rules.






typo: "symbolicc"


Why is the result optional?


Nit: could we use more meaningful names than ss2?


C++14 supports auto for lambda arguments


Alternatively, you could use ss.str() instead of valueHandleStr below.
Also, consider better names than ss, ss2, ss3. One ss is acceptable in a short function, but here it's really tricky to keep in mind which stream is associated with which string.

ftynse requested changes to this revision.Apr 6 2020, 4:25 AM
This revision now requires changes to proceed.Apr 6 2020, 4:25 AM
ftynse added inline comments.Apr 6 2020, 5:28 AM

I'm not sure I understand what is the concern here? The ODS check verifies the content of the produced .td file, _not_ the result of feeding that .td file to mlir-tblgen -gen-op-defs, which is indeed a separate concern. The IMPL check verifies the implementations of methods that are declared in the .td file and there is simply no other place where we can verify them.

The staging here is:
1a. mlir-linalg-ods-gen -gen-ods-decl %this_file% >
1b. mlir-linalg-ods-gen -gen-impl %this_file% >
2a. mlir-tblgen -gen-op-decl > ods.h
2b. mlir-tblgen -gen-op-decl >

  1. include and into the implementation file; and ods.h into the header file.

@nicolasvasilache the test you referenced also has RUN lines making sure mlir-tblgen can consume what the first stage produces. Consider adding them here as well. This could help detect cases of ODS syntax change (the simple syntactic test passes, but not the piping check). That's why there is only a trivial check to make sure FileCheck eats something.

nicolasvasilache marked 65 inline comments as done.Apr 6 2020, 8:07 PM
nicolasvasilache added inline comments.
I'm not sure I understand what is the concern here? 
Consider adding them here as well.

That's precisely what the concern was, IIUC: piping through mlir-tblgen (see the previous snapshot, which I updated improperly).

Restored that part of the test.


Rather than continue duplicating here, MLIR should expose the lexer and parser and everyone's life will be better.


Yes, I am trimming liberally until MLIR exposes its lexer and parser at which point all this can disappear.


why? shows it's perfectly fine to use abstract base classes and LLVM RTTI.


You should; otherwise your computation is non-deterministic.


It's this or uniquing, underlying storage, placement new, etc.
I went for the simple solution.
When we have strong data that we need to scale much more, we can revisit.


reduction loops don't commute in FP land


I need 2 extra maps and really don't anticipate a single named op to ever get to a point where this would matter.
Of course if proven otherwise I'm happy to reconsider.


I'm reluctant to invest more in duplicating something that should be exposed by core in a later NFC revision.


Agreed, there are a few other things for followups too, thanks!


it's a faster symbolcc


this is what the ODS currently is because of manual "named ops", will be cleaned later.

nicolasvasilache marked 12 inline comments as done.

Address review comments.

silvas added a comment.Apr 6 2020, 8:09 PM

meta-point: @ftynse let's not review the core parser code at the top of the file, as Nicolas says that they are just copypasta from the .mlir parser and won't be in the final patch.

Otherwise, thanks @ftynse for helping with the review :)


Ah, okay. Sorry for the confusion! When I saw C++ code I was assuming it was emitted by mlir-tblgen gen-op-def. But I see now that there is a mlir-linalg-tblgen -gen-impl that emits C++ as well. Sorry for the noise!!!

Thanks for your detailed reviews @silvas @ftynse!
Anything else?

silvas added inline comments.Apr 6 2020, 8:27 PM

Actually, when rereading I see that we do indeed invoke mlir-tblgen -gen-op-decls. I specifically object to the TBLGEN check prefixes here. I consider it a bug to do that (although I see the precedent in, I would have raised the same objection there), since it violates the layering: somebody updating mlir-tblgen shouldn't be able to break this test.

Consider the implications of what is being checked now in this test...

// TBLGEN-LABEL: linalg::batchmatmulOp declarations
^ could be broken by a change in a *comment* in the generated file :x
//       TBLGEN: class batchmatmulOpOperandAdaptor {
^ could be broken by adding a common base class to the operand adaptor classes, or a change in naming convention for the adaptor classes
//       TBLGEN: class batchmatmulOp : public Op<
^ could be changed by a change in base classes or naming convention.

Note that none of the changes I've indicated would break any actual use of this code. So this test is just artificially constraining the implementation of mlir-tblgen for no real value. And even if you strip it down, all you would really be testing is that def batchmatmulOp results in a class batchmatmulOp in the output, which is already tested in many places, such as, say,

We need to be courteous to the maintainers of other components and give them the flexibility to adjust the implementations of their components.

silvas added a comment.Apr 6 2020, 8:29 PM

@nicolasvasilache any progress on reusing the MLIR parser? I consider that refactoring as blocking for submitting this patch. I don't want us to have a custom parser copypasted here that somebody has to clean up later without a strong reason.

ftynse added inline comments.Apr 7 2020, 2:15 AM

I agree that this specific test is over-constraining for mlir-tblgen implementation. What I intended to test in intrinsicgen, and what I would like to see replicated here, is that the tablegen input produced by intrinsicgen, or my mlir-linalg-ods-gen, can be consumed by mlir-tblgen at all. Basically, we don't need to check for any output if we can find a way to check that mlir-tblgen exited with code 0 on the produced file. FileChecking the class name is just a workaround.

If we don't do this check, we risk ending up in a situation where all of the existing tests pass (mlir-tblgen still generates expected C++ from ODS, and mlir-linalg-ods-gen still generates the strings expected by its test, just those strings are no longer valid ODS), but the pipeline fails. And given mlir-tblgen's tendency to assert or crash on improperly structured yet valid TableGen, it would be annoying to debug.
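The exit-code-only check described here can be expressed as lit RUN lines with no FileCheck stage, since lit fails a test whose RUN command returns a nonzero exit code. A sketch, with tool names taken from this thread; the mlir-tblgen flag spelling and the include path are assumptions:

```
// RUN: mlir-linalg-ods-gen -gen-ods-decl %s > %t.td
// RUN: mlir-tblgen -gen-op-decls %t.td -I %S/../../include
```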

ftynse added a comment.Apr 7 2020, 2:30 AM

meta-point: @ftynse let's not review the core parser code at the top of the file, as Nicolas says that they are just copypasta from the .mlir parser and won't be in the final patch.

@silvas I wouldn't review it if it were actual copy-pasta. It is an incomplete and modified copy, which is therefore likely to have some weird behavior or to get into an irreversible state that the original code wouldn't.


Because you pay the runtime overhead price for two abstractions serving essentially the same goal. Why?


I suppose you mean the IR you produce does not have a deterministic order of dimensions, which makes it hard to check. TC semantics says that all dimensions are interchangeable, so their textual order should not matter. If it does, we should discuss the semantics and avoid branding this input as TC-like.


I don't think vectors of unique_ptr are simpler than vectors of values. This is an extra abstraction with associated cognitive overhead. It is also one extra dynamic allocation per element, as opposed to occasional allocations in the vector, and there is no strong reason to keep the pointer unique or to auto-deallocate (other than that you forced the allocation in the first place). There is no actual uniquing of expressions, nor is there underlying storage or placement new; you seem to be confusing this with how types/attributes are handled in MLIR.


Well, you currently have two extra vectors. I just don't see why using a vector of pairs and implementing a search for _every one of them_ is better than using a dedicated container with an accessor immediately available.

@silvasean I consider that refactoring as blocking for submitting this patch. I don't want us to have a custom parser copypasted here that somebody has to clean up later without a strong reason.
I have been following precedent here, see which also introduces its own tokenizer / lexer / parser.

As far as I understand it, MLIR has been pretty opinionated about not wanting to expose its tokenizer / lexer / parser: I tried to have them exposed in the past, but objections have been along the lines of "it's very easy code to write anyway".
I would strongly prefer we revisit that, but IMO it would be unfortunate for this work to be blocked on that refactoring.

Does this help mitigate your position?

Does this help mitigate your position?

Yes. I take back my request to break it out. I buy Chris' statement "I don’t think that splitting this out and pretending it is reusable is a good idea - too much of it is specific to decisions in the MLIR syntax".


Ah, ok. Then you can just remove the | FileCheck. The RUN line checks that the program has exit code 0, which won't be the case if mlir-tblgen runs into a syntax or processing error.

ftynse added inline comments.Apr 7 2020, 11:59 AM

Perfect, let's do this!

silvas accepted this revision.Apr 8 2020, 2:46 PM

LGTM. Let's do this!

nicolasvasilache marked 13 inline comments as done.Apr 9 2020, 12:15 PM
nicolasvasilache added inline comments.

Updated the test to get the minimal checkable thing: the class name.


Marking as done, this is part of the more global reply on downcasting, uptr etc.


Added a documentation section in


unique_ptr + abstract base class: basically because I use downcasting.

vector<Expression> will slice unless derived classes have sizeof == 0 (i.e. there is an underlying pointer payload).

An option is to implement a similar arena + pImpl to what MLIR does for the "by-value" abstractions.
I consider this to be unnecessarily complex for my use case (a parser that runs at compiler build time):

vector<unique_ptr<...>> is a standard and simple way to solve the slicing and ownership issues; its performance drawbacks are not relevant at this time IMO.


fair enough, done, thanks!

nicolasvasilache marked 4 inline comments as done.

Address review comments.

ftynse accepted this revision.Apr 10 2020, 7:30 AM

Please fix the Windows build problem before landing. It looks like the pre-merge testing has such a build now, so you can use it for the initial check.


Nit: angle bracket notation


This won't work on Windows. Consider adding a -test-emit-additional-includes flag to mlir-linalg-ods-gen and use it here instead of trying shell magic.


llvm_unreachable ?

nicolasvasilache marked 2 inline comments as done.

Address last review comments.

nicolasvasilache marked an inline comment as done.


This revision was not accepted when it landed; it landed in state Needs Review.Apr 10 2020, 11:05 AM
This revision was automatically updated to reflect the committed changes.