This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Linalg] Extend generic ops to allow tensors
Abandoned · Public

Authored by nicolasvasilache on Dec 30 2019, 4:13 PM.

Details

Summary

This diff adds support for linalg.generic and
linalg.indexed_generic to take tensor input and output
arguments.

The types of the output operands that are tensors must appear
verbatim in the result types after an arrow. The parser,
printer and verifier are extended to accommodate this
behavior.

The Linalg operations now support variadic ranked tensor
return values. This extension exposed issues with the
current handling of NativeCall in RewriterGen.cpp. As a
consequence, RewriterGen is tentatively special-cased to
support the new behavior (better suggestions are welcome).

Relevant cleanups are applied and names are made uniform.

Relevant invalid and roundtrip tests are added.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald Transcript · Dec 30 2019, 4:13 PM

Unit tests: pass. 61127 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

rriddle requested changes to this revision. Dec 30 2019, 4:41 PM
rriddle added inline comments.
mlir/include/mlir/Dialect/Linalg/IR/LinalgTraits.h
118

getInputs().getTypes()

Also, use Type|Value instead of auto. They are practically the same number of characters.

119

nit: Merge assignment and condition
if (auto t = ...)
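A minimal C++ sketch combining both suggestions; the helper name and the dyn_cast target are hypothetical, not the revision's exact code:

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/StandardTypes.h"
#include "mlir/IR/Value.h"

using namespace mlir;

// Spell out the element type (Type/Value) instead of auto, and merge the
// dyn_cast assignment into the if-condition.
static SmallVector<Type, 4> collectRankedTensorTypes(ArrayRef<Value> values) {
  SmallVector<Type, 4> res;
  for (Value v : values)
    if (auto tensorType = v.getType().dyn_cast<RankedTensorType>())
      res.push_back(tensorType);
  return res;
}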

128

Same comments as above.

mlir/include/mlir/Dialect/Linalg/Transforms/LinalgTransforms.h
82

Why SmallVector<..., 0>?

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
67

Can you just use the same mechanism as 2 lines above? i.e. just stream it.

102

You need to check this for failure

mlir/tools/mlir-tblgen/RewriterGen.cpp
586

It isn't immediately clear to me why this is necessary. What return value is convertible to SmallVector but not, for example, to ArrayRef?

This revision now requires changes to proceed. Dec 30 2019, 4:41 PM
ftynse added inline comments. Dec 31 2019, 3:39 AM
mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td
92–93

Nit: Query -> Return, otherwise it sounds like you query none

391

Reuse LinalgOperand.predicate here

541

Do you allow mixing tensor and memref arguments? If so, consider mentioning it explicitly.

603

Is this a placeholder? Consider using an op instead, e.g. "some.condition(...)"; a comment looks like it is related to the following code, but it is not.

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
291

Nit: use emitOpError if you want to refer to the op name, and drop the "op " part of the message
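For reference, a minimal sketch of the suggested API (the verifier check itself is hypothetical): emitOpError automatically prefixes the diagnostic with "'<op-name>' op ", so the message should not repeat it.

#include "mlir/IR/Operation.h"
#include "mlir/Support/LogicalResult.h"

using namespace mlir;

static LogicalResult verifyHasSingleRegion(Operation *op) {
  // Emits, e.g., "'linalg.generic' op expected a single region".
  if (op->getNumRegions() != 1)
    return op->emitOpError("expected a single region");
  return success();
}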

mlir/lib/Dialect/Linalg/Transforms/LinalgTransforms.cpp
187

should this rather be an assert(succeeded(...) && ...) ?
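A short sketch of the suggested pattern (the surrounding function is hypothetical): assert on the LogicalResult when a failure would indicate a programming error rather than a recoverable condition.

#include "mlir/Support/LogicalResult.h"
#include <cassert>

using namespace mlir;

void applyRewriteStep(LogicalResult result) {
  assert(succeeded(result) && "expected the rewrite step to succeed");
  (void)result; // Silence unused-variable warnings in release builds.
}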

266

return isa_and_nonnull<...
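The template argument is elided in the comment above; a hypothetical example of the suggested call, which folds the null check and the isa<> into one expression (replacing return op && isa<linalg::GenericOp>(op);):

#include "mlir/Dialect/Linalg/IR/LinalgOps.h"
#include "llvm/Support/Casting.h"

using namespace mlir;

static bool isLinalgGeneric(Operation *op) {
  return llvm::isa_and_nonnull<linalg::GenericOp>(op);
}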

mlir/lib/Dialect/VectorOps/VectorTransforms.cpp
485–486

Nit: would llvm::to_vector<1> work?

Locally #include "Intrinsics.h" to try to avoid

clang-tidy: error: 'mlir/Dialect/Linalg/EDSC/Intrinsics.h' file not found [clang-diagnostic-error]

Unit tests: pass. 61127 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

nicolasvasilache marked 11 inline comments as done.

Addressing review comments

mlir/include/mlir/Dialect/Linalg/Transforms/LinalgTransforms.h
82

This needs a type compatible with SmallVector<Value, x> to be used as a NativeCall for an op with Variadic returns.
Do you have a better suggestion?
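For concreteness, a sketch of the shape such a helper takes (the name matches the helper quoted in the generated code further down; the parameters and body are illustrative only):

#include "mlir/IR/Operation.h"
#include "mlir/IR/PatternMatch.h"
#include "llvm/ADT/SmallVector.h"

using namespace mlir;

// A NativeCall helper for an op with Variadic results must return a type
// the generated rewriter can iterate over and splice into its replacement
// values. SmallVector<Value, 0> fits; the 0 just means "no inline storage",
// and is present because SmallVector requires the size template argument.
SmallVector<Value, 0> permuteGenericLinalgOp(PatternRewriter &rewriter,
                                             Operation *op
                                             /* ..., permutation, marker */) {
  SmallVector<Value, 0> results;
  // ... create the permuted op and collect its result values (elided) ...
  return results;
}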

mlir/lib/Dialect/VectorOps/VectorTransforms.cpp
485–486

Nope, this is just constructing a 1-element vector from a single value.
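A sketch of the distinction (function name hypothetical):

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/Value.h"

using namespace mlir;

// llvm::to_vector copies an existing *range* into a SmallVector:
//   auto copy = llvm::to_vector<1>(existingRange);
// Here there is no range, only a single Value, so the one-element vector
// is built directly from it:
SmallVector<Value, 1> wrapInVector(Value v) {
  return SmallVector<Value, 1>{v};
}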

mlir/tools/mlir-tblgen/RewriterGen.cpp
586

Reworked and added a comment.

Unit tests: pass. 61127 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

nicolasvasilache marked 8 inline comments as done. Dec 31 2019, 6:19 AM
nicolasvasilache added inline comments.
mlir/include/mlir/Dialect/Linalg/Transforms/LinalgTransforms.h
82

tentatively marking "Done", will update if a better suggestion is given.

Addressing more review comments.

Undo local include of Intrinsics.h; this is a clang-tidy issue.

Unit tests: pass. 61127 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Update commit message.

Unit tests: pass. 61159 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

rriddle added inline comments. Jan 1 2020, 5:25 PM
mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
186–187

All of this cleanup seems unrelated to this revision. It would be easier to review if you moved such changes out.

mlir/tools/mlir-tblgen/RewriterGen.cpp
586

Could you use ValueRange instead?

Rebase on top of previously separated cleanups.

nicolasvasilache marked an inline comment as done. Jan 2 2020, 7:30 AM
nicolasvasilache marked 2 inline comments as done. Jan 2 2020, 7:35 AM
nicolasvasilache added inline comments.
mlir/tools/mlir-tblgen/RewriterGen.cpp
586

Not immediately:

/usr/local/google/home/ntv/work/github/llvm-project/mlir/include/mlir/IR/OperationSupport.h:623:3: note: candidate constructor
  ValueRange(iterator_range<OperandRange::iterator> values)
  ^
/usr/local/google/home/ntv/work/github/llvm-project/mlir/include/mlir/IR/OperationSupport.h:625:3: note: candidate constructor
  ValueRange(iterator_range<ResultRange::iterator> values)
  ^
/usr/local/google/home/ntv/work/github/llvm-project/mlir/include/mlir/IR/OperationSupport.h:627:3: note: candidate constructor
  ValueRange(ArrayRef<Value> values = llvm::None);
  ^
/usr/local/google/home/ntv/work/github/llvm-project/mlir/include/mlir/IR/OperationSupport.h:628:3: note: candidate constructor
  ValueRange(OperandRange values);
  ^
/usr/local/google/home/ntv/work/github/llvm-project/mlir/include/mlir/IR/OperationSupport.h:629:3: note: candidate constructor
  ValueRange(ResultRange values);
  ^
/usr/local/google/home/ntv/work/github/llvm-project/mlir/include/mlir/Support/STLExtras.h:223:3: note: candidate inherited constructor
  indexed_accessor_range_base(const iterator_range<iterator> &range)
  ^
/usr/local/google/home/ntv/work/github/llvm-project/mlir/include/mlir/IR/OperationSupport.h:613:21: note: constructor from base class 'indexed_accessor_range_base<mlir::ValueRange, llvm::PointerUnion<const mlir::Value *, mlir::OpOperand *, mlir::OpResult *>, mlir::Value, mlir::Value, mlir::Value>' inherited here
  using RangeBaseT::RangeBaseT;
nicolasvasilache marked an inline comment as done.

Fix warning on unused uncaptured attribute in RewriterGen.

nicolasvasilache marked an inline comment as done. Jan 2 2020, 7:44 AM
nicolasvasilache added inline comments.
mlir/tools/mlir-tblgen/RewriterGen.cpp
586

Forgot the first line, here:

tools/mlir/test/lib/Transforms/../DeclarativeTransforms/TestLinalgTransformPatterns.h.inc:579:19: error: call to constructor of 'mlir::ValueRange' is ambiguous
    for (auto v : ValueRange{ {permuteGenericLinalgOp(rewriter, op, {1, 2, 0}, "PERMUTE")} }) { tblgen_repl_values.push_back(v); }
                  ^         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
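A hedged reading of the failure, with the kind of workaround that sidesteps it (illustrative code, not the revision's exact fix):

#include "mlir/IR/OperationSupport.h"
#include "llvm/ADT/SmallVector.h"

using namespace mlir;

void illustrate(SmallVector<Value, 0> callResult) {
  // Ambiguous: inside the nested braced list, {callResult} can be matched
  // by several ValueRange constructors (ArrayRef<Value>, the
  // iterator_range overloads, ...), so overload resolution fails:
  //   for (auto v : ValueRange{ {callResult} }) ...
  // Unambiguous: passing the vector directly selects the
  // ValueRange(ArrayRef<Value>) constructor via implicit conversion.
  for (Value v : ValueRange(callResult))
    (void)v;
}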

Unit tests: pass. 61169 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Unit tests: pass. 61169 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Without commenting on the specifics of the change itself, I was wondering what advantage there is to allowing generic ops to take tensor values as well. Until now, lowering from Linalg to (say) the GPU dialect was fairly straightforward because Linalg operated on memref types. I am not sure what would happen if a generic op with tensors were lowered to the GPU dialect (presumably it should be an error), and lowering it correctly would need some buffer allocation to kick in. So it seems like this is adding complexity. Could you please provide some context behind the need for this change?

@mravishankar actually this will simplify things quite a lot.
Copy-pasting from the mlir@tensorflow.org discussion: https://groups.google.com/a/tensorflow.org/g/mlir/c/pbWk9a-t3Xc

This has a number of implications for some of the transformations that are traditionally done at the level of the HLO dialect.
In particular, everything related to trivial fusion of pointwise operators can be done immediately using the region.
This avoids the need for the current, more cumbersome and phase-ordered flow that does:
1. mark fusion with XLA fusion nodes,
2. allocate buffers for everything
3. convert to Linalg
4. apply fusion in Linalg
5. perform an analysis and remove temporary buffers that have been fused.

Note that step 4 may not necessarily do what one wanted at step 1, since we are talking about different systems that are not really designed to talk to each other.

Instead, this can be replaced by:
1. apply fusion of ops using regions

Temporary buffers never get materialized.
This becomes especially handy when implicit or explicit broadcast semantics are involved: some things are trivial to fuse at the level of Linalg on tensors, and the unnecessary intermediate memory is never allocated.

There are many other implications on the type of transforms that become available at this level (hint: look at the TASO compiler) but I only listed the most obvious one.

In my mind the codegen path where things are the most natural is:

User
-> Language / Framework 
-> HLO + Linalg on tensors 
-> LHLO + Linalg on buffers 
(note that buffer allocation in Linalg on tensors -> Linalg on buffers can be very progressive, intermixing ops with both tensor and buffer arguments arbitrarily)
-> Affine/StructuredControlFlow (still named Loops at the moment)
-> backends

Different transformations apply at each level.

This does of course not remove the need for the buffer allocation pass, but that is just a trivial extension of what already exists.
The particular case you mention, Linalg + tensors -> GPU, is illegal; the IR must be legalized with buffer allocation first.

rriddle accepted this revision. Jan 2 2020, 10:14 AM
rriddle added inline comments.
mlir/include/mlir/Dialect/Linalg/IR/LinalgTraits.h
118

Can we use Type instead of auto here? Same below.

mlir/tools/mlir-tblgen/RewriterGen.cpp
586

This is extremely unfortunate and should really get some attention if we have to do this. Can you file a bug so that this is cleaned up? We shouldn't be materializing vectors like this if we don't need to.

nicolasvasilache marked 2 inline comments as done. Jan 2 2020, 10:56 AM

Addressed, thanks!

@nicolasvasilache: Totally agree with all the points you make. I understand the advantage of performing fusion at the tensor level. I was under the impression that there would be a separate dialect for "Linalg operations on tensors", and that the buffer allocation pass would "convert" from that dialect into the existing Linalg dialect. But I guess inter-mixing operations from different dialects is equivalent to having different "incarnations" of an operation (one operating on tensors and the other on buffers). I see that this makes sense for generic ops, but how would this work for other operations like linalg.matmul and linalg.matvec, etc., where I am not sure you can have a tensor argument? So it seems like this is making generic ops different from other Linalg ops.
Another question is about buffer allocation. Is there some buffer allocation already implemented in Linalg?
In any case, since the contract between Linalg and other "backend" dialects is that the lowering is only supported if Linalg ops work on memrefs, I don't really have any major concerns and am happy to see how things work out.

yangjunpro added inline comments. Jan 2 2020, 1:51 PM
mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td
550

"that create"==>"which create", small typos

mlir/include/mlir/Dialect/Linalg/IR/LinalgTraits.h
113

"Value v", might change to pass by reference for better performance? Although it might be a little bit marginal here.

127

There is a bit of duplicated code between getInputTensorTypes() and getOutputTensorTypes(); should we extract a small common function for reusability?

mlir/include/mlir/Dialect/Linalg/Transforms/LinalgTransforms.h
83

Can't we directly write the return type as SmallVector<Value>? The "0" looks peculiar here.

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
302

If we already have the index, why do we bother to use zip()?
I think directly using op.getResultTypes()[index] and outputTensorTypes[index] would be enough.

nicolasvasilache marked 8 inline comments as done. Jan 2 2020, 2:34 PM
mlir/include/mlir/Dialect/Linalg/IR/LinalgTraits.h
113

Value is a POD; taking a reference would actually add a level of indirection and could be detrimental to performance.
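A small sketch of the point (helper name hypothetical):

#include "mlir/IR/StandardTypes.h"
#include "mlir/IR/Value.h"

using namespace mlir;

// Value is a thin value-semantic wrapper around a pointer-sized payload,
// so passing it by value is as cheap as passing a pointer. A const
// reference would instead pass a pointer to the wrapper, adding a hop.
static bool hasTensorType(Value v) {          // cheap: fits in a register
  return v.getType().isa<TensorType>();
}
// static bool hasTensorType(const Value &v); // discouraged: extra indirection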

mlir/include/mlir/Dialect/Linalg/Transforms/LinalgTransforms.h
83

SmallVector really wants 2 template args.

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
302

getResultTypes is an iterator range and does not support random access.
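Hence the zip-based form; a sketch under that constraint (not the revision's exact code):

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"
#include "mlir/IR/Operation.h"
#include <tuple>

using namespace mlir;

// getResultTypes() can only be walked with iterators, so pairing it with
// the output tensor types goes through llvm::zip instead of indexing.
static bool resultTypesMatch(Operation *op, ArrayRef<Type> outputTensorTypes) {
  for (auto it : llvm::zip(op->getResultTypes(), outputTensorTypes))
    if (std::get<0>(it) != std::get<1>(it))
      return false;
  return true;
}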

nicolasvasilache marked 3 inline comments as done. Jan 2 2020, 2:40 PM

@mravishankar

I was under the impression that there would be a separate dialect for "Linalg operations on tensors", and the buffer allocation pass would "convert" from that dialect into the existing Linalg dialect. But, I guess inter-mixing of operations from different dialects to having different "incarnations" of the operation (one operating on tensors and other on buffers) is equivalent.

Yes, this was the original intention, but I realized that this didn't warrant yet another dialect, and lowering can be made much more progressive by mixing types.

I see that this makes sense for generic ops, but how would this work for other operations like linalg.matmul and linalg.matvec, etc. where I am not sure you can have a tensor argument. So it seems like this is making generic ops different than other linalg ops.

Actually, all these named ops should be auto-generated as special configurations of generic ops. See my comments on this thread: https://groups.google.com/a/tensorflow.org/g/mlir/c/CUL3_8fnJb4
The fact that there are manually declared matmul, conv, etc. ops today is mostly historical; they should be retired in the future.

Another question is about buffer allocation. Is there some buffer allocation already implemented in Linalg?

Not yet, but it is very easy to do something simple that can evolve over time.

Thanks for your comments and questions!

nicolasvasilache abandoned this revision. Jan 3 2020, 3:41 PM

For some reason this does not close automatically despite the landing and the commit message listing this diff.
Abandoning manually.