This is an archive of the discontinued LLVM Phabricator instance.

[mlir] [VectorOps] Lowering of vector.extract/insert_slices to LLVM IR
ClosedPublic

Authored by aartbik on Jan 15 2020, 1:55 PM.

Details

Summary

Uses progressive lowering to convert vector.extract_slices and vector_insert_slices to equivalent vector operations that can be subsequently lowered into LLVM.

Diff Detail

Event Timeline

aartbik created this revision.Jan 15 2020, 1:55 PM
aartbik edited the summary of this revision. (Show Details)Jan 15 2020, 1:56 PM
aartbik added reviewers: andydavis1, rriddle.

Unit tests: pass. 61804 tests passed, 0 failed and 781 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Please provide much more explanation in the commit message because the progressive lowering behavior may be unexpected for most people.

Nothing profound in my comments, just some things I think we should reorganize to keep our house in order.
This looks great, thanks Aart for pushing this forward!

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
899

This looks more like a rewrite / folding pattern than a conversion to me (i.e. the types don't change).
I would prob go for a folding pattern in VectorOps.cpp but we also need to make sure it is accessible and
In any case please document with a before / after IR example.

900

Well on second inspection you already have that in the VectorToVectorOps conversion.
So please don't duplicate here and just make the vectorToVector canonicalization patterns "insert" the "foldingPatterns".

Structurally this is also an indication that VectorToLLVM should start including all the VectorToVector rewrites so that things compose properly.

910

We should assert(operands[0].getDefiningOp is non-null.
In the current design, tuples do not pass function boundaries and do not lower to LLVMIR so we want to fail hard so use cases pop up and we can reevaluate the decision with concrete evidence.

Please also make sure the message is informative enough to surface the context I just explained.

At that point, the whole things should look like:

assert(...);
rewriter.replaceOp(cast<...>);
return matchSuccess();

i.e. no need to dyn_cast + check, this is guaranteed by the pattern rewrite infra.

919

Doc please, with before / after IR.

929

Yes that is hacky indeed :)

You could just return failure if the uses are not empty:

if (!op->getResult()->use_empty())
  return matchFailure();

The pattern will fail to apply until all canonicalizations occured at which point it will apply.
This will also error out gracefully by using the pattern rewrite infrastructure and fail a "full conversion" if things don't canonicalize.

952

Very cool, yay intuition :) !

Same comments as above apply still: bit more doc with before/after IR.
I may not have set a good example myself here in the past, sorry about that!
As the number of patterns grow we want short and intuitive examples to just get what these do without going deep in code.

Also, this should go through VectorToVector patterns and be included in the VectorToLLVM patterns (e.g. what if the target is not LLVM, some of our friends at Xilinx seem to be in this case for example).

976

Strides is probably become overloaded, maybe we should rename strided_slice and insert_strided_slice as Insert/ExtractSlice and these ops as Insert/ExtractSliceTuple or something along those lines?
In particular the stride attribute is not really a stride but a step.

In any case, we should have help functions for computing this.
Could you please reuse/refactor/adapt the first functions in VectorTransforms.cpp and move the to VectorOps/Utils.h as appropriate?
Thanks!

987

We should have help functions for computing this.
Could you please reuse/refactor/adapt the first functions in VectorTransforms.cpp and move the to VectorOps/Utils.h as appropriate?
Thanks!

1007

Great, hooray for progressive lowering and pattern/canonicalization compositions !

mlir/lib/Dialect/VectorOps/VectorOps.cpp
1701 ↗(On Diff #238363)

This can be called directly from other getXXXPatterns that want to use it, no need to duplicate the pattern.

mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
548

This should be made into a VectorToVector test, that more directly tests "extract_slices" -> "strided_slices" + the removal of the tuple op.
Atm it tests too much at a distance and involves multiple patterns coming together.
Testing those patterns coming together could be done in a separate test pass where canonicalization, DCE and pattern interaction would kick in too and many things would get simplified.

aartbik updated this revision to Diff 238653.Jan 16 2020, 3:55 PM
aartbik marked 5 inline comments as done.

addressed comments

aartbik marked 13 inline comments as done.Jan 16 2020, 4:04 PM
aartbik added inline comments.
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
899

Agreed in isolation, but since we introduce vector.tuple in other rewritings, I really need a ConversionPattern here, so that I see the newly introduced operands.

900

I avoided the duplication for now by just having the lowering change. Once we have a proper vector2vector rewriting, we can move it into folding again and populate.

929

Two complications I added to the comments. For newly introduced tuples, neither the op nor the op.getResult use_empty call is up-to-date (it is always empty). However, since this is a lowering pass, such new nodes need to be legalized anyway, so removing them is the right way. We still could assert if our rules somehow introduce a tuple we don't consume.

952

It sounds like we really need to move some of this into a vector to vector pass. That will also avoid some of the complications above.

976

I see similar TODOs elsewhere, so let's postpone that into one CL later.

mlir/lib/Dialect/VectorOps/VectorOps.cpp
1701 ↗(On Diff #238363)

For now I do everything in lowering, leaving proper interaction with folding/canonicalization for later.

mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
548

I agree. Once we have a proper vector to vector pass, we should test this (and others) in intermediate stages too.

rriddle added inline comments.Jan 16 2020, 4:06 PM
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
899

// -> ///

924

getOperand(tupleGetOp.getIndex())

931

Same here.

940

Note, we don't run DCE during dialect conversion.

1016

replaceOpWithNewOp?

Unit tests: pass. 61909 tests passed, 0 failed and 782 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

aartbik updated this revision to Diff 238656.Jan 16 2020, 4:21 PM
aartbik marked 10 inline comments as done.

addressed comments

aartbik added inline comments.Jan 16 2020, 4:46 PM
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
940

I was under the impression we may introduce that in the future. Rephrased this a bit.

nicolasvasilache requested changes to this revision.Jan 20 2020, 7:39 PM
nicolasvasilache added inline comments.
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
976

I think with the changes and proto we discussed offline, this will change significantly before landing?
I can't remember whether you'll still need to use existing utils.

Marking as "request changes" until we see the new E2E impl.

This revision now requires changes to proceed.Jan 20 2020, 7:39 PM
aartbik updated this revision to Diff 239427.Jan 21 2020, 1:51 PM

Lowering of vector.extract/insert_slices to LLVM IR

aartbik retitled this revision from [mlir] [VectorOps] Lowering of vector.extract_slices to LLVM IR to [mlir] [VectorOps] Lowering of vector.extract/insert_slices to LLVM IR.Jan 21 2020, 1:52 PM
aartbik edited the summary of this revision. (Show Details)

PTAL

This new approach directly uses the canonicalization for vector.tuple_get on a tuple (so no need to dup the rule!). Furthermore, by using the vector to vector rewriting prior to the lowering, the introduction of a vector tuple is completely transparent, since subsequent rewriting removes them again. An additional advantage is that we get DCE for free, so need to introduce a DCE for just vector.tuple. The introduction of DCE forced me to fix a few test cases with dead code (good to do that anyway).

Note, since this code is much more structured, I also introduced vector.insert_slices in addition to vector_extract_slices, so that the final mechanism becomes more apparent.

Unit tests: pass. 62017 tests passed, 0 failed and 783 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

aartbik updated this revision to Diff 239447.Jan 21 2020, 3:48 PM

fixed a few minor typos

Unit tests: pass. 62017 tests passed, 0 failed and 783 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

rriddle requested changes to this revision.Jan 21 2020, 4:33 PM
rriddle added inline comments.
mlir/lib/Dialect/VectorOps/VectorOps.cpp
1699 ↗(On Diff #239447)

This should just be in the 'fold' method instead of a canonicalization pattern.

This revision now requires changes to proceed.Jan 21 2020, 4:33 PM
rriddle added inline comments.Jan 21 2020, 4:35 PM
mlir/lib/Dialect/VectorOps/VectorOps.cpp
1699 ↗(On Diff #239447)

'fold' should be used whenever possible because it is applicable in many more places, e.g. during dialect conversion and OpBuilder::createOrFold.

rriddle added inline comments.Jan 21 2020, 4:36 PM
mlir/lib/Dialect/VectorOps/VectorOps.cpp
1699 ↗(On Diff #239447)

Also, please split this out into a different revision as this is unrelated.

aartbik marked an inline comment as done.Jan 21 2020, 5:14 PM

sounds good, one question though

mlir/lib/Dialect/VectorOps/VectorOps.cpp
1699 ↗(On Diff #239447)

I followed the other "canonicalization" patterns in VectorOps. Just for my own understanding, should these strictly speaking have been written as folders too (the name "Folder" seems to imply that)?

aartbik marked an inline comment as done.Jan 21 2020, 9:14 PM

sounds good, one question though

mlir/lib/Dialect/VectorOps/VectorOps.cpp
1699 ↗(On Diff #239447)

Other question, isn't fold for constants only? Here the tuple itself does not need to be constant, the only thing that matters is that get-on-tuple becomes a pass through of the value at the corresponding index?

andydavis1 added inline comments.Jan 22 2020, 11:36 AM
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
1072

We need a TODO to move this code into a VectorOpsUtils class, we have various versions of it floating around in VectorOps.cp and VectorTransforms.cpp

rriddle added inline comments.Jan 22 2020, 11:41 AM
mlir/lib/Dialect/VectorOps/VectorOps.cpp
1699 ↗(On Diff #239447)

fold has a slightly different contract than canonicalization patterns, and can be used (generally) for the following:

  • In-place canonicalization
  • Folding to an existing SSA value(does not have to be constant)
  • Folding to an attribute value

So with that being said, if the canonicalization relies on creating new operations then it must use a pattern.

aartbik marked 2 inline comments as done.Jan 22 2020, 11:50 AM

Hold off on reviewing this CL, since it will be broken up in three.
The final remaining part for the lowering will be super small :=)

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
1072

The TODO is already there, see line 1038

mehdi_amini added inline comments.Jan 22 2020, 11:57 AM
mlir/lib/Dialect/VectorOps/VectorOps.cpp
1699 ↗(On Diff #239447)

River: do we have a doc on this? Seems like potentially a good entry for the FAQ otherwise?

aartbik marked an inline comment as done.Jan 22 2020, 1:57 PM
aartbik added inline comments.
mlir/lib/Dialect/VectorOps/VectorOps.cpp
1699 ↗(On Diff #239447)

In particular, it was not immediately apparent to me that we *always* call fold(), passing in constants as arguments when available, but also allowing access to the non-constant operands through the regular getters (giving it potentially "two sources of truth"). Now that I have working, it makes sense, but a bit more API doc would have been helpful.

aartbik updated this revision to Diff 240278.Jan 24 2020, 12:58 PM

integrate new lowering mechanism in LLVM lowering

PTAL (note, this will not build yet, but it shows how small this CL has become by defining the progressive lowering in a separate patterns)

Unit tests: unknown.

clang-tidy: fail. clang-tidy found 2 errors and 0 warnings. 0 of them are added as review comments below (why?).

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

aartbik updated this revision to Diff 240325.Jan 24 2020, 4:29 PM

synced with latest update

\o/ yeah, it has been a long road, but PTAL

this CL is much, much smaller now; extra test added, and rewritten some existing tests to ensure code is not dead on entry

Unit tests: pass. 62194 tests passed, 0 failed and 815 were skipped.

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

rriddle accepted this revision.Jan 27 2020, 10:35 AM
This revision is now accepted and ready to land.Jan 27 2020, 10:35 AM
This revision was automatically updated to reflect the committed changes.