This is an archive of the discontinued LLVM Phabricator instance.

[mlir] bubble up tensor.extract_slice above linalg op
AbandonedPublic

Authored by okkwon on Mar 10 2022, 2:49 PM.

Details

Summary

Bubble up extract_slice above Linalg operation.

A sequence of operations

%0 = linalg.<op> ... arg0, arg1, ...
%1 = tensor.extract_slice %0 ...

can be replaced with

%0 = tensor.extract_slice %arg0
%1 = tensor.extract_slice %arg1
%2 = linalg.<op> ... %0, %1, ...

This results in the reduce computation of the linalg operation.

Diff Detail

Event Timeline

okkwon created this revision.Mar 10 2022, 2:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2022, 2:49 PM
okkwon edited the summary of this revision. (Show Details)Mar 10 2022, 2:51 PM
okkwon edited the summary of this revision. (Show Details)
mravishankar added inline comments.Mar 10 2022, 3:03 PM
mlir/lib/Dialect/Linalg/Transforms/BubbleUpExtractSlice.cpp
29

Instead of doing this, I'd just convert OpFoldResult to Value by insert arith.constant <val> : index operations for cases where it is holding an integer attribute and then use applyMapToValues. This will canonicalize back anyway.

82

Actually this might be too conservative. The only condition that is needed I think is that the output indexing map must be a "permutation". The use of applyMapToValues should handle all other valid indexing maps.

97

This is not enough. The output might not be accessed as an identity, but a permutation. So you need to "invert" the indexing map used for the output, compose it with the indexing map for the inputs and them apply the map.

You can use [inversePermutation](https://github.com/llvm/llvm-project/blob/14e4d2e5643e505039588d50b31e29e3042009f5/mlir/include/mlir/IR/AffineMap.h#L430) for inverting the output map, and compose it with the indexing map for the operand as follows

AffineMap outputIndexMap = ...
AffineMap inputIndexingMap = ...
AffineMap inverseOutputIndexingMap = inversePermutation(outputIndexingMap);
AffineMap composedMap = inputIndexingMap.compose(inverseOutputIndexingMap);
.... applyMapToValues(composedMap, sliceOffsets)
.... applyMapToValues(composedMap, sliceSizes)
99

Strides are tricky... I'd rather just check that strides for the insert slice are all 1s.

mlir/test/Dialect/Linalg/bubble-up-extract-slice-op.mlir
92

Would be interesting to try some other LinalgOps as well. Like linalg.matmul linalg.batch_matmul linalg.conv_2d_nhwc_hwcf . See more named ops list here : https://github.com/llvm/llvm-project/blob/main/mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py. The python file is the source of truth for these operations. They are used to generate the ODS definition for these ops automatically (there is also an intermediate YAML serialization step thrown in for good measure).

okkwon updated this revision to Diff 414536.Mar 10 2022, 5:55 PM

Address Mahesh's comments

okkwon marked 4 inline comments as done.Mar 10 2022, 5:55 PM
okkwon updated this revision to Diff 415541.Mar 15 2022, 12:32 PM

Minor updates

okkwon updated this revision to Diff 415542.Mar 15 2022, 12:33 PM

Update the commit message

okkwon added inline comments.Mar 15 2022, 12:36 PM
mlir/test/Dialect/Linalg/bubble-up-extract-slice-op.mlir
92

Matmul and other non-trivial operations will be covered later since they show more complex index mapping than the simple permutation. I left a TODO comment for that.

okkwon updated this revision to Diff 415552.Mar 15 2022, 1:12 PM

Update the commit message

okkwon published this revision for review.Mar 15 2022, 1:14 PM
okkwon added a reviewer: mravishankar.
okkwon marked an inline comment as done.Mar 15 2022, 3:15 PM
okkwon updated this revision to Diff 415988.Mar 16 2022, 2:28 PM

check projected permutation for inputs

This seems to be needed because ApplyMapToValues() only handles AffineDimExpr
and AffineSymExpr; More complicated affine expresssions are not handled. To
simplify the eligible cases and the transformation, let's handle the inputs
only when their indexing map is a projected permutation.

okkwon updated this revision to Diff 416362.Mar 17 2022, 5:16 PM

merge and rebase

mravishankar requested changes to this revision.Mar 18 2022, 9:30 AM
mravishankar added inline comments.
mlir/include/mlir/Dialect/Linalg/Passes.h
81 ↗(On Diff #415552)

I'd drop the pass from here. These passses have been hard to maintain. I've found passes here (especially in Linalg) to be not very useful. It is best to expose the patterns using populate*Pattern method. I'd do the same here.

mlir/include/mlir/Dialect/Linalg/Passes.td
383 ↗(On Diff #415552)

Same here, drop the pass. You anyway have the test pass to run the pattern.

mlir/lib/Dialect/Linalg/Transforms/BubbleUpExtractSlice.cpp
60

Probably also need to check for the linalg op to have a single result. We can generalize it later on.

74

Nit : replace with

if (llvm::any_of(linalgOp.getIndexingMaps(), [](AffineMap map) { return !map.isPorjectedPermutation(); })) {
  return false;
}
124

This can run into type mismatch issues if the tensor.insert_slice return type and the newOp return type dont match. Its always just easier to check the types match and insert a tensor.cast if the types dont match, with the expectation that the cast folds away during canonicalizations.

This revision now requires changes to proceed.Mar 18 2022, 9:30 AM
okkwon updated this revision to Diff 416543.Mar 18 2022, 10:04 AM
okkwon marked 3 inline comments as done.

Address Mahesh's comments

okkwon marked an inline comment as done.Mar 18 2022, 10:05 AM
okkwon added inline comments.
mlir/lib/Dialect/Linalg/Transforms/BubbleUpExtractSlice.cpp
124

Would you mind giving me an example? I am using the same offsets, sizes, and slides, so the types are supposed to be the same.

okkwon abandoned this revision.Mar 21 2022, 2:44 PM

The plan is changed because

  1. I made a big code change to support matmul cases, and
  2. Mahesh pointed me to the tiling implementation, which works for any Linalg operations. I need to take a look at the code and understand how it works for all without any restrictions on the linalg operation, and
  3. Apply the same or similar approach to my implementation.

Thanks Mahesh for the inputs.