This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Tensor] Add rewrites to extract slices through `tensor.collapse_shape`
ClosedPublic

Authored by christopherbate on Jul 13 2022, 3:03 PM.
Tokens
"Like" token, awarded by chongxing.

Details

Summary

This change adds a set of utilities to replace the result of a
tensor.collapse_shape -> tensor.extract_slice chain with the
equivalent result formed by aggregating slices of the
tensor.collapse_shape source. In general, it is not possible to
commute extract_slice and collapse_shape if linearized dimensions
are sliced. The i-th dimension of the tensor.collapse_shape
result is a "linearized sliced dimension" if:

  1. The reassociation group for the i-th result dimension has more than one index (multiple dimensions of the input are collapsed into it), and
  2. the i-th dimension is sliced by tensor.extract_slice.
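The two conditions above can be modeled with a short Python sketch (the function name and signature are illustrative only, not part of the patch's C++ API):

```python
# Illustrative model of the two conditions above; not the patch's C++ API.
def linearized_sliced_dims(reassociation, src_shape, offsets, sizes, strides):
    """Return result dims of collapse_shape that are linearized *and* sliced
    by a following extract_slice (offsets/sizes/strides of the slice)."""
    dims = []
    for i, group in enumerate(reassociation):
        collapsed_size = 1
        for d in group:
            collapsed_size *= src_shape[d]
        # Condition 2: the slice does not take the whole dimension.
        sliced = not (offsets[i] == 0 and sizes[i] == collapsed_size
                      and strides[i] == 1)
        # Condition 1: more than one source dim collapses into dim i.
        if len(group) > 1 and sliced:
            dims.append(i)
    return dims

# Running example below: collapse [[0, 1, 2], [3]] of 3x7x11x10, then slice
# [13, 0] [10, 10] [2, 1]. Only result dim 0 is linearized-and-sliced.
print(linearized_sliced_dims([[0, 1, 2], [3]], (3, 7, 11, 10),
                             [13, 0], [10, 10], [2, 1]))  # [0]
```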

We can work around this by stitching together the result of
tensor.extract_slice by iterating over any linearized sliced dimensions.
This is equivalent to "tiling" the linearized-and-sliced dimensions of
the tensor.collapse_shape operation in order to manifest the result
tile (the result of the tensor.extract_slice). The user of the
utilities must provide the mechanism to create the tiling (e.g. a loop).
In the tests, it is demonstrated how to apply the utilities using either
scf.for or scf.foreach_thread.

The below example illustrates the pattern using scf.for:

%0 = linalg.generic ... -> tensor<3x7x11x10xf32>
%1 = tensor.collapse_shape %0 [[0, 1, 2], [3]] : ... to tensor<231x10xf32>
%2 = tensor.extract_slice %1 [13, 0] [10, 10] [2, 1] : ... to tensor<10x10xf32>

We can construct %2 by generating the following IR:

%dest = linalg.init_tensor [10, 10] : tensor<10x10xf32>
%2 = scf.for %iv = %c0 to %c10 step %c1 iter_args(%arg0) -> tensor<10x10xf32> {
   // Step 1: Map this output idx (%iv) to a multi-index for the input (%3):
   %linear_index = affine.apply affine_map<(d0) -> (d0 * 2 + 13)>(%iv)
   %3:3 = arith.delinearize_index %linear_index into (3, 7, 11)
   // Step 2: Extract the slice from the input
   %4 = tensor.extract_slice %0 [%3#0, %3#1, %3#2, 0] [1, 1, 1, 10] [1, 1, 1, 1] :
         tensor<3x7x11x10xf32> to tensor<1x1x1x10xf32>
   %5 = tensor.collapse_shape %4 [[0, 1, 2], [3]] :
         tensor<1x1x1x10xf32> into tensor<1x10xf32>
   // Step 3: Insert the slice into the destination
   %6 = tensor.insert_slice %5 into %arg0 [%iv, 0] [1, 10] [1, 1] :
         tensor<1x10xf32> into tensor<10x10xf32>
   scf.yield %6 : tensor<10x10xf32>
}
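To see that the loop reconstructs the correct tile, the IR above can be emulated in plain Python (the `delinearize` helper models arith.delinearize_index; everything here is an illustrative model, not MLIR code):

```python
# Plain-Python emulation of the IR above; `delinearize` models
# arith.delinearize_index. Illustrative only, not MLIR code.
def delinearize(idx, basis):
    """Split a linear index into a multi-index for shape `basis`."""
    out = []
    for b in reversed(basis):
        out.append(idx % b)
        idx //= b
    return tuple(reversed(out))

# Source tensor %0 of shape 3x7x11x10, filled with distinct values.
src = [[[[((i * 7 + j) * 11 + k) * 10 + l for l in range(10)]
         for k in range(11)] for j in range(7)] for i in range(3)]

# %1 = tensor.collapse_shape [[0, 1, 2], [3]] : 3x7x11x10 -> 231x10
collapsed = [src[i][j][k] for i in range(3) for j in range(7)
             for k in range(11)]

# %2 computed directly on %1: offsets [13, 0], sizes [10, 10], strides [2, 1]
expected = [collapsed[13 + iv * 2] for iv in range(10)]

# %2 computed as in the scf.for loop: map each output row back into %0.
dest = []
for iv in range(10):
    linear_index = iv * 2 + 13                        # affine.apply
    i, j, k = delinearize(linear_index, (3, 7, 11))   # delinearize_index
    dest.append(src[i][j][k])                         # extract + collapse row

assert dest == expected
```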

The pattern was discussed in the RFC here: https://discourse.llvm.org/t/rfc-tensor-extracting-slices-from-tensor-collapse-shape/64034

Diff Detail

Event Timeline

christopherbate requested review of this revision.Jul 13 2022, 3:03 PM
christopherbate edited the summary of this revision. (Show Details)Jul 13 2022, 3:07 PM
mravishankar requested changes to this revision.Jul 13 2022, 4:51 PM

High level comment, reshape-like operations should not really be part of tiling interface... They are really metadata operations and not compute operations. So I think a bit more context on the use case of this would be useful. Adding a tiling interface implementation for what should be just metadata operations leads to some unintended consequences (from experience having done exactly this for tensor.extract_slice and tensor.insert_slice during incubation in IREE).

This revision now requires changes to proceed.Jul 13 2022, 4:51 PM

High level comment, reshape-like operations should not really be part of tiling interface... They are really metadata operations and not compute operations. So I think a bit more context on the use case of this would be useful.

Probably the biggest reason to allow this sort of transformation as part of the normal tile-and-fuse flow is that it allows you to reduce multiple dimensions as a single dimension without blocking pulling producers into the loop nest.

I wrote this diff in two ways -- one just adding it to the tiling interface, the other adding a standalone rewrite that does the generateResultTileValue work plus a test pass, without hooking into TilingInterface. The first way ends up being less code and doesn't require an extra test pass to demonstrate the end-to-end result.

I'll push up the other method if you think that's better.

Probably the biggest reason to allow this sort of transformation as part of the normal tile-and-fuse flow is that it allows you to reduce multiple dimensions as a single dimension without blocking pulling producers into the loop nest.

Totally agreed on this point. There was, coincidentally, a similar discussion about this on the IREE Discord. Essentially you want to allow tile and fuse to work through reshapes. Currently, the way tile + fuse is implemented, you use the tensor.extract_slice to find the tile of the producer that you need to compute in place. But that requires the producer to be an operation that implements the TilingInterface. If your tensor.extract_slice comes from the result of a reshape, then the fusion stops. This is probably what you are referring to as well. Instead, you can fold the reshape and tensor.extract_slice to get a tensor.extract_slice of the source of the reshape. Then the fusion proceeds as normal. I think this folding pattern is the missing component. If we have that, we can then figure out how to add it to the mix and get tile and fuse to work through reshapes.
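As a sketch of the folding being described (the helper name and exact legality check here are assumptions for illustration, not the actual MLIR pattern): when every sliced result dimension maps to a reassociation group of size 1, the slice parameters carry over to the source directly; a linearized group can only pass through if it is not sliced at all.

```python
# Hypothetical helper sketching the fold described above; the name and the
# legality check are assumptions for illustration, not the actual pattern.
def fold_slice_through_collapse(reassociation, src_shape,
                                offsets, sizes, strides):
    """Rewrite extract_slice(collapse_shape(x)) parameters into parameters
    of an extract_slice directly on x. Only valid when no linearized
    (multi-index) group is sliced."""
    new_offsets, new_sizes, new_strides = [], [], []
    for group, off, sz, st in zip(reassociation, offsets, sizes, strides):
        if len(group) == 1:
            # A non-linearized dim: the slice commutes as-is.
            new_offsets.append(off)
            new_sizes.append(sz)
            new_strides.append(st)
        else:
            total = 1
            for d in group:
                total *= src_shape[d]
            # A linearized dim must be taken whole for the fold to be legal.
            assert off == 0 and sz == total and st == 1, \
                "linearized dimension is sliced; cannot commute"
            for d in group:
                new_offsets.append(0)
                new_sizes.append(src_shape[d])
                new_strides.append(1)
    return new_offsets, new_sizes, new_strides
```

For example, with reassociation [[0], [1, 2]] on a 4x5x6 source and a slice [1, 0] [2, 30] [1, 1] of the 4x30 result, the fold yields offsets [1, 0, 0], sizes [2, 5, 6], strides [1, 1, 1] on the source.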

Use a dedicated rewrite pattern to replace the tensor.extract_slice rather than using an implementation of TilingInterface for tensor.collapse_shape.

Totally agreed on this point. There was, coincidentally, a similar discussion about this on the IREE Discord. Essentially you want to allow tile and fuse to work through reshapes. Currently, the way tile + fuse is implemented, you use the tensor.extract_slice to find the tile of the producer that you need to compute in place. But that requires the producer to be an operation that implements the TilingInterface. If your tensor.extract_slice comes from the result of a reshape, then the fusion stops. This is probably what you are referring to as well. Instead, you can fold the reshape and tensor.extract_slice to get a tensor.extract_slice of the source of the reshape. Then the fusion proceeds as normal. I think this folding pattern is the missing component.

I briefly checked out your discussion, and it appears that we are after the same thing.

If we have that we can then figure out how to add it to the mix and get tile and fuse to work through reshapes....

I updated the diff so that the pattern is implemented as a dedicated rewrite with a public method to directly produce the equivalent result of a tensor.collapse_shape -> tensor.extract_slice chain.

Maybe we need a SliceableInterface in addition to TilingInterface. Then it would be easier to hook into the tile and fuse flow if the idea is that only computation operations implement TilingInterface.

We would also need a control mechanism for tile and fuse to allow the caller to stop the greedy fusion early if desired. Were you already planning on adding a mechanism for that?

I updated the diff so that the pattern is implemented as a dedicated rewrite with a public method to directly produce the equivalent result of a tensor.collapse_shape -> tensor.extract_slice chain. Maybe we need a SliceableInterface in addition to TilingInterface. Then it would be easier to hook into the tile and fuse flow if the idea is that only computation operations implement TilingInterface. We would also need a control mechanism for tile and fuse to allow the caller to stop the greedy fusion early if desired. Were you already planning on adding a mechanism for that?

Thanks! I'll take a look. There is a lot to unpack here and I need to understand how this works a bit more. Maybe @nicolasvasilache might already have some ideas here.
With respect to SliceableInterface, it's along the lines of what Nicolas had suggested a while back too, but more in terms of a "SubSetInterface" where you can define how to extract/insert a subset of data (not necessarily rectangular or contiguous). Not sure if anything has already been done on that.
For control, I think the greedy pattern is more for "demonstration". Any control mechanism will probably use the guts of things and be wrapped within the transform dialect. It's really hard, IMO, to provide a universal control mechanism, but one thing that can work is having a callback that allows callers to decide whether a slice has to be used for fusion or not. I didn't add that because I wasn't sure how it would interact with the transform dialect approach, but I have used such callbacks in a different context, and they did the job.

mlir/lib/Dialect/Tensor/Transforms/ExtractSliceFromReshape.cpp
27

Add support for using scf.foreach_thread instead of an scf.for loop nest.
Refactor the implementation to deal with added complexity of multiple loop types.

christopherbate retitled this revision from [mlir][Tensor] Add a TilingInterface implementation for `tensor.collapse_shape`. to [mlir][Tensor] Add rewrites to extract slices through `tensor.collapse_shape`.Jul 14 2022, 9:51 PM
christopherbate marked an inline comment as done.
christopherbate edited the summary of this revision. (Show Details)Jul 18 2022, 12:25 PM
nicolasvasilache accepted this revision.Jul 19 2022, 7:42 AM

Very cool stuff, thanks @christopherbate this is something that has been missing for a long time!

mlir/lib/Dialect/Tensor/Transforms/ExtractSliceFromReshape.cpp
32

Can you move the helpers to include/mlir/Dialect/Utils/ReshapeOpsUtils.h for better visibility?

69

Probably worth it for this one to inspect the tensor dim and immediately produce IndexAttr(size).
Unfortunately, createOrFold always returns a Value atm.

140

nit: Plz reflow comment

141

nit: space after for

142

I think we can move to SmallVector<OpFoldResult> lbs, ubs, steps; thanks to recent additions that allow creating more folded expressions than before.

151

You now have makeComposedFoldedAffineApply that takes OpFoldResult and does the right thing under the hood.

196

Why the difference with the above?
I personally prefer the mixed s0 + d0 * s1 form everywhere, if possible.

226

I would also move this to the reshape utils for visibility; I anticipate this will become a more important use case over time.

239

nit: llvm::append_range(extractOffsets, llvm::map_range()) may be a bit nicer.

251

Seems like this is the impl you want for your getShapeDimSizes above, and then you can just reuse it here?

279

Must the strides always be all 1s?
Or should you pass them as an extra argument too?

288

I'll comment on your other PR, but I would either:

  • move this op to arith and keep the operands as they are, or
  • keep it in tensor but then take an actual tensor as the basis and abstract away the fact that you get the Dims as a lowering detail.

Unless you have a stronger argument for keeping the semantics as is that I don't see yet?

352

nit: values.

mravishankar resigned from this revision.Jul 19 2022, 9:24 PM

Left a few comments. Removing my blocker on this. Thanks!

mravishankar added inline comments.Jul 19 2022, 9:24 PM
mlir/lib/Dialect/Tensor/Transforms/ExtractSliceFromReshape.cpp
39

It might be better to make the return type Optional<llvm::SmallBitVector> or FailureOr<llvm::SmallBitVector> and return errors instead of (void)-ing them this way.
Also, it's not clear to me what the expected behavior is if the shapes are dynamic. I would expect it to return failure, which needs to be propagated back up the stack.

111

Just out of curiosity, is there a solution space where we don't use scf.for or scf.foreach_thread here? It's an unfortunate coupling of these loop constructs with the transformation.

345

I don't think you need to cast it to the interface op. The method is implemented on tensor.extract_slice itself. You should be able to call it directly.

This revision is now accepted and ready to land.Jul 19 2022, 9:24 PM
christopherbate marked 13 inline comments as done.

Address comments

christopherbate marked an inline comment as done.Jul 20 2022, 8:00 PM
christopherbate added inline comments.
mlir/lib/Dialect/Tensor/Transforms/ExtractSliceFromReshape.cpp
111

So we need some sort of intermediate construct that handles the discontinuous iteration space. I just chose these loops because they seem commonly used.

I'm not sure whether you're asking

  1. whether the implementation can be improved to be more generic to the loop op kind or
  2. whether there's a better abstraction that can be used instead of a loop.

For 1, maybe. But it's probably the same amount of effort to add a method called buildWithMyLoopOp to the implementation class.

For 2, yes, I think there is more opportunity to fill missing abstractions here. If the linear dim size is 128 and you don't insert a loop, then instead you could directly insert an "unrolled" form of 128 iterations of what you originally put in the body. Of course, in this form it is too unwieldy. One option would be to add an op equivalent to a higher-order tensor.extract_slice and/or vector.transfer_read/write: it maps some iteration space to a set of slices. That may be easier to further transform versus the loop form.

mlir/lib/Dialect/Tensor/Transforms/ExtractSliceFromReshape.cpp
111

Yes, we will need new types and ops to handle the discontinuity, and that intersects with gather/scatter/parallel_scatter and related abstractions.
This is much longer-term though.

"it maps some iteration space to a set of slices" -> yes, the piecewise discontinuities are important to handle right. I am unclear yet what a good abstraction for this looks like, but it will be a major focus of ours for the rest of the year.

christopherbate marked 3 inline comments as done.

Rebase

chongxing added a subscriber: chongxing.
mlir/lib/Dialect/Tensor/Transforms/ExtractSliceFromReshape.cpp
39

+1 to FailureOr anywhere we need to propagate errors.

christopherbate edited the summary of this revision. (Show Details)

Refactor to remove the dependence on scf.for/scf.foreach_thread; the caller now provides the iteration mechanism. Fix a couple of bugs and improve the formatting of the tests.

christopherbate edited the summary of this revision. (Show Details)Aug 19 2022, 3:10 PM
nicolasvasilache accepted this revision.Aug 29 2022, 4:02 AM

Thanks!

mlir/include/mlir/Dialect/Tensor/Transforms/Transforms.h
39 ↗(On Diff #454114)

Could we please add some spacing and structuring to the comments?

... by the caller.

The class provides two methods:
  1. create: short description
  2. emitExtractSliceFromCollapseShapeBody: short description

Intended use case:
==============
The caller should first ... 

Then the caller should ...

Example:
=======
zzz
77 ↗(On Diff #454114)

This looks more like it belongs in a TransformUtils.h

77 ↗(On Diff #454114)

s/Builder/Helper

I find this class is a helper that exposes methods. Such methods take a builder to do X.
So the class itself is not a Builder, and since that name has heavy implications in MLIR, we should avoid reusing it, IMO.

114 ↗(On Diff #454114)

More structure to the doc here too plz:

... by:
  1. 
  2. 
  3.

The returned pair ...
114 ↗(On Diff #454114)

As in the other PR I commented on, I think inverting may be misleading, we are really propagating through the CollapseShapeOp.

In a followup, I imagine we could refactor and reuse some of the recent improvements introduced in: https://reviews.llvm.org/D128986 ?

mlir/include/mlir/Dialect/Utils/ReshapeOpsUtils.h
392

this class provides ?

394

The name ExtractShapeExtractSlice is hard to relate to.
Also, this is not a Builder.

Something like CollapseLinearizationHelper ?

409

Better structuring of the doc here would also help readability (e.g., isolating pieces of equations better, having paragraphs).

409

invert -> fold plz

411

Example in IR would help to relate the proper pieces, the description is a bit too abstract to follow.

mlir/lib/Dialect/Tensor/Transforms/ExtractSliceFromReshape.cpp
95

With C++17, I think it becomes nicer to just:

SmallVector<Range> ranges;
ranges.reserve(...);
for (const auto &[o, s, t] : llvm::zip(...))
  ranges.push_back(Range{o, s, t});
151

see, it's a helper :)

mlir/test/lib/Dialect/Tensor/TestTensorTransforms.cpp
141

Fine for a test, but if this wants to be hoisted, it should probably be one pattern per loop op type.
Maybe add a TODO?

christopherbate marked 6 inline comments as done.

Address comments

christopherbate marked 4 inline comments as done.Aug 31 2022, 11:35 AM
christopherbate added inline comments.
mlir/include/mlir/Dialect/Tensor/Transforms/Transforms.h
77 ↗(On Diff #454114)

Moved to new file TransformUtils.h and renamed ExtractSliceFromCollapseShapeHelper.

114 ↗(On Diff #454114)

As in the other PR I commented on, I think inverting may be misleading, we are really propagating through the CollapseShapeOp.

I don't follow how inverting is misleading.

In x -> CollapseShape -> y -> ExtractSlice -> z, each of x, y, z is a different tensor with a different index space defined by its shape. The CollapseShape and ExtractSlice ops define maps from one coordinate space to another. The effect of this function is to take coordinates in z, which are given by the tile coordinates / loop induction variables, and map them back to coordinates in x. I tried to make this point of view explicit in the implementation by using functions invertSliceIndexing and invertCollapseShapeIndexing.

In a followup, I imagine we could refactor and reuse some of the recent improvements introduced in: https://reviews.llvm.org/D128986 ?

There are some common aspects that could be factored out to unify these index-space manipulation ideas. For example, extracting an element from a collapse-shape'd memref could be phrased as inverting the index transformation of the collapse, which is exactly what they are doing with the linearize/delinearize. They could also use the new AffineDelinearizeOp in order to simplify some of the code.

The two implementations would be the same except for one aspect: all the complexity in this use case comes from the fact that we have to carry around information about which dimensions are sliced and/or linearized in order to make the generated code efficient. In the linked diff, they don't have to worry about the effect of the slicing aspect, as that code is only dealing with loading individual elements. Even in the body of the loop generated by this function, we are not guaranteed to just load scalars. Rather, you need to also account for the potential of a) slicing a non-linearized dimension (in which case you don't need a loop dimension) or b) taking the entirety of a linearized dimension. You need this information at basically every point in this transformation. If we could discard the slicing aspect, then it would be greatly simplified into the special case of just loading scalar elements.
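The two inversions can be sketched in Python (the function names mirror the invertSliceIndexing / invertCollapseShapeIndexing mentioned above, but the code is an illustrative model, not the patch's C++ implementation):

```python
# Illustrative Python model of the two inversions; the names mirror the
# patch's invertSliceIndexing / invertCollapseShapeIndexing, but this is
# a sketch, not the actual C++ implementation.
def invert_slice_indexing(z_coords, offsets, strides):
    """Map coordinates in z (the extract_slice result) back to y."""
    return tuple(off + z * st
                 for z, off, st in zip(z_coords, offsets, strides))

def invert_collapse_shape_indexing(y_coords, reassociation, src_shape):
    """Map coordinates in y (the collapse_shape result) back to x by
    delinearizing each coordinate over its reassociation group."""
    x = []
    for group, idx in zip(reassociation, y_coords):
        sub = []
        for d in reversed(group):
            sub.append(idx % src_shape[d])
            idx //= src_shape[d]
        x.extend(reversed(sub))
    return tuple(x)

# Example from the summary: z = (0, 0) in the 10x10 tile, slice offsets
# (13, 0), strides (2, 1), collapse [[0, 1, 2], [3]] of a 3x7x11x10 source.
y = invert_slice_indexing((0, 0), (13, 0), (2, 1))        # (13, 0)
x = invert_collapse_shape_indexing(y, [[0, 1, 2], [3]],
                                   (3, 7, 11, 10))         # (0, 1, 2, 0)
```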

mlir/include/mlir/Dialect/Utils/ReshapeOpsUtils.h
394

Oops, that was supposed to be CollapseShapeExtractSlice. Fixed.

409

I changed it, but please see my other comment on the rationale for using "invert" here.

mlir/lib/Dialect/Tensor/Transforms/ExtractSliceFromReshape.cpp
95

agreed, fixed

christopherbate marked 2 inline comments as done.Aug 31 2022, 11:38 AM
christopherbate added inline comments.
mlir/test/lib/Dialect/Tensor/TestTensorTransforms.cpp
141

I broke it into two patterns.

christopherbate marked an inline comment as done.

Rebase

nicolasvasilache added inline comments.
mlir/include/mlir/Dialect/Tensor/Transforms/Transforms.h
114 ↗(On Diff #454114)

I don't follow how inverting is misleading.

Ok, then I misunderstood this from the doc.
Could we improve the doc with better structuring and some IR example that makes the inversion pop up very clearly?

Expand the documentation for the main class that is added. I checked the result of the Doxygen build to make sure it looks correct with relevant hyperlinks and headings correctly generated.

Shortened some function/class names.

mlir/include/mlir/Dialect/Tensor/Transforms/Transforms.h
114 ↗(On Diff #454114)

Ok, I've expanded the IR example to be more complicated and added a step-by-step commentary that explains what is happening, including why I'm saying invert/inversion here.
Let me know if you still want me to adjust the wording.

I've checked the output of this comment block in the Doxygen build as well to ensure all the formatting shows up correctly.

nicolasvasilache accepted this revision.Sep 1 2022, 6:32 AM

Looks great, thanks for all the doc!

mlir/include/mlir/Dialect/Tensor/Transforms/TransformUtils.h
42

nit: "the the"

Fix typo in doc

christopherbate marked an inline comment as done.Sep 2 2022, 8:19 AM

The Windows bot seems to be failing for an unrelated reason

This revision was landed with ongoing or failed builds.Sep 2 2022, 10:29 AM
This revision was automatically updated to reflect the committed changes.

There is an issue of circular dependency here: you're introducing a dependency from Dialect/Utils/ to the ViewLikeInterface, but it already depends on Dialect/Utils

mehdi_amini added inline comments.Sep 2 2022, 4:34 PM
mlir/lib/Dialect/Tensor/Transforms/ExtractSliceFromReshape.cpp
17

On top of the cyclic dependency mentioned elsewhere, this introduces a dependency on Linalg which isn't defined in CMake as far as I can tell: this is breaking the bots.

Also I am not sure that it is conceptually correct to have lib/Dialect/Tensor to depend on Linalg?

I'll revert for now.

There is an issue of circular dependency here: you're introducing a dependency from Dialect/Utils/ to the ViewLikeInterface, but it already depends on Dialect/Utils

Thanks, I should have checked for that.

christopherbate reopened this revision.EditedSep 8 2022, 1:01 PM

Depends on https://reviews.llvm.org/D133523 to resolve circular dependency with ViewLikeInterface

This revision is now accepted and ready to land.Sep 8 2022, 1:01 PM

Rebase on D133523, resolve linalg dependency issue.

Verified the Bazel build is working and added the required Bazel build file changes