Page MenuHomePhabricator

[mlir] introduce multi-sized tiling transformation
Needs ReviewPublic

Authored by ftynse on Jun 23 2022, 7:23 AM.

Details

Summary

Introduce a new transformation on structured ops that tiles the iteration space
using two different tile sizes after splitting it into two parts. The tile
sizes are computed dynamically to be less than some target value and so that
both parts only contain full tiles, and may be required to be divisble by a
certain (constant) value. This provides an alternative to padding and peeling
when the iteration space is not perfectly divisble by the target tile size,
with better guarantees on memory footprint.

Diff Detail

Event Timeline

ftynse created this revision.Jun 23 2022, 7:23 AM
ftynse requested review of this revision.Jun 23 2022, 7:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2022, 7:23 AM

High level comment, could we use TilingInterface for this. The advantage is that you don't need to couple this to the looping construct. I hope to purge the implicit coupling of scf within the implementation of tiling in Linalg. I found that separating the looping constructs from the tiling implementation makes things much simpler. It might help here as well

High level comment, could we use TilingInterface for this. The advantage is that you don't need to couple this to the looping construct. I hope to purge the implicit coupling of scf within the implementation of tiling in Linalg. I found that separating the looping constructs from the tiling implementation makes things much simpler. It might help here as well

I looked at the interface and it seems that it would be more involved to implement this over the interface rather than here because of the exponentially expanding loop structure. So it would be preferable to me to have the actual tiling approach reviewed and exercised first, before eventually porting it to the interface in a separate commit.

High level comment, could we use TilingInterface for this. The advantage is that you don't need to couple this to the looping construct. I hope to purge the implicit coupling of scf within the implementation of tiling in Linalg. I found that separating the looping constructs from the tiling implementation makes things much simpler. It might help here as well

I looked at the interface and it seems that it would be more involved to implement this over the interface rather than here because of the exponentially expanding loop structure. So it would be preferable to me to have the actual tiling approach reviewed and exercised first, before eventually porting it to the interface in a separate commit.

+1 we discussed offline and a separate commit is fine.

I wonder if there is an opportunity to split this PR in a few smaller and more composable pieces.

In particular, I would have a use of a general mechanism that splits Linalg ops into smalller Linalg ops that are known to be multiples of X and Y along each dimension while still remaining in Linalg land.
This would compose naturally with existing tiling transforms, reducing the need to port to TilingInterface by reusing as much as possible and gaining tiling to the different types of loops.

Staying in Linalg land also allows say ? -> ?x8 + ?x7 which makes it possible to implement tiling as an N-D Linalg -> 2*N-D Linalg transformation (instead of the current N-D Linalg -> N-D loops + N-D Linalg).
This will also open opportunities for distribution and other fun higher-order stuff.

So please, consider generalizing :)

mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
186

should

192

imperfectly

199

The note deserves its own paragraph for highlighting I believe.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
142

appearance

mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
325

Can we use affine_apply expressions that would make these quantities nicer to parse (and later compose) ?

411

Same Q here, we could literally write something like:

create<AffineApply>( {iv * highTileSize + lowTrip * lowTileSize}, ...)

and they would render as such.
Despite the artificial limitations on dim/sym classification, I find the construct goo to carry higher-level semantic information on such quantities.

456

emit ? oh my .. :) :p

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
448

Reviewed this part, LGTM.

mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
307

computing.

310

I would add an example that mentions a rewrite such as ? -> ?*8 + ?*7.

325

Just to be sure that I don't misinterpret the computation: this is the classical "load balancing" trick right ?

I.e. instead of decomposing 801 into 8*100 + 1, one would decompose in 8 * 94 + 7 * 7 ?
I.e. To avoid a shriveled last tile, you take the remainder (8 - K) out of the last tiles (you take exactly 1 from each tile, so you take 1 from 7 tiles in this case). The result is always N or N-1 sized tiles ?

The way this is described lends to suggest that it may not always be N or N-1 which would be different.
I know there is a paper but I find the description to not be super intuitive.

mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
325

Even simpler: 801 into 8*100 + 1, one would decompose in 8 * 99 + 9 * 1.
Just taking the mod and making the last mod tile +1.
Not sure why I went for -1 instead of +1; +1 is the old form we used way back in the torch / lua days ...