This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Linalg] Add pattern to tile and fuse Linalg operations on buffers.
ClosedPublic

Authored by mravishankar on Sep 28 2020, 10:49 AM.

Details

Summary

The pattern is structured similar to other patterns like LinalgTilingPattern. The fusion patterns takes options that allows you to fuse with producers of multiple operands at once.

  • The pattern fuses only at the level that is known to be legal, i.e if a reduction loop in the consumer is tiled, then fusion should happen "before" this loop. Some refactoring of the fusion code is needed to fuse only where it is legal.
  • Since the fusion on buffers uses the LinalgDependenceGraph that is not mutable in place the fusion pattern keeps the original operations in the IR, but are tagged with a marker that can be later used to find the original operations.

This change also fixes an issue with tiling and distribution/interchange where if the tile size of a loop were 0 it wasnt account for in these.

Diff Detail

Event Timeline

mravishankar created this revision.Sep 28 2020, 10:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2020, 10:49 AM
mravishankar requested review of this revision.Sep 28 2020, 10:49 AM

Updating comments and using a different marker for fused producer.

Rebasing and using updated op semantics of Linalg, as well as adding a
missed comment documentation.

Allow the tile and fuse pattern to just tile if fusion isnt possible.

First batch of comments.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
432

s/change/changes

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
704

I'd just go for "incorrect operand index".

mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
464

s/position/positions

467

linalg.matmul ins(%a, %b: ...) outs(%c: ...)

477

s/commong/common

498

Could you please spell out the condition and reason a little more (maybe with an example).
The condition is that the producer iterates a single time along a valid consumer tiling dimension.
Using a linalg.some_op_with_inplace_reduce should make it simpler to visualize.

Before:

linalg.add ins(%a, %b) outs(%c)
linalg.matmul ins(%d, %c) outs(%e)

Correct:

for %j .. += TSj 
  for %k .. += TSk
    %sa = subview %a[%k, %j][...]
    %sb = subview %b[%k, %j][...]
    %sc = subview %c[%k, %j][...]
    %sd = subview %d[0, %k][...]
    %se = subview %e[0, %j][...]
    linalg.add ins(%sa, %sb) outs(%sc)
    linalg.matmul ins(%sd, %sc) outs(%se)

Incorrect (along i):

for %i .. += TSi 
  %sa = subview %a[%0, %0][...]
  %sb = subview %b[%0, %0][...]
  %sc = subview %c[%0, %0][...]
  %sd = subview %d[%i, %0][...]
  %se = subview %e[%i, %0][...]
  linalg.add ins(%sa, %sb) outs(%sc)
  linalg.matmul ins(%sd, %sc) outs(%se)

Here we see that the whole %c could be computed multiple times in-place.
This can be relaxed in the tensor world as we will essentially be recomputing, it will be interesting to see if we can just do it for free and have copy-elision / buffer placement automatically find that if %i == 0 then linalg.add and how that interplays with parallelism.

Note that this depends on how linalg.add is defined:
if linalg.add only computes c = a + b, then it would then just overcompute (and potentially race to writeback) which could be ok.
If linalg.add computes c += a + b then it would produce wrong results: we would need to analyze the region to see whether outs are used in computation(future work).

^-- also @pifon2a

501

plz update syntax

520

trivial braces here and in other places

527

This seems too restrictive.

If we look at the dependencies, the following:

red k
|  doall i
|  |  A[i] += f(k)

doall i
|  doall j
|  |  B[i, j] = g(B[i, j], A[i])

Can fuse into:

doall i += TS
|   sA = subview A[i][...]
|   red k
|   |  doall ii
|   |  |  sA[ii] += g(k)
|   doall j 
|   |  sB = subview B[i, j][...]
|   |  doall ii
|   |  |  sB[ii, j] = g(sB[ii, j], sA[ii])

Let's please add a TODO to relax the assumptions here.

551

Intuitively it seems we could just interchange the generic form of the original consumer, tile, fuse, interchange the generic form of the tiled consumer with inverse permutation ?
Let's please add a TODO to investigate this later.

609

I'd bias towards the behavior you implemented: intuitively tileAndFuse would want different sizes than just tile to fit within a given memory budget.

629

s/transofmration/transformation

mlir/test/Dialect/Linalg/fusion-pattern.mlir
9

Plz rebase to new syntax

nicolasvasilache accepted this revision.Sep 30 2020, 7:24 AM

Some more comments and global code reorg to please address but looks good otherwise.
Thanks for pushing the envelope @mravishankar !

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
35

Should you keep originalOp + fusedOp too ?
I realize you can access the originalOp from the pattern, it is more for symmetry/consistency.

38

Not sure whether relevant longer term but should we split common loops from fusedConsumer-only loops ?

65

Some example in its full complexity would be welcome.
Feel free to use static shapes + tile sizes that divide + call -cse and add affineminscfforcanonicalization to tiling so that you generate clean IR you can just copy-pase.
Seems like fusing 2 matmuls would be a good example re. imperfectly nested structure + mixing tileAndFuse with tile ?

mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
472

Can we move a lot of this code to Fusion.cpp ?
The rationale is that tileAndFuse is really fuse with tiling as an enabling transformation.
Also, conflicts will be much easier to resolve with what I have in flight.

531

if (llvm::all_of(commonTilableLoops, ...))
should help make this tighter

561

Can we factor this out in a static function?
The flow is getting hard to follow.

616

s/fiused/fused

622

I'd just fail given memory consumption considerations discussed above.
Would also simplify the code.

632

Can we split some of the code below into new functions and better document what is happening here?

We start with tileAndFuse + tile sizes.
Some of the loops can't be fused into so we split into 2 tile sizes vectors, one on which we fuse the other that we only tile, etc.

Seems a little reorganization would make this quite simpler to digest.

Of course, all this would need some good examples: we are performing tile and fuse of imperfectly nested stuff after all so it is quite more involved than traditional loop transformations :)

675

is it really dont only ? :)

mlir/test/Dialect/Linalg/fusion-pattern.mlir
9

Plz ignore must have had a stale version

277

With interchange we can fuse 2 loops, just sayin' :)

This revision is now accepted and ready to land.Sep 30 2020, 7:24 AM
mravishankar marked 17 inline comments as done.

Addressing comments and moving majority of code to Fusion.cpp

mravishankar marked 6 inline comments as done.Sep 30 2020, 2:05 PM
mravishankar added inline comments.
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
35

I think I kept it here for consistency. I will leave it as is. Returning more info is OK?

38

Yup, good idea. Split it.

mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
498

I am confused with this example. I think the correct tile+fused code in your example would be

for %j .. += TSj 
    %sa = subview %a[0, %j][...]
    %sb = subview %b[0, %j][...]
    %sc = subview %c[0, %j][...]
    %sd = subview %d[0, 0][...]
    %se = subview %e[0, %j][...]
    linalg.add ins(%sa, %sb) outs(%sc)
    linalg.matmul ins(%sd, %sc) outs(%se)

The tiling along i would be

for %i .. += TSj 
    %sa = subview %a[%i, 0][...]
    %sb = subview %b[%i, 0][...]
    %sc = subview %c[%i, 0][...]
    %sc2 = subview %c[0, 0][...]
    %sd = subview %d[%i, 0][...]
    %se = subview %e[%i, 0][...]
    linalg.add ins(%sa, %sb) outs(%sc)
    linalg.matmul ins(%sd, %sc2) outs(%se)

This is wrong cause the tile written by the linalg.add is written after its use in the linalg.matmul violating the dependency.

Ill use the above example. Will have a follow up CL on this anyway, will correct if there are any issues then

551

The two step tiling makes the logic of figuring out where to apply the interchange harder. Definitely possible, I havent thought about it (and I couldnt work through this part with everything else). Will add TODO.

609

Makes sense. Will change it.

632

I moved some of the code above into functions to make this method in general smaller. But moving the code below into separate functions (at least the way I see how to do this) would be quite clunky in terms of arguments and returns (and marshalling everything). I will add some comments to make this cleaner?

675

s/dont/done/ . THanks!

mlir/test/Dialect/Linalg/fusion-pattern.mlir
277

+1!

mravishankar marked 4 inline comments as done.

Removing public interface of methods that are now accessed within a single file.