This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Linalg] Add basic support for TileAndFuse on Linalg on tensors.
ClosedPublic

Authored by nicolasvasilache on Oct 15 2020, 12:15 PM.

Details

Summary

This revision allows fusing the producer of input tensors into the consumer under a tiling transformation (which produces subtensors).
Many pieces are still missing (e.g. support for init_tensors, a better refactoring of the LinalgStructuredOp interface support, merging implementations and reusing code), but this still allows getting started.
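As a rough illustration (a hand-written sketch with invented value names, not IR taken from this revision): after tiling, the consumer only reads a subtensor of the producer's full result, and fusion replaces that read with a tiled clone of the producer that computes just the needed tile.

  // Hypothetical sketch, before fusion: the tiled consumer reads a tile of the
  // full producer result.
  %t = linalg.matmul ins(%A, %B : tensor<?x?xf32>, tensor<?x?xf32>)
                     init(%C : tensor<?x?xf32>) -> tensor<?x?xf32>
  ...
  %t_tile = subtensor %t[%i, %j] [%ti, %tj] [1, 1]
    : tensor<?x?xf32> to tensor<?x?xf32>

  // After fusion: subtensors of the producer's operands (%K is the shared
  // reduction size, %c0 a constant 0 index; all names invented) feed a tiled
  // clone of the producer, so only the needed tile is computed.
  %A_tile = subtensor %A[%i, %c0] [%ti, %K] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
  %B_tile = subtensor %B[%c0, %j] [%K, %tj] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
  %C_tile = subtensor %C[%i, %j] [%ti, %tj] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
  %t_tile = linalg.matmul ins(%A_tile, %B_tile : tensor<?x?xf32>, tensor<?x?xf32>)
                          init(%C_tile : tensor<?x?xf32>) -> tensor<?x?xf32>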

The greedy pass itself is just for testing purposes and will be extracted into a separate test pass.

Diff Detail

Event Timeline

nicolasvasilache requested review of this revision. Oct 15 2020, 12:15 PM

This is really cool! Minimal change to get this working on tensors! Some minor comments.

Do you want to try using the TileAndFusePattern to test the fusion on tensors? I think it should just work.

mlir/include/mlir/Dialect/Linalg/Utils/Utils.h
99

This can be confused with the fuseTensorOps method below. They do different things; maybe some more comments on that would be helpful.

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
256

I am not sure why you need to do this. Can't you just use the resultTypes of the producer directly? That should already be the tensor type that you have in the consumer, right?

463

Looking at this, the fuse method is already able to handle tensor or buffer ops. The only difference between this and the fuseProducerOfBuffer method is the logic to get the producer. Do we even want to have a separate function for this? Based on the type of the operand at consumerIdx, we can get the producer using either the buffer approach or the tensor approach and just call fuse. That seems like it should work.

mravishankar requested changes to this revision. Oct 15 2020, 2:07 PM
This revision now requires changes to proceed. Oct 15 2020, 2:07 PM
nicolasvasilache marked 3 inline comments as done.

Address review.

mlir/include/mlir/Dialect/Linalg/Utils/Utils.h
99

Ah yes, thanks!

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
256

I don't think this is true, see the test:

  %t0 = linalg.matmul ins(%arg0, %arg1: tensor<?x?xf32>, tensor<?x?xf32>)
                     init(%arg2: tensor<?x?xf32>)
    -> tensor<?x?xf32>
  ...
  %3 = scf.for %arg3 = %c0 to %0 step %c2 iter_args(%arg4 = %arg2) -> (tensor<?x?xf32>) {
    %4 = scf.for %arg5 = %c0 to %2 step %c3 iter_args(%arg6 = %arg4) -> (tensor<?x?xf32>) {
      %5 = scf.for %arg7 = %c0 to %1 step %c4 iter_args(%arg8 = %arg6) -> (tensor<?x?xf32>) {
        ...
        %10 = subtensor %t0[%arg3, %arg7] [%7, %9] [1, 1]  : tensor<?x?xf32> to tensor<?x?xf32>
        ...
        %21 = linalg.matmul ins(%10, %15 : tensor<?x?xf32>, tensor<?x?xf32>) init(%20 : tensor<?x?xf32>)  -> tensor<?x?xf32>
...

That subtensor of %t0 may easily change the type if we have some static sizes, and the "tiled and fused producer" needs to produce that type.
So I am just taking it from the consumer input.
Updated the test to use a mix of static and dynamic sizes to better illustrate the problem.
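For instance (a hypothetical snippet with invented values, not the updated test itself), with static tile sizes the subtensor feeding the consumer carries a partially static type even though the producer's result is fully dynamic:

  // Hypothetical IR: static sizes [2, 4] make the subtensor result type
  // tensor<2x4xf32>, so the tiled and fused producer must yield exactly that
  // type for the consumer.
  %t0 = linalg.matmul ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>)
                      init(%arg2 : tensor<?x?xf32>) -> tensor<?x?xf32>
  ...
  %10 = subtensor %t0[%arg3, %arg7] [2, 4] [1, 1]
    : tensor<?x?xf32> to tensor<2x4xf32>
  %21 = linalg.matmul ins(%10, %15 : tensor<2x4xf32>, tensor<4x?xf32>)
                      init(%20 : tensor<2x?xf32>) -> tensor<2x?xf32>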

Also adding an assertion to guard against multi-outputs: this will require more thinking as to where to get the output shape for other returned tensors.

463

Yes, I am planning on doing tighter integration in subsequent commits.
Originally, I had split fuseLinalgOpsGreedily into 2 functions but then saw it was ok to dispatch per operand.
Still, I wanted to keep the chances of messing up input/output_buffers/input_tensors operands + types low.

I can wait until all steps are ready and squash them together later if you prefer?
Tentatively marking as done for now, please reopen if you feel this is not enough.

Thanks Nicolas! If there is no hurry, I'd like to take a couple of days and do a deep dive into the transformation here and get a better idea of the changes. So if it isn't blocking, I will circle back to this in a couple of days.

In the meantime, some tests seem to be failing. Maybe it's misattributed...

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
256

Just not sure why this needs to be done for tensors when it is not done for buffers. AFAICS, the computed subview gets the right shape for the operands of the cloned op. The same should be true for the subtensor op.
I might be missing something here. It's worth me going into the details of this. Sorry for the delay. I'll do a deep dive and update here.

Btw, I don't see the updated example. Seems like it needs to be uploaded?

mravishankar requested changes to this revision. Oct 24 2020, 3:42 PM

There seem to be some failing tests here. Will circle back after that is fixed.

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
109

This is the part that I am referring to below. The subview calculation above takes the view, offsets, sizes, and strides and creates a memref that exactly matches (at least as far as I know) the type of the memref operand that is the fused view. In the tensor equivalent, the fused view is by construction the result of the subtensor operation. It must create the same tensor type as the operand in the consumer. If this isn't the case, then the fusion is wrong AFAICS.

This revision now requires changes to proceed. Oct 24 2020, 3:42 PM
nicolasvasilache marked 2 inline comments as done.

Refactor and address review.

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
109

The subview calculation above takes the view, offsets, sizes, and strides and creates a memref that exactly matches (at least as far as I know) the type of the memref operand that is the fused view

This does not hold at the moment of construction: the type of the produced subview is always built dynamically.
Even when passing constant sizes, they are constant index ops and the subview builder returns a fully dynamic type.
On the other hand, the consumer may already have been canonicalized (or constructed this way) and can be "more static".
In other words, at the time of construction the types may not match, but once enough canonicalizations + foldings kick in, the static types become consistent.
The invariants of the transformation guarantee that the types are compatible.
We can debate whether or not this is reasonable and whether we should write more code to detect static sizes coming from SSA values.
For now this is left to canonicalization + folding, which may pull in quite a few patterns (e.g. affine_min + scf.for canonicalization for "statically known to be full" tiles).

In the buffer case, there is no use-def chain whose type needs to be consistent so this mismatch did not pose problems.
We have 2 aliasing views that after canonicalization + cse become the same view object and trigger further fusions.

In the tensor case, the IR needs to be valid and this needs a cast.
This is similar to other transformations that introduce memref_cast and tensor_cast ops.
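Concretely (a hand-written sketch with invented value names, not the actual pass output): the freshly built producer tile has a fully dynamic type, and a tensor_cast reconciles it with the more static type the consumer already uses; canonicalization later folds the cast away.

  // Hypothetical IR: the tiled producer is built with SSA sizes, so its result
  // type is fully dynamic.
  %p = linalg.matmul ins(%a_tile, %b_tile : tensor<?x?xf32>, tensor<?x?xf32>)
                     init(%c_tile : tensor<?x?xf32>) -> tensor<?x?xf32>
  // Cast to the more static type the consumer operand already carries.
  %p_cast = tensor_cast %p : tensor<?x?xf32> to tensor<2x4xf32>
  %r = linalg.matmul ins(%p_cast, %d : tensor<2x4xf32>, tensor<4x?xf32>)
                     init(%e : tensor<2x?xf32>) -> tensor<2x?xf32>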

I refactored this to make more explicit what happens; in particular, output tensor types are now inferred (always dynamic).

It must create the same tensor type as the operand in the consumer. If this isn't the case, then the fusion is wrong AFAICS.

It does, but this is only apparent after canonicalization + cse; I added an example in the test to show this.
In the following patch I'll refactor the pass to become a pure test pass that uses pattern rewrites mixed with canonicalization, so I'll be able to remove that special handling from the RUN command.

256

Just not sure why this needs to be done for tensors when it is not done for buffers. AFAICS, the computed subview gets the right shape for the operands of the cloned op. The same should be true for the subtensor op.

See explanation above. In the buffer case, the type mismatch was hidden by aliasing. In the tensor case we need to be type consistent.

Btw, I don't see the updated example. Seems like it needs to be uploaded?

Seems I missed something when uploading before leaving for 1 week, sorry about that!
Should be fixed now.

mravishankar accepted this revision. Oct 26 2020, 10:12 AM

Thanks for the refactoring. Makes sense now. Exciting!

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
121

Thanks for the comments and refactoring. I understand now how this works.

This revision is now accepted and ready to land. Oct 26 2020, 10:12 AM