This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Linalg] Introduce linalg.pad_tensor op.
ClosedPublic

Authored by hanchung on Dec 22 2020, 7:12 AM.

Details

Summary

linalg.pad_tensor is an operation that pads the source tensor
with the given low and high padding configuration.

Example 1:

  %pad_value = ... : f32
  %1 = linalg.pad_tensor %0 low[1, 2] high[2, 3] {
  ^bb0(%arg0 : index, %arg1 : index):
    linalg.yield %pad_value : f32
  } : tensor<?x?xf32> to tensor<?x?xf32>

Example 2:

  %pad_value = ... : f32
  %1 = linalg.pad_tensor %arg0 low[2, %arg1, 3, 3] high[3, 3, %arg1, 2] {
  ^bb0(%arg2: index, %arg3: index, %arg4: index, %arg5: index):
    linalg.yield %pad_value : f32
  } : tensor<1x2x2x?xf32> to tensor<6x?x?x?xf32>
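
For the static leading dimension in Example 2, the result size is source + low + high padding: 1 + 2 + 3 = 6. The remaining result dimensions are dynamic because the corresponding low/high values (%arg1) or the source dimension are dynamic.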

Diff Detail

Event Timeline

hanchung created this revision. Dec 22 2020, 7:12 AM
hanchung requested review of this revision. Dec 22 2020, 7:12 AM
hanchung edited the summary of this revision. Dec 22 2020, 7:13 AM
mehdi_amini added inline comments. Dec 22 2020, 9:56 AM
mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
52

This backquote isn't closed.

60

You don't make it clear if the source buffer is *modified* by this op.
It seems like it is, because I don't see how to store the padded values otherwise, but that's not great that an operation which is described as "taking a view" is actually modifying the source buffer.

mravishankar added inline comments. Dec 22 2020, 11:44 AM
mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
60

Thanks Mehdi (Hanhan and I discussed this op extensively and the implementation is based on that discussion). Would like to hash out semantics of this.

The op is not modifying the source buffer while taking a view. Instead it is providing a view of the underlying buffer with information about what values to use if you go "outside" of the underlying source buffer, i.e.

  • When you read from the view, if you are not in the padding, you get the value from the source buffer. If you are in the padding, you get the padded value specified using the region of the operation.
  • When you write to the view, if you are not in the padding, you write to the source buffer. If you are in the padding, the write is ignored. (@Hanhan maybe we should make these semantics explicit.)

For the write I agree the semantics is strange (but arguably correct). I don't expect the padded_view to be used to write data. It's only used to read, but the semantics is fairly easy to specify.
This is a way to not always require creating a new buffer to implement padding. Using this op you can fold the padding with its load/store, which is a more efficient way to implement padding. You can also fold this with vector.transfer_read/transfer_write to use masked operations and vectorize the computation even with padding.
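
For instance (a sketch with made-up names, not code from this patch), a contiguous read through the padded view could become a padded vector transfer on the original buffer, with the yielded %pad_value used for the out-of-bounds lanes:

// Sketch: %src is the original 3x4 buffer; lanes of the 8-wide read that fall
// outside %src take %pad_value instead of being loaded.
%v = vector.transfer_read %src[%i, %j], %pad_value : memref<3x4xf32>, vector<8xf32>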

The advantage here is that the transformations that do tiling/vectorization, etc. do not need to actually worry about the padding. They can be implemented as if they are working with the padded buffer. The padding is then implemented using rewrites on loads/stores.
Goes without saying this is experimental to some extent, but that's what a lot of manual implementations of ops like conv/pooling, etc. do.

mehdi_amini added inline comments. Dec 22 2020, 11:56 AM
mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
60

I have some concerns with what you're describing, because this "view" produces a memref that isn't usable without understanding and accessing this padded_view operation to be able to get the padded value.

mehdi_amini added inline comments. Dec 22 2020, 11:59 AM
mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
60

Another aspect is that it does not describe when the region is evaluated: all these ops are evaluating their region when the op is executed, while here it seems that your intent is to fold the region to wherever the memref is used. That can yield weird "effects at a distance" when the region does more than just return a fixed SSA value.

mravishankar added inline comments. Dec 22 2020, 12:22 PM
mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
60

I have some concerns with what you're describing, because this "view" produces a memref that isn't usable without understanding and accessing this padded_view operation to be able to get the padded value.

Explicitly trying to avoid that. This is producing a view (which is memref type) and not a new allocation. Within Linalg itself, you don't need to know where this "memref" came from. Can you give me a more concrete example of why the memref produced here is not usable? From my reading of what this sentence says, the same is true even for a subview. So maybe I am missing something.

Another aspect is that it does not describe when the region is evaluated: all these ops are evaluating their region when the op is executed, while here it seems that your intent is to fold the region to wherever the memref is used. That can yield weird "effects at a distance" when the region does more than just return a fixed SSA value.

The region is evaluated when you try to access the source memref using indices that are within the padded region. I am not sure I understand what you mean by region not returning a fixed SSA value. It has a single return value. We could have the region describe the conditional logic described below, but that seems cumbersome and not really required for the op semantics.

mehdi_amini added inline comments. Dec 22 2020, 3:05 PM
mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
60

Explicitly trying to avoid that. This is producing a view (which is memref type) and not a new allocation. Within Linalg itself, you don't need to know where this "memref" came from. Can you give me a more concrete example of why the memref produced here is not usable? From my reading of what this sentence says, the same is true even for a subview. So maybe I am missing something.

As far as I understand it, a subview only transforms the mapping from the virtual index space into the underlying buffer; this isn't the case here because of the padding.
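
For contrast, a plain subview of a 3x4 source only re-indexes into the existing allocation, and that re-indexing is fully captured in the result type (a hand-written sketch, layout map computed manually):

// Offsets [1, 2], sizes [2, 3], strides [1, 1]; everything needed to lower an
// access is carried by the result type itself.
%sv = subview %arg0[1, 2] [2, 3] [1, 1]
    : memref<3x4xf32> to memref<2x3xf32, affine_map<(d0, d1) -> (d0 * 4 + d1 + 6)>>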
For example you have this example in the test:

%0 = linalg.padded_view %arg0[1, 2] [2, 3] {
  ^bb0(%arg1 : index, %arg2 : index):
    linalg.yield %pad_value : f32
  } : memref<3x4xf32> to memref<6x9xf32>

What can we do with the %0 memref? Can we use it like any other memref?
So for example, can you pass it to the runtime print?
We have mlir-cpu-runner tests where you should be able to do:

%cast = memref_cast %0 : memref<6x9xf32> to memref<*xf32>
call @print_memref_f32(%cast) : (memref<*xf32>) -> ()

Will this print a 6x9 output? What will it print for the padding?

The region is evaluated when you try to access the source memref using indices that are within the padded region.

As I mentioned before, this seems dangerously fragile to me. At a minimum, that requires ensuring that the region has no side effects.

I am not sure I understand what you mean by region not returning a fixed SSA value. It has a single return value.

I mean that the SSA value alone is not enough: the SSA value somehow carries with it the closure that is represented by the region. This isn't consistent with memref in general, I believe.
But we can clarify this point by looking at the example I provided above; this isn't a different point.

hanchung added inline comments. Dec 23 2020, 1:52 AM
mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
60

I don't know how print_memref_f32 works, but I get a bit of what you're saying. Unlike other memrefs, we need special logic to handle the result of linalg.padded_view: linalg.padded_view not only maps the access indices into the buffer, it also defines the elements that are not mapped to the buffer.

I feel theoretically it is correct because what print_memref_f32 does is to iterate over all the elements like:

scf.for %iv0 = ... {
  scf.for %iv1 = ... {
    %0 = load padded_view[%iv0, %iv1] : f32
    print %0 : f32
  }
}

And a valid lowering of padded_view is to fold it into if-else to extract the load_value or pad_value. So it will print a 6x9 output.
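
As a rough sketch (with illustrative names: %src is the original 3x4 buffer from the 3x4 -> 6x9 example, %iv0/%iv1 the loop indices above, %pad_value the yielded scalar), the folded access could look like:

// In-bounds iff 1 <= %iv0 < 4 and 2 <= %iv1 < 6 (source extents shifted by the low padding).
%c1 = constant 1 : index
%c2 = constant 2 : index
%c4 = constant 4 : index
%c6 = constant 6 : index
%ge0 = cmpi "sge", %iv0, %c1 : index
%lt0 = cmpi "slt", %iv0, %c4 : index
%ge1 = cmpi "sge", %iv1, %c2 : index
%lt1 = cmpi "slt", %iv1, %c6 : index
%in0 = and %ge0, %lt0 : i1
%in1 = and %ge1, %lt1 : i1
%inside = and %in0, %in1 : i1
// Either load from the source shifted by the low padding, or use the pad value.
%val = scf.if %inside -> (f32) {
  %s0 = subi %iv0, %c1 : index
  %s1 = subi %iv1, %c2 : index
  %ld = load %src[%s0, %s1] : memref<3x4xf32>
  scf.yield %ld : f32
} else {
  scf.yield %pad_value : f32
}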

As Mahesh mentioned, this operation is expected to work within Linalg itself. After applying some transforms, the operation will just fold into either an if-else op or alloc + fill + copy. The goal is to create a Linalg operation which can represent "pad" semantics in Linalg and work with Linalg transforms, and the op will get killed at some point. I would expect to apply passes (like linalg-to-std) before working with mlir-cpu-runner.

I agree that we can have a more complete definition of the op, like Mahesh stated:

When you read from the view, if you are not in the padding, you get the value from the source buffer. If you are in the padding, you get the padded value specified using the region of the operation.
When you write to the view, if you are not in the padding, you write to the source buffer. If you are in the padding, the write is ignored.

Having a region makes it easier to extend the pad op to handle different padding cases, like repeat_edge, mirror_edge, etc. But if this is not consistent with memref, I think we can have an explicit operation, like padded_scalar_view, that does not take a region.

hanchung updated this revision to Diff 313538. Dec 23 2020, 5:14 AM

Add more descriptions

mehdi_amini added inline comments. Dec 23 2020, 10:47 AM
mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
60

I feel theoretically it is correct because what print_memref_f32 does is to iterate over all the elements like:
...
And a valid lowering of padded_view is to fold it into if-else to extract the load_value or pad_value. So it will print a 6x9 output.

I assume that it is %padded_view here and it is the SSA value returned by linalg.padded_view?

Then it would require *every* consumer of the memref to look for the producer and understand how it needs to handle it: the SSA value you're producing is not a valid memref by itself; this is a problem to me.

print_memref_f32 is implemented here: https://github.com/llvm/llvm-project/blob/master/mlir/lib/ExecutionEngine/RunnerUtils.cpp#L39

bondhugula requested changes to this revision. Dec 24 2020, 10:30 AM
bondhugula added a subscriber: bondhugula.
bondhugula added inline comments.
mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
40

pad -> padded_view

41

Nit: add ':' after "example".

45

This is where you could really use the custom syntax to improve readability. Consider using keywords like low and high so that it's clear what corresponds to low/high and what corresponds to dimensions. Eg:
linalg.padded_view %0 low [1, 2] high [2, 3] {

45

Unfortunately, this design is flawed as @mehdi_amini hints in his first message -- simply because the resulting memref type isn't carrying the necessary information. This basically means that one is lost if the memref "escapes" (not just interprocedurally but also within a function in various ways), passed along to nested regions with arguments or propagated along in explicit capture style, or in numerous other ways. This is a common pitfall of not having the necessary information encoded in the type. @mravishankar - note that this is NOT a problem with the other view/subview like ops or any other memref creating cast ops that I've seen till date --- because the necessary information to lower say the load/stores is available in the *type*. If other memref defining ops with behavior like these have been introduced where you have to look at the defining operation to see what's happening with accesses (like @mehdi_amini points out), I believe that's equally grave!

51

The first list ...

This revision now requires changes to proceed. Dec 24 2020, 10:30 AM

Catching up on the discussion after the break here. I agree the design is flawed. The strides of the result memref are wrong. As defined right now, that would mean that lowering the memref will need to look at where the memref is coming from, which is really wrong. You could return a memref with the strides embedded into the memref type so that it works fine, but I don't think that fully works either. I think Hanhan will have a fix for this shortly. Thanks for the feedback!

hanchung updated this revision to Diff 317883. Jan 20 2021, 8:02 AM

Rework and define the op on tensor.

hanchung retitled this revision from [mlir][Linalg] Introduce linalg.padded_view op. to [mlir][Linalg] Introduce linalg.pad_tensor op.. Jan 20 2021, 8:03 AM
hanchung edited the summary of this revision.
hanchung updated this revision to Diff 317884. Jan 20 2021, 8:08 AM
hanchung marked 13 inline comments as done.

Update the doc a bit.

After a long offline discussion, we have a plan to make the pad op work with Linalg transforms. I will add two pad ops, one tensor version and one memref version. I'll start by adding the pad_tensor op to Linalg.

I agree that the original semantics introduce issues on memref; the memref type alone does not carry enough information. The current plan is to pass an extra destination memref instead of returning one. I will work on it and send it out for review in another patch. Thanks for the feedback, it is really helpful!
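
Purely as an illustration of the destination-passing idea (hypothetical syntax, not something in this patch or a settled design):

// Hypothetical sketch only: read %src, write the padded result into a
// caller-provided %dst buffer instead of returning a view.
linalg.pad_memref %src, %dst low[1, 2] high[2, 3] {
  ^bb0(%i : index, %j : index):
    linalg.yield %pad_value : f32
} : memref<3x4xf32>, memref<6x9xf32>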

mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
45

Added low and high. This is a good idea, thanks!

Harbormaster completed remote builds in B85893: Diff 317884.
nicolasvasilache requested changes to this revision. Jan 20 2021, 12:55 PM

Thanks for pushing on this Hanhan.
The impl. needs to be a little more involved to allow different numbers of low and high padding values.
You'll also need extra builders and accessors to make things easier to manipulate; they can be added on a per-need basis.
You can look at Subview/Subtensor and the OffsetSizesAndStridesInterface for similar-looking code.
Refactorings to reuse code are most welcome, if reasonable.

mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
121

I think you can't just use SameVariadicOperandSize; the following wouldn't work:
linalg.pad_tensor %t low[0, 0] high[%ub0, %ub1]

Indeed you're missing a test for this.

You need to handle your operand_segment_size manually.

See https://github.com/llvm/llvm-project/blob/b1e1bbae0e30c89251940efb0780eee6a1b79ecd/mlir/include/mlir/Dialect/StandardOps/IR/Ops.td#L226
and https://github.com/llvm/llvm-project/blob/118a71565462db41cab1dbb0349200627d6e8524/mlir/lib/Interfaces/ViewLikeInterface.cpp#L161 if you need an example.
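
For reference, the asymmetric case would be something like this (a sketch with illustrative names %t, %ub0, %ub1, %cst, along the lines of the missing test):

// low is fully static while high is fully dynamic, so the two operand groups
// have different numbers of SSA values.
%0 = linalg.pad_tensor %t low[0, 0] high[%ub0, %ub1] {
  ^bb0(%i : index, %j : index):
    linalg.yield %cst : f32
} : tensor<2x3xf32> to tensor<?x?xf32>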

127

I'd move the examples after the textual description.

mlir/test/Dialect/Linalg/roundtrip.mlir
21

Please add an asymmetrical test as discussed above.

This revision now requires changes to proceed. Jan 20 2021, 12:55 PM
hanchung updated this revision to Diff 318231. Jan 21 2021, 9:00 AM

Address comments

  • Use AttrSizedOperandSegments trait
  • Add an asymmetrical test
  • Add a couple of extra builders
  • Add a description of the operands
  • Move examples after the textual description
hanchung marked 3 inline comments as done. Jan 21 2021, 9:00 AM

This looks good.
Can you please add one test per verifier failure to test/Dialect/Linalg/invalid.mlir ?
Once these are in, this is good to go.

Thanks @hanchung !

hanchung updated this revision to Diff 318291. Jan 21 2021, 12:53 PM

Add one more test to invalid.mlir

This looks good.
Can you please add one test per verifier failure to test/Dialect/Linalg/invalid.mlir ?
Once these are in, this is good to go.

Thanks @hanchung !

Added one more test.

The only missing one is for multiple blocks. I think it is not easy to test in invalid.mlir because the parser doesn't parse two regions.

hanchung updated this revision to Diff 318295. Jan 21 2021, 1:06 PM

Add no block test

This looks good.
Can you please add one test per verifier failure to test/Dialect/Linalg/invalid.mlir ?
Once these are in, this is good to go.

Thanks @hanchung !

Added one more test.

The only missing one is for multiple blocks. I think it is not easy to test in invalid.mlir because the parser doesn't parse two regions.

I was wrong, added.

nicolasvasilache accepted this revision. Jan 21 2021, 1:29 PM
This revision was not accepted when it landed; it landed in state Needs Review. Jan 21 2021, 10:16 PM
This revision was automatically updated to reflect the committed changes.