This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Bufferize tensor constant ops
ClosedPublic

Authored by silvas on Nov 11 2020, 4:12 PM.

Details

Summary

We lower tensor constants to a std.global_memref (uniqued by constant value) plus a
std.get_global_memref that produces the corresponding memref value.
This allows removing Linalg's somewhat hacky lowering of tensor
constants, now that std properly supports this.
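
For illustration, a rough before/after sketch of the lowering (op spellings follow the std dialect as of this revision; the function @f is hypothetical):

// Before bufferization: a constant with tensor type.
func @f() -> tensor<2xf32> {
  %cst = constant dense<1.000000e+01> : tensor<2xf32>
  return %cst : tensor<2xf32>
}

// After -tensor-constant-bufferize: a uniqued global plus a reference to it,
// wrapped in tensor_load so remaining tensor-typed uses stay valid.
global_memref "private" constant @__constant_2xf32 : memref<2xf32> = dense<1.000000e+01>
func @f() -> tensor<2xf32> {
  %0 = get_global_memref @__constant_2xf32 : memref<2xf32>
  %1 = tensor_load %0 : memref<2xf32>
  return %1 : tensor<2xf32>
}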

Diff Detail

Event Timeline

silvas created this revision. Nov 11 2020, 4:12 PM
silvas requested review of this revision. Nov 11 2020, 4:12 PM
silvas edited the summary of this revision. Nov 11 2020, 4:13 PM
mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp
312

I think this proposal has far-reaching consequences for transformations, and those consequences are not yet well understood.
In the short term I'd favor a first-class "bufferizable_in_place" attribute on subtensor/subtensor_insert to carry the information.

That attribute could be:

  1. obtained as the result of some analysis / canonicalization,
  2. maintained as a transformation invariant (e.g. the transformation that introduces it decides whether it has enough information/guarantees to add the attribute), or
  3. written directly in the IR for testing purposes.

Irrespective of this discussion, I think the root of the problem still lies in the tensor_load + tensor_to_memref canonicalization (see below).

328

I reverted this line locally after applying this revision, and here is what I see:

./bin/mlir-opt -linalg-bufferize -std-bufferize -tensor-constant-bufferize -func-bufferize -convert-linalg-to-loops -print-ir-after-all ../mlir/integration_test/Dialect/Linalg/CPU/test-subtensor-insert.mlir
// *** IR Dump After TensorConstantBufferize ***
module {
  global_memref "private" constant @__constant_1xf32 : memref<1xf32> = dense<2.000000e+01>
  global_memref "private" constant @__constant_2xf32 : memref<2xf32> = dense<1.000000e+01>
  func @main() {
    %0 = get_global_memref @__constant_2xf32 : memref<2xf32>
    %1 = tensor_load %0 : memref<2xf32>
    %2 = get_global_memref @__constant_1xf32 : memref<1xf32>
    %3 = tensor_load %2 : memref<1xf32>
    %4 = tensor_to_memref %3 : memref<1xf32>
    %5 = tensor_to_memref %1 : memref<2xf32>
    %6 = subview %5[0] [1] [1]  : memref<2xf32> to memref<1xf32>
    linalg.copy(%4, %6) : memref<1xf32>, memref<1xf32>
    %7 = tensor_load %5 : memref<2xf32>
    %8 = tensor_to_memref %7 : memref<2xf32>
    %9 = memref_cast %8 : memref<2xf32> to memref<*xf32>
    %10 = tensor_load %9 : memref<*xf32>
    call @print_memref_f32(%10) : (tensor<*xf32>) -> ()
    return
  }
  func @print_memref_f32(tensor<*xf32>)
}

// *** IR Dump After FuncBufferize ***
module {
  global_memref "private" constant @__constant_1xf32 : memref<1xf32> = dense<2.000000e+01>
  global_memref "private" constant @__constant_2xf32 : memref<2xf32> = dense<1.000000e+01>
  func @main() {
    %0 = get_global_memref @__constant_2xf32 : memref<2xf32>
    %1 = get_global_memref @__constant_1xf32 : memref<1xf32>
    %2 = subview %0[0] [1] [1]  : memref<2xf32> to memref<1xf32>
    linalg.copy(%1, %2) : memref<1xf32>, memref<1xf32>
    %3 = memref_cast %0 : memref<2xf32> to memref<*xf32>
    call @print_memref_f32(%3) : (memref<*xf32>) -> ()
    return
  }
  func @print_memref_f32(memref<*xf32>)
}

The problem again seems to come from the tensor_load + tensor_to_memref canonicalization.
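
For reference, the canonicalization being blamed here folds a tensor_to_memref of a tensor_load back to the original memref; roughly (assuming matching types):

%t  = tensor_load %m : memref<2xf32>
%m2 = tensor_to_memref %t : memref<2xf32>
// after folding, uses of %m2 become uses of %m, dropping the implied alloc + copy

which is consistent with the difference between the two dumps above.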

The tensor_to_memref semantics is:

Create a memref from a tensor. This is equivalent to allocating a new
memref of the appropriate (possibly dynamic) shape, and then copying the
elements (as if by a tensor_store op) into the newly allocated memref.
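
Spelled out, the implied expansion would be roughly (a hypothetical sketch, not what any pass actually emits):

// %m = tensor_to_memref %t : memref<2xf32> behaves as if it were:
%m = alloc() : memref<2xf32>
tensor_store %t, %m : memref<2xf32>
// i.e. %m is a fresh buffer and does not alias the source of %t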

Under these semantics, I would argue the IR is correct after TensorConstantBufferize and incorrect after FuncBufferize.

Relaxing the semantics of tensor_to_memref to not guarantee an alloc + copy would make it reasonable to require the bufferization to insert alloc + copy when it does not know better.

However, I am not sure how to relax tensor_to_memref, or even how to phrase the relaxation:

Create a memref from a tensor %t. The resulting memref aliases the memref operand of all tensor_load operations that flow into %t ... ?
silvas added inline comments. Nov 12 2020, 10:27 AM
mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp
312

First of all, this is not a proposal. This is a fact, and this fixes a miscompile that was already present in the code, even before this patch. For clarity, I'll split that out.

I think that adding a BufferizesToSomethingThatWritesIntoAnOperand trait would help. Then we could have a copy insertion pass that inserts copies at the tensor level (if needed), and the bufferization patterns will assume that those copies have been inserted.

"bufferizable in place" is a global property, so transformations cannot possibly know when creating IR whether it is safe to do in place. See e.g. test-tensor-matmul -- that one fails because we attempt to do a subtensor_insert into a std.constant, which happens to bufferize to something read only. A similar problem trivially occurs if the std.constant is used as the init tensor of two different linalg ops (possibly obscured by arbitrary control flow).

328

Agreed that we aren't obeying the "copy" aspect of tensor_to_memref. But that's not the problem here. The current way that we are lowering tensor_to_memref is simply emulating what would happen if you were running dialect conversion in one big pass (in that situation, tensors would be replaced directly by memrefs everywhere, and no tensor_to_memref/tensor_load pairs would be inserted to keep the IR legal). So the problem is independent of tensor_to_memref.

It's a fundamental aspect of simply converting a tensor Value to a memref Value without doing anything else -- the correct semantics is that every *use* of a tensor Value that has been converted to memref needs to see a separate copy, *if* that use might mutate its operand in place.

So we either need to update the dialect conversion framework to respect that, or we need to do a separate copy insertion phase that establishes the invariants.
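
On the example above, the net effect of such copy insertion would be roughly (a sketch only, not the output of any existing pass):

%c2  = get_global_memref @__constant_2xf32 : memref<2xf32>
%c1  = get_global_memref @__constant_1xf32 : memref<1xf32>
%buf = alloc() : memref<2xf32>
linalg.copy(%c2, %buf) : memref<2xf32>, memref<2xf32>   // materialize a writable copy of the constant
%sub = subview %buf[0] [1] [1] : memref<2xf32> to memref<1xf32>
linalg.copy(%c1, %sub) : memref<1xf32>, memref<1xf32>   // the in-place write now targets the copy, not the global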

silvas updated this revision to Diff 304908. Nov 12 2020, 11:04 AM

Split out fix to subtensor bufferization into https://reviews.llvm.org/D91371

silvas edited the summary of this revision. Nov 12 2020, 11:06 AM
silvas added inline comments.
mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp
312

I split out https://reviews.llvm.org/D91371 which fixes just subtensor_insert in isolation.

LGTM once this is rebased on https://reviews.llvm.org/D91371 with the comments there addressed.

Looking forward to the next steps :) !

silvas updated this revision to Diff 304923. Nov 12 2020, 11:53 AM
silvas edited the summary of this revision.

update

This revision was not accepted when it landed; it landed in state Needs Review. Nov 12 2020, 2:57 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.