We lower them to a std.global_memref (uniqued by constant value) plus a
std.get_global_memref that produces the corresponding memref value.
This allows removing Linalg's somewhat hacky lowering of tensor
constants, now that std properly supports this.
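
For illustration, a minimal sketch of the lowering (the shapes, values, and global names here mirror the IR dumps quoted in the comments below):

```mlir
// Input: a tensor-typed std.constant.
%cst = constant dense<1.000000e+01> : tensor<2xf32>

// After -tensor-constant-bufferize: a module-level global uniqued by constant
// value, plus a get_global_memref at the point of use. The tensor_load bridges
// back to tensor for users that have not been bufferized yet.
global_memref "private" constant @__constant_2xf32 : memref<2xf32> = dense<1.000000e+01>

%0 = get_global_memref @__constant_2xf32 : memref<2xf32>
%1 = tensor_load %0 : memref<2xf32>
```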
Event Timeline
mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp
Line 312:
I think this proposal has far-reaching consequences on transformations that are not yet well understood. In the short term I'd favor a first-class "bufferizable_in_place" attribute on subtensor/subtensor_insert to carry the information. That attribute may be either:

Irrespective of this discussion, I think the root of the problem still lies in the tensor_load + tensor_to_memref canonicalization (see below).
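
For concreteness, a rough sketch of where such an attribute could sit (the attribute name, placement, and syntax are hypothetical and not defined by this patch or by MLIR):

```mlir
// Hypothetical: mark this subtensor_insert as safe to bufferize in place,
// i.e. without first copying the destination buffer.
%r = subtensor_insert %src into %dest[0] [1] [1] {bufferizable_in_place}
       : tensor<1xf32> into tensor<2xf32>
```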
Line 328:
I reverted this line locally after patching this revision and here is what I see:

```
./bin/mlir-opt -linalg-bufferize -std-bufferize -tensor-constant-bufferize -func-bufferize -convert-linalg-to-loops -print-ir-after-all ../mlir/integration_test/Dialect/Linalg/CPU/test-subtensor-insert.mlir
```

```mlir
// *** IR Dump After TensorConstantBufferize ***
module {
  global_memref "private" constant @__constant_1xf32 : memref<1xf32> = dense<2.000000e+01>
  global_memref "private" constant @__constant_2xf32 : memref<2xf32> = dense<1.000000e+01>
  func @main() {
    %0 = get_global_memref @__constant_2xf32 : memref<2xf32>
    %1 = tensor_load %0 : memref<2xf32>
    %2 = get_global_memref @__constant_1xf32 : memref<1xf32>
    %3 = tensor_load %2 : memref<1xf32>
    %4 = tensor_to_memref %3 : memref<1xf32>
    %5 = tensor_to_memref %1 : memref<2xf32>
    %6 = subview %5[0] [1] [1] : memref<2xf32> to memref<1xf32>
    linalg.copy(%4, %6) : memref<1xf32>, memref<1xf32>
    %7 = tensor_load %5 : memref<2xf32>
    %8 = tensor_to_memref %7 : memref<2xf32>
    %9 = memref_cast %8 : memref<2xf32> to memref<*xf32>
    %10 = tensor_load %9 : memref<*xf32>
    call @print_memref_f32(%10) : (tensor<*xf32>) -> ()
    return
  }
  func @print_memref_f32(tensor<*xf32>)
}
```

```mlir
// *** IR Dump After FuncBufferize ***
module {
  global_memref "private" constant @__constant_1xf32 : memref<1xf32> = dense<2.000000e+01>
  global_memref "private" constant @__constant_2xf32 : memref<2xf32> = dense<1.000000e+01>
  func @main() {
    %0 = get_global_memref @__constant_2xf32 : memref<2xf32>
    %1 = get_global_memref @__constant_1xf32 : memref<1xf32>
    %2 = subview %0[0] [1] [1] : memref<2xf32> to memref<1xf32>
    linalg.copy(%1, %2) : memref<1xf32>, memref<1xf32>
    %3 = memref_cast %0 : memref<2xf32> to memref<*xf32>
    call @print_memref_f32(%3) : (memref<*xf32>) -> ()
    return
  }
  func @print_memref_f32(memref<*xf32>)
}
```

The problem again seems to be coming from the tensor_load + tensor_to_memref canonicalization. The tensor_to_memref semantics are:

  Create a memref from a tensor. This is equivalent to allocating a new memref of the appropriate (possibly dynamic) shape, and then copying the elements (as if by a tensor_store op) into the newly allocated memref.

Under these semantics, I would argue the IR is correct after TensorConstantBufferize and incorrect after FuncBufferize. Relaxing the semantics of tensor_to_memref to not guarantee an alloc + copy would make it reasonable to require the bufferization to insert alloc + copy when it does not know better. However, I am not sure how to relax tensor_to_memref or even how to phrase it:

  Create a memref from a tensor %t. The resulting memref aliases the memref operand of all tensor_load operations that flow into %t ... ?
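
To make the canonicalization issue concrete, a minimal sketch (ops and names as in the dumps above): the documented semantics of tensor_to_memref imply a fresh buffer, but the tensor_load + tensor_to_memref fold simply substitutes the original memref, so every use aliases it:

```mlir
// Before canonicalization: per the documented semantics, %m2 behaves like a
// freshly allocated memref holding a copy of %m's contents.
%m  = get_global_memref @__constant_2xf32 : memref<2xf32>
%t  = tensor_load %m : memref<2xf32>
%m2 = tensor_to_memref %t : memref<2xf32>

// After the fold, %m2 is replaced by %m itself, so any later in-place write
// through it (e.g. via a subview) mutates %m -- here a read-only constant global.
```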
mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp
Line 312:
First of all, this is not a proposal. This is a fact, and this fixes a miscompile that was already present in the code, even before this patch. For clarity, I'll split that out.

I think that adding a BufferizesToSomethingThatWritesIntoAnOperand trait would help. Then we could have a copy insertion pass that inserts copies at the tensor level (if needed), and the bufferization patterns will assume that those copies have been inserted.

"Bufferizable in place" is a global property, so transformations cannot possibly know when creating IR whether it is safe to do in place. See e.g. test-tensor-matmul -- that one fails because we attempt to do a subtensor_insert into a std.constant, which happens to bufferize to something read-only. A similar problem trivially occurs if the std.constant is used as the init tensor of two different linalg ops (possibly obscured by arbitrary control flow).
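
As a concrete instance of the miscompile (a minimal sketch mirroring the test-subtensor-insert dumps above; SSA names are illustrative):

```mlir
// Tensor level: insert into a constant-valued destination tensor.
%cst   = constant dense<1.000000e+01> : tensor<2xf32>
%small = constant dense<2.000000e+01> : tensor<1xf32>
%ins   = subtensor_insert %small into %cst[0] [1] [1] : tensor<1xf32> into tensor<2xf32>

// After bufferization without a copy of the destination, the write lands in
// the read-only constant global backing %cst.
%g  = get_global_memref @__constant_2xf32 : memref<2xf32>
%s  = get_global_memref @__constant_1xf32 : memref<1xf32>
%sv = subview %g[0] [1] [1] : memref<2xf32> to memref<1xf32>
linalg.copy(%s, %sv) : memref<1xf32>, memref<1xf32>
```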
Line 328:
Agreed that we aren't obeying the "copy" aspect of tensor_to_memref. But that's not the problem here. The current way that we are lowering tensor_to_memref is simply emulating what would happen if you were running dialect conversion in one big pass (in that situation, tensors would be replaced directly by memrefs everywhere, and no tensor_to_memref/tensor_load pairs would be inserted to keep the IR legal). So the problem is independent of tensor_to_memref.

It's a fundamental aspect of simply converting a tensor Value to a memref Value without doing anything else -- the correct semantics is that every *use* of a tensor Value that has been converted to memref needs to see a separate copy, *if* that use might mutate its operand in place. So we either need to update the dialect conversion framework to respect that, or we need to do a separate copy insertion phase that establishes the invariants.
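
A minimal sketch of the invariant such a copy-insertion phase would establish (ops as in the dumps above; whether this happens in the conversion framework or in a separate pass is exactly the open question):

```mlir
// The mutating use gets its own buffer instead of writing through a subview
// of the constant global.
%g     = get_global_memref @__constant_2xf32 : memref<2xf32>
%s     = get_global_memref @__constant_1xf32 : memref<1xf32>
%local = alloc() : memref<2xf32>
linalg.copy(%g, %local) : memref<2xf32>, memref<2xf32>
%sv = subview %local[0] [1] [1] : memref<2xf32> to memref<1xf32>
linalg.copy(%s, %sv) : memref<1xf32>, memref<1xf32>  // mutates only the local copy
```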
mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp
Line 312:
I split out https://reviews.llvm.org/D91371, which fixes just subtensor_insert in isolation.
LGTM once rebased on the now-addressed https://reviews.llvm.org/D91371.
Looking forward to the next steps :)!