diff --git a/mlir/docs/Bufferization.md b/mlir/docs/Bufferization.md --- a/mlir/docs/Bufferization.md +++ b/mlir/docs/Bufferization.md @@ -125,7 +125,7 @@ Tensor ops that are not in destination-passing style always bufferize to a memory allocation. E.g.: -``` +```mlir %0 = tensor.generate %sz { ^bb0(%i : index): %cst = arith.constant 0.0 : f32 @@ -138,7 +138,7 @@ `linalg.generic`, which can express the same computation with a destination ("out") tensor: -``` +```mlir #map = affine_map<(i) -> (i)> %0 = linalg.generic {indexing_maps = [#map], iterator_types = ["parallel"]} outs(%t : tensor) { @@ -153,7 +153,7 @@ operand in the first place? As an example, this can be useful for overwriting a slice of a tensor: -``` +```mlir %t = tensor.extract_slice %s [%idx] [%sz] [1] : tensor to tensor %0 = linalg.generic ... outs(%t) { ... } -> tensor %1 = tensor.insert_slice %0 into %s [%idx] [%sz] [1] @@ -170,7 +170,7 @@ where the result of a tensor op is the "destination" operand of the next tensor ops, e.g.: -``` +```mlir %0 = "my_dialect.some_op"(%t) : (tensor) -> (tensor) %1 = "my_dialect.another_op"(%0) : (tensor) -> (tensor) %2 = "my_dialect.yet_another_op"(%1) : (tensor) -> (tensor) @@ -179,7 +179,7 @@ Buffer copies are likely inserted if the SSA use-def chain splits at some point, e.g.: -``` +```mlir %0 = "my_dialect.some_op"(%t) : (tensor) -> (tensor) %1 = "my_dialect.another_op"(%0) : (tensor) -> (tensor) %2 = "my_dialect.yet_another_op"(%0) : (tensor) -> (tensor) @@ -230,13 +230,13 @@ contrast to the dialect conversion-based bufferization that delegates this job to the [`-buffer-deallocation`](https://mlir.llvm.org/docs/Passes/#-buffer-deallocation-adds-all-required-dealloc-operations-for-all-allocations-in-the-input-program) -pass. One-Shot Bufferize cannot handle IR where a newly allocated buffer is -returned from a block. Such IR will fail bufferization. +pass. By default, One-Shot Bufferize rejects IR where a newly allocated buffer +is returned from a block. 
Such IR will fail bufferization. A new buffer allocation is returned from a block when the result of an op that is not in destination-passing style is returned. E.g.: -``` +```mlir %0 = scf.if %c -> (tensor) { %1 = tensor.generate ... -> tensor scf.yield %1 : tensor @@ -251,7 +251,7 @@ Another case in which a buffer allocation may be returned is when a buffer copy must be inserted due to a RaW conflict. E.g.: -``` +```mlir %0 = scf.if %c -> (tensor) { %1 = tensor.insert %cst into %another_tensor[%idx] : tensor "my_dialect.reading_tensor_op"(%another_tensor) : (tensor) -> () @@ -266,10 +266,44 @@ inserted) is yielded from the "then" branch. In both examples, a buffer is allocated inside of a block and then yielded from -the block. This is not supported in One-Shot Bufferize. Alternatively, One-Shot -Bufferize can be configured to leak all memory and not generate any buffer -deallocations with `create-deallocs=0 allowReturnMemref`. The buffers can then -be deallocated by running `-buffer-deallocation` after One-Shot Bufferize. +the block. Deallocation of such buffers is tricky and not currently implemented +in an efficient way. For this reason, One-Shot Bufferize must be explicitly +configured with `allow-return-allocs` to support such IR. + +When running with `allow-return-allocs`, One-Shot Bufferize resolves yields of +newly allocated buffers with copies. E.g., the `scf.if` example above would +bufferize to IR similar to the following: + +```mlir +%0 = scf.if %c -> (memref) { + %1 = memref.alloc(...) : memref + ... + scf.yield %1 : memref +} else { + %2 = memref.alloc(...) : memref + memref.copy %another_memref, %2 + scf.yield %2 : memref +} +``` + +In the bufferized IR, both branches return a newly allocated buffer, so it does +not matter which if-branch was taken. In both cases, the resulting buffer `%0` +must be deallocated at some point after the `scf.if` (unless the `%0` is +returned/yielded from its block). 
+ +One-Shot Bufferize internally utilizes functionality from the +[Buffer Deallocation](https://mlir.llvm.org/docs/BufferDeallocationInternals/) +pass to deallocate yielded buffers. Therefore, ops with regions must implement +the `RegionBranchOpInterface` when `allow-return-allocs` is set. + +Note: Buffer allocations that are returned from a function are not deallocated. +It is the caller's responsibility to deallocate the buffer. In the future, this +could be automated with allocation hoisting (across function boundaries) or +reference counting. + +One-Shot Bufferize can be configured to leak all memory and not generate any +buffer deallocations with `create-deallocs=0`. This can be useful for +compatibility with legacy code that has its own method of deallocating buffers. ## Memory Layouts @@ -279,7 +313,7 @@ bufferization boundary and decide on a memref type. By default, One-Shot Bufferize choose the most dynamic memref type wrt. layout maps. E.g.: -``` +```mlir %0 = "my_dialect.unbufferizable_op(%t) : (tensor) -> (tensor) %1 = tensor.extract %0[%idx1, %idx2] : tensor ``` @@ -287,7 +321,7 @@ When bufferizing the above IR, One-Shot Bufferize inserts a `to_memref` ops with dynamic offset and strides: -``` +```mlir #map = affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)> %0 = "my_dialect.unbufferizable_op(%t) : (tensor) -> (tensor) %0_m = bufferization.to_memref %0 : memref