This is an archive of the discontinued LLVM Phabricator instance.

[mlir][bufferize] Better error handling: Fail if ToMemrefOps are found
ClosedPublic

Authored by springerm on Aug 17 2022, 6:57 AM.

Details

Summary

bufferization.to_memref ops are not supported in One-Shot Analysis. They often trigger a failed assertion that can be confusing. Instead, scan for to_memref ops before running the analysis and immediately abort with a proper error message.
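For illustration, a minimal input (hypothetical, not taken from this revision's test cases) that now fails with a clear diagnostic instead of hitting an assertion:

```mlir
// Hypothetical example: a to_memref op inside a function that is fed to
// One-Shot Bufferize. The analysis cannot reason about what happens to the
// buffer below this op, so the pass now emits:
//   error: to_memref ops not supported during One-Shot Analysis
func.func @f(%t: tensor<4xf32>) -> memref<4xf32> {
  %m = bufferization.to_memref %t : memref<4xf32>
  return %m : memref<4xf32>
}
```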

Diff Detail

Event Timeline

springerm created this revision. Aug 17 2022, 6:57 AM
springerm requested review of this revision. Aug 17 2022, 6:57 AM
Herald added a project: Restricted Project. Aug 17 2022, 6:57 AM
This revision is now accepted and ready to land. Aug 18 2022, 12:47 AM

Hi!
I suspect this assertion is too strict.
I have a piece of code that bufferizes successfully without the assertion but fails when it is enabled:

Command:
mlir-opt -allow-unregistered-dialect -one-shot-bufferize="bufferize-function-boundaries=1 allow-return-allocs=1 allow-unknown-ops=1 create-deallocs=0 function-boundary-type-conversion=identity-layout-map"

Input:

#map = affine_map<(d0, d1, d2, d3) -> (d0 * 6400 + d1 * 6400 + d2 * 80 + d3)>
#map1 = affine_map<(d0, d1, d2, d3) -> (d0 * 25 + d1 * 25 + d2 * 5 + d3)>
#map2 = affine_map<(d0, d1, d2, d3) -> (d0 * 92416 + d1 * 5776 + d2 * 76 + d3)>
module {
  func.func @foo(%arg0: tensor<16xsi8>, %arg1: tensor<?x1x80x80xui8>, %arg2: tensor<16xsi8>, %arg3: tensor<16x1x5x5xsi8>) -> (tensor<?x16x76x76xsi8>) {
    %0 = "my.op"(%arg1, %arg3, %arg2, %arg0) : (tensor<?x1x80x80xui8>, tensor<16x1x5x5xsi8>, tensor<16xsi8>, tensor<16xsi8>) -> tensor<?x16x76x76xsi8>
    return %0 : tensor<?x16x76x76xsi8>
  }
  func.func @bar(%arg0: memref<16xsi8>, %arg1: memref<1x1x80x80xui8, #map>, %arg2: memref<16xsi8>, %arg3: memref<16x1x5x5xsi8, #map1>) -> (memref<1x16x76x76xsi8, #map2>) {
    %cast = memref.cast %arg1 : memref<1x1x80x80xui8, #map> to memref<?x1x80x80xui8, #map>
    %0 = bufferization.to_tensor %arg0 : memref<16xsi8>
    %1 = bufferization.to_tensor %cast : memref<?x1x80x80xui8, #map>
    %2 = bufferization.to_tensor %arg2 : memref<16xsi8>
    %3 = bufferization.to_tensor %arg3 : memref<16x1x5x5xsi8, #map1>
    %4 = call @foo(%0, %1, %2, %3) : (tensor<16xsi8>, tensor<?x1x80x80xui8>, tensor<16xsi8>, tensor<16x1x5x5xsi8>) -> tensor<?x16x76x76xsi8>
    %5 = bufferization.to_memref %4 : memref<?x16x76x76xsi8, #map2>
    %cast_0 = memref.cast %5 : memref<?x16x76x76xsi8, #map2> to memref<1x16x76x76xsi8, #map2>
    %cast_1 = memref.cast %cast_0 : memref<1x16x76x76xsi8, #map2> to memref<1x16x76x76xsi8, #map2>
    return %cast_1 : memref<1x16x76x76xsi8, #map2>
  }
}

Output when to_memref ops are not rejected (assertion disabled):

#map = affine_map<(d0, d1, d2, d3) -> (d0 * 6400 + d1 * 6400 + d2 * 80 + d3)>
#map1 = affine_map<(d0, d1, d2, d3) -> (d0 * 25 + d1 * 25 + d2 * 5 + d3)>
#map2 = affine_map<(d0, d1, d2, d3) -> (d0 * 92416 + d1 * 5776 + d2 * 76 + d3)>
module {
  func.func @foo(%arg0: memref<16xsi8>, %arg1: memref<?x1x80x80xui8>, %arg2: memref<16xsi8>, %arg3: memref<16x1x5x5xsi8>) -> memref<?x16x76x76xsi8> {
    %0 = bufferization.to_tensor %arg3 : memref<16x1x5x5xsi8>
    %1 = bufferization.to_tensor %arg2 : memref<16xsi8>
    %2 = bufferization.to_tensor %arg1 : memref<?x1x80x80xui8>
    %3 = bufferization.to_tensor %arg0 : memref<16xsi8>
    %4 = "my.op"(%2, %0, %1, %3) : (tensor<?x1x80x80xui8>, tensor<16x1x5x5xsi8>, tensor<16xsi8>, tensor<16xsi8>) -> tensor<?x16x76x76xsi8>
    %5 = bufferization.to_memref %4 : memref<?x16x76x76xsi8>
    return %5 : memref<?x16x76x76xsi8>
  }
  func.func @bar(%arg0: memref<16xsi8>, %arg1: memref<1x1x80x80xui8, #map>, %arg2: memref<16xsi8>, %arg3: memref<16x1x5x5xsi8, #map1>) -> memref<1x16x76x76xsi8, #map2> {
    %cast = memref.cast %arg1 : memref<1x1x80x80xui8, #map> to memref<?x1x80x80xui8>
    %cast_0 = memref.cast %arg3 : memref<16x1x5x5xsi8, #map1> to memref<16x1x5x5xsi8>
    %0 = call @foo(%arg0, %cast, %arg2, %cast_0) : (memref<16xsi8>, memref<?x1x80x80xui8>, memref<16xsi8>, memref<16x1x5x5xsi8>) -> memref<?x16x76x76xsi8>
    %cast_1 = memref.cast %0 : memref<?x16x76x76xsi8> to memref<1x16x76x76xsi8, #map2>
    return %cast_1 : memref<1x16x76x76xsi8, #map2>
  }
}

Output when to_memref ops are rejected (assertion enabled):

<stdin>:16:14: error: to_memref ops not supported during One-Shot Analysis
        %5 = bufferization.to_memref %4 : memref<?x16x76x76xsi8, #map2>
             ^
<stdin>:16:14: note: see current operation: %6 = "bufferization.to_memref"(%5) : (tensor<?x16x76x76xsi8>) -> memref<?x16x76x76xsi8, affine_map<(d0, d1, d2, d3) -> (d0 * 92416 + d1 * 5776 + d2 * 76 + d3)>>

First we need to agree that this test is valid and should be bufferized.
If so, we may want to identify the supported cases.

Thanks in advance,
Maya

The input in your test case has mixed tensor/memref ops. How did you get to that state? Can you use -one-shot-bufferize="bufferize-function-boundaries" on the initial IR when everything is tensors?

We could indeed support a few cases with bufferization.to_memref and bufferization.to_tensor. But it would be pretty fragile. The main problem is that One-Shot Bufferize cannot analyze through memref code. Changing the IR just a little bit can make it so that One-Shot Bufferize has to bail. That's why we forbid to_memref in general -- to make it less surprising.

Btw, you should be able to bufferize your IR with -one-shot-bufferize="copy-before-write". That will skip the analysis, and to_memref/to_tensor should work. But it will introduce an extra copy for every op that writes to memory.
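To illustrate, here is a hand-written sketch (not actual pass output; the exact IR the pass produces may differ) of what copy-before-write does: every op that would write in place instead writes into a fresh copy of its buffer.

```mlir
// Before bufferization: a tensor op that would normally bufferize in place.
//   %0 = tensor.insert %c into %t[%i] : tensor<8xf32>
// After copy-before-write bufferization (sketch):
%m = bufferization.to_memref %t : memref<8xf32>
%alloc = memref.alloc() : memref<8xf32>
// Conservative copy before the write -- this is the extra copy mentioned above.
memref.copy %m, %alloc : memref<8xf32> to memref<8xf32>
memref.store %c, %alloc[%i] : memref<8xf32>
```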

> The input in your test case has mixed tensor/memref ops. How did you get to that state? Can you use -one-shot-bufferize="bufferize-function-boundaries" on the initial IR when everything is tensors?

Actually, the input to my whole program uses memrefs (we created it). For example, @bar from my reproducer.
Then I get tensor-based functions (like @foo from my reproducer) and I need to bufferize them and connect them to the existing program.
So I don't have a state where everything is tensor-based.

> Btw, you should be able to bufferize your IR with -one-shot-bufferize="copy-before-write". That will skip the analysis, and to_memref/to_tensor should work. But it will introduce an extra copy for every op that writes to memory.

Interesting. I just need to make sure I can avoid them later, because my program must run as fast as possible. Do you know of any passes that can clean up these copies?
Maybe I can change the code a bit so that "copy-before-write" is decided per function? Because I don't mind if there are copies in @bar. It's @foo that I want to keep clean.

I would be glad to contribute, and I can also add a patch that supports more cases. I just don't know which other cases are legal.

Thanks a lot
Maya

> Interesting. I just need to make sure I can avoid them later, because my program must run as fast as possible. Do you know of any passes that can clean up these copies?

We don't have a pass for that. You would essentially have to reimplement the entire bufferization analysis. It may be possible to write a smaller copy-removal pass that handles only simple cases.

Basically, the current bufferization analyzes the IR to detect conflicts where copies are needed; i.e., it inserts copies only when necessary. One alternative that we considered was inserting copies everywhere and then running a pass that removes them. But we decided against that because analyses are generally easier on tensors than on memrefs.

> Maybe I can change the code a bit so that "copy-before-write" is decided per function? Because I don't mind if there are copies in @bar. It's @foo that I want to keep clean.

That could work. The entry point is bufferization::bufferizeModuleOp. This function calls bufferizeOp, which uses options.copyBeforeWrite. You could try toggling that flag based on the function. Some smaller changes to the analysis are probably needed to set up the state in FuncBufferizableOpInterfaceImpl.cpp. E.g., when copy-before-write is on, aliasingFuncArgs and aliasingReturnVals should be set such that every funcBbArg aliases every func result. Similarly, readBbArgs should contain all bbArgs. These fields are usually filled by OneShotAnalysis, but if you run with copy-before-write=true, the analysis is skipped. However, the caller of a function expects the analysis results of the callee, so you have to set them up manually.

Another problem is memory deallocation. It looks like your operation returns a new allocation after bufferization (tensor<?x16x76x76xsi8> is different from all other operand types). This is generally not supported in the current bufferization because it would require reference counting or a similar technique. What you can do is add a "destination" tensor to your op and pass that tensor to the function. This is similar to how linalg.generic has an outs operand.
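A sketch of what destination-passing style could look like for the reproducer above (hypothetical: "my.op" would need to be extended to accept the extra operand and tie its result to it, and the dynamic size %d is assumed to be available):

```mlir
// The caller allocates (or passes in) the destination tensor; bufferization
// can then write the result into that buffer instead of returning a new
// allocation, analogous to linalg.generic's `outs` operand.
%dest = bufferization.alloc_tensor(%d) : tensor<?x16x76x76xsi8>
%0 = "my.op"(%arg1, %arg3, %arg2, %arg0, %dest)
    : (tensor<?x1x80x80xui8>, tensor<16x1x5x5xsi8>, tensor<16xsi8>,
       tensor<16xsi8>, tensor<?x16x76x76xsi8>) -> tensor<?x16x76x76xsi8>
```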