This is an archive of the discontinued LLVM Phabricator instance.

[mlir][bufferization] Privatize buffers for parallel regions
ClosedPublic

Authored by springerm on Aug 31 2023, 7:39 AM.

Details

Summary

One-Shot Bufferize correctly handles RaW conflicts around repetitive regions (loops). Special handling is needed for parallel regions. These are a special kind of repetitive region that can have additional RaW conflicts that would not be present if the regions were executed sequentially.

Example:

%0 = bufferization.alloc_tensor()
scf.forall ... {
  %1 = linalg.fill ins(...) outs(%0)
  ...
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %1 into ...
  }
}

A separate (private) buffer must be allocated for each iteration of the scf.forall loop.
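
The need for a private buffer can be illustrated with a small C++ analogy. This is a sketch only, not MLIR code and not the bufferization implementation. It assumes that each iteration fills the scratch buffer with its own iteration index (the example above elides the fill value), and it replays one legal parallel schedule sequentially: every iteration's fill runs before any iteration's insert. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Shared scratch buffer (plays the role of %0 when it is NOT privatized).
// Later fills overwrite earlier ones before the inserts read them.
std::vector<int> runWithSharedBuffer(std::size_t numIters,
                                     std::size_t sliceSize) {
  std::vector<int> scratch(sliceSize);
  std::vector<int> result(numIters * sliceSize);
  // Phase 1: all iterations perform their fill.
  for (std::size_t i = 0; i < numIters; ++i)
    std::fill(scratch.begin(), scratch.end(), static_cast<int>(i));
  // Phase 2: all iterations insert their slice into the result.
  for (std::size_t i = 0; i < numIters; ++i)
    std::copy(scratch.begin(), scratch.end(),
              result.begin() + static_cast<std::ptrdiff_t>(i * sliceSize));
  return result;
}

// One private scratch buffer per iteration: the same schedule is now safe.
std::vector<int> runWithPrivateBuffers(std::size_t numIters,
                                       std::size_t sliceSize) {
  std::vector<std::vector<int>> scratch(numIters,
                                        std::vector<int>(sliceSize));
  std::vector<int> result(numIters * sliceSize);
  for (std::size_t i = 0; i < numIters; ++i)
    std::fill(scratch[i].begin(), scratch[i].end(), static_cast<int>(i));
  for (std::size_t i = 0; i < numIters; ++i)
    std::copy(scratch[i].begin(), scratch[i].end(),
              result.begin() + static_cast<std::ptrdiff_t>(i * sliceSize));
  return result;
}
```

Under this schedule, with 3 iterations and slices of size 2, the shared-buffer version yields 2 2 2 2 2 2, while the private-buffer version yields 0 0 1 1 2 2.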

This change adds a new interface method to BufferizableOpInterface to detect parallel regions. By default, regions are assumed to be sequential.

A buffer is privatized if an OpOperand bufferizes to a memory read inside a parallel region that is different from the parallel region where the operand's value is defined.
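
The privatization rule can be sketched as a toy model in plain C++. This is not the MLIR API: `Region`, `getParallelRegion`, and `needsPrivateBuffer` are hypothetical stand-ins chosen for illustration, and the actual analysis may differ in detail. The sketch assumes that regions are sequential unless marked parallel, and that every parallel region is also repetitive.

```cpp
#include <cassert>

// Hypothetical stand-in for a region; not the real mlir::Region.
struct Region {
  const Region *parent;  // enclosing region, or nullptr at the top level
  bool repetitive;       // may execute more than once (e.g. a loop body)
  bool parallel;         // iterations may run concurrently

  // Regions are sequential and non-repetitive by default.
  explicit Region(const Region *parent = nullptr, bool repetitive = false,
                  bool parallel = false)
      : parent(parent), repetitive(repetitive), parallel(parallel) {}
};

// Returns the closest enclosing parallel region (possibly `region` itself),
// or nullptr if there is none.
const Region *getParallelRegion(const Region *region) {
  for (; region != nullptr; region = region->parent) {
    if (region->parallel) {
      assert(region->repetitive &&
             "expected parallel regions to also be repetitive");
      return region;
    }
  }
  return nullptr;
}

// A read needs a private buffer if it happens inside a parallel region that
// differs from the parallel region in which the value was defined.
bool needsPrivateBuffer(const Region *readRegion, const Region *defRegion) {
  const Region *readParallel = getParallelRegion(readRegion);
  return readParallel != nullptr &&
         readParallel != getParallelRegion(defRegion);
}
```

In this model, a value defined at function level and read inside a parallel loop body needs a private buffer, whereas a value defined and read inside the same parallel loop body does not.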

Event Timeline

springerm created this revision. Aug 31 2023, 7:39 AM
Herald added a project: Restricted Project. Aug 31 2023, 7:39 AM
springerm requested review of this revision. Aug 31 2023, 7:39 AM
maerhart added inline comments. Sep 4 2023, 7:56 AM
mlir/include/mlir/Dialect/Bufferization/IR/BufferizableOpInterface.h
687

region?

mlir/lib/Dialect/Bufferization/IR/BufferizableOpInterface.cpp
127

Nit: maybe assert that it is also a repetitive region since that is requested in the documentation?

mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp
551

Does this include reads of the result of the region-containing op? I.e., there are writes inside the parallel region, but they are basically discarded (like doing nothing in the in_parallel). Is it guaranteed that they won't show up?

springerm updated this revision to Diff 555829. Sep 5 2023, 1:58 AM
springerm marked 3 inline comments as done.

address comments

mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp
551

If an alias of the scf.forall result (which is also an alias of the init_arg) is written in the loop body and read afterwards, something (usually the op inside the loop) must bufferize out-of-place. This is irrespective of parallelism. But I added a test case where there is no read that bufferizes in-place.

maerhart accepted this revision. Sep 5 2023, 3:10 AM
This revision is now accepted and ready to land. Sep 5 2023, 3:10 AM