This is an archive of the discontinued LLVM Phabricator instance.

[mlir][SCF] foreach_thread: Capture shared output tensors explicitly
ClosedPublic

Authored by springerm on Sep 1 2022, 7:30 AM.

Details

Summary

This change refines the semantics of scf.foreach_thread. Tensors that are inserted into in the terminator must now be passed to the region explicitly via shared_outs. Inside of the body of the op, those tensors are then accessed via block arguments.

The body of a scf.foreach_thread is now treated as a repetitive region. I.e., op dominance can no longer be used in conflict detection when using a value that is defined outside of the body. Such uses may now be considered as conflicts (if there is at least one read and one write in the body), effectively privatizing the tensor. Shared outputs are not privatized when they are used via their corresponding block arguments.

As part of this change, it was also necessary to update the "tiling to scf.foreach_thread", such that the generated tensor.extract_slice ops use the scf.foreach_thread's block arguments. This is implemented by cloning the TilingInterface op inside the scf.foreach_thread, rewriting all of its outputs with block arguments and then calling the tiling implementation. Afterwards, the cloned op is deleted again.

Depends On D133113

Diff Detail

Event Timeline

springerm created this revision.Sep 1 2022, 7:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 1 2022, 7:30 AM
springerm requested review of this revision.Sep 1 2022, 7:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 1 2022, 7:30 AM
nicolasvasilache accepted this revision.Sep 2 2022, 4:51 AM

Nice!

mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
340

Suggestion: "The future buffers corresponding to these tensors"

343

The semantics are concurrent and are defined as they will execute in some unspecified order, this is not UB.

549

Can we find a better name here ?
getDests feels a bit misleading: on one hand it reads like getResults, OTOH we are talking about the BBargs that correspond to shared tensors.

mlir/lib/Dialect/SCF/IR/SCF.cpp
1323–1325

hmm .. I know this predates you but maybe we should have an llvm_unreachable to prevent from silently propagating null values.

This will turn into an interface at some point.

This revision is now accepted and ready to land.Sep 2 2022, 4:51 AM
springerm updated this revision to Diff 457572.Sep 2 2022, 5:48 AM
springerm marked 4 inline comments as done.

address comments

mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
549

Changed the return value to SmallVector<BlockArgument> to clarify that it's the BBargs, not the shared_outs.

This revision was landed with ongoing or failed builds.Sep 2 2022, 5:54 AM
This revision was automatically updated to reflect the committed changes.

Thanks for adding this.

I have some questions about naming, can we continue that discussion?

mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
549

Several things seem strange about the naming here:

  1. In the assembly format the new operands are called shared_outs but in the tablegen spec its called outputs
  2. Here getDests uses another name different from the other two.

From a user's perspective it's not clear, and there's also no documentation above this extra class method that is going to explain what dests is.

I agree that the return type helps clarify, but to be sure a user goes from use in source -> tablegen file -> CPP file to confirm.

Can we borrow naming convention from SCFForOp for these methods, e.g.:

BlockArgument getRegionArgForSharedOutsOperand(OpOperand &opOperand);

or

SmallVector<BlockArgument> getRegionArgsForSharedOutsOperands()
christopherbate added inline comments.Sep 2 2022, 9:10 AM
mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
549

Oh never mind, I see this is under perform concurrently not ForeachThread! sorry for the noise.