Add RegionBranchOpInterface to the scf.forall and scf.parallel ops so that analyses can trace through their subregions.
Repository: rG LLVM Github Monorepo
Event Timeline
As discussed on https://discourse.llvm.org/t/why-scf-forall-op-doesnt-have-regionbranchop-interface/70789/4, these two operations have special terminators.
The terminator of scf.forall is InParallelOp, which has no operands.
The terminator of scf.parallel is a yield op, which also has no operands.
So the terminators do not need to implement RegionBranchTerminatorOpInterface: because they return no operands, "value propagation" through them stays disabled.
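For illustration, here is a minimal sketch of an scf.forall whose terminator takes no operands. The value names (%src, %init, %c4) and tensor shapes are made up for this example, and the syntax is assumed to match the SCF dialect around the time of this review. The result is produced through shared_outs and tensor.parallel_insert_slice inside the terminator's region, not through terminator operands:

```mlir
// Hypothetical inputs: %src and %init are tensor<4xf32>, %c4 is an index.
%result = scf.forall (%i) in (%c4) shared_outs(%out = %init) -> (tensor<4xf32>) {
  %slice = tensor.extract_slice %src[%i] [1] [1] : tensor<4xf32> to tensor<1xf32>
  // The terminator itself has no operands; the update lives in its region.
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %slice into %out[%i] [1] [1]
        : tensor<1xf32> into tensor<4xf32>
  }
}
```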
scf.parallel can have an scf.reduce terminator.
E.g.:
%init = arith.constant 0.0 : f32
scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init) -> f32 {
  %elem_to_reduce = load %buffer[%iv] : memref<100xf32>
  scf.reduce(%elem_to_reduce) : f32 {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.addf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }
}
scf.parallel has an implicit scf.yield terminator (https://github.com/llvm/llvm-project/blob/cf1ef4161006e8119761b3a137423c23436bcf33/mlir/include/mlir/Dialect/SCF/IR/SCFOps.td#L812).
And scf.reduce does not have the terminator trait (https://github.com/llvm/llvm-project/blob/cf1ef4161006e8119761b3a137423c23436bcf33/mlir/include/mlir/Dialect/SCF/IR/SCFOps.td#L900).
The actual result of scf.parallel is generated by scf.reduce, but the terminator of scf.parallel yields nothing. So we can prevent value propagation without registering RegionBranchTerminatorOpInterface on the terminator.