This is an archive of the discontinued LLVM Phabricator instance.

mlir/lib/Dialect/SCF/SCF.cpp
1736	Could you rather use `rewriter.mergeBlocks` here? It lets you remap values and doesn't require copying the operation. Not necessarily in the `bodyBuilder`: just create a new ParallelOp, merge the body of the inner loop into that of the newly created loop, then erase the duplicate terminator.

This revision now requires changes to proceed. May 20 2021, 5:07 AM

Hardcode84 added inline comments. May 20 2021, 8:35 AM

mlir/lib/Dialect/SCF/SCF.cpp
1736	I didn't quite get how to do this. If we create a new outer ParallelOp and try to merge the block from the old inner ParallelOp into it, this new block will still reference the iterArgs from the old outer ParallelOp (we cannot remap them via the mergeBlocks API).

ftynse added inline comments. May 20 2021, 8:39 AM

mlir/lib/Dialect/SCF/SCF.cpp
1736	We can remap, that's what the third argument is for in https://github.com/llvm/llvm-project/blob/182162b61629f039e7aafc3f7eaab9cc64a81c83/mlir/include/mlir/IR/PatternMatch.h#L734.
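
To make the remapping question concrete, here is a minimal, self-contained sketch of the idea behind that third argument. `ToyOp`, `ToyBlock`, and `toyMergeBlocks` are hypothetical stand-ins written for this note, not MLIR classes and not the real `mergeBlocks` implementation; values are modeled as plain strings. The sketch assumes the behaviour discussed in this thread: replacement values are supplied only for the source block's own arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for IR concepts; these are NOT the MLIR classes.
struct ToyOp {
  std::string name;
  std::vector<std::string> operands; // values are identified by name
};

struct ToyBlock {
  std::vector<std::string> args; // block arguments
  std::vector<ToyOp> ops;
};

// Moves the ops of `source` to the end of `dest`, replacing each use of
// source.args[i] with argValues[i]. Operands that are not arguments of
// `source` (values captured from an enclosing region) are left untouched.
inline void toyMergeBlocks(ToyBlock &source, ToyBlock &dest,
                           const std::vector<std::string> &argValues) {
  assert(argValues.size() == source.args.size() &&
         "one replacement per source block argument");
  for (ToyOp &op : source.ops) {
    for (std::string &operand : op.operands) {
      for (std::size_t i = 0; i < source.args.size(); ++i) {
        if (operand == source.args[i]) {
          operand = argValues[i];
          break; // do not re-map the replacement value itself
        }
      }
    }
    dest.ops.push_back(op);
  }
  source.ops.clear();
  source.args.clear();
}
```

In this toy model, merging an inner block with argument `%j` whose op also reads `%i` from the outer loop rewrites `%j` but leaves `%i` as it was, so captured outer induction variables would need a separate replacement step.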

ftynse accepted this revision. May 20 2021, 8:43 AM

ftynse added inline comments.

mlir/lib/Dialect/SCF/SCF.cpp
1736	My bad, you mean the outer loop.

This revision is now accepted and ready to land. May 20 2021, 8:43 AM

Hardcode84 added inline comments. May 20 2021, 8:45 AM

mlir/lib/Dialect/SCF/SCF.cpp
1736	As I understand it, it can remap only the source block's arguments, but the iter vars from the outer loop are not inner block arguments; they are just captured in the inner block.

Anyway, there is another issue: I should also add a check that the outer block's iter vars are not used as inner loop bounds/steps.
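
The extra check mentioned here can be sketched in isolation. The following is a toy model, not the code from the revision: `ToyParallelLoop` and `innerBoundsUseOuterIVs` are hypothetical names, and values are represented as strings. It reports a conflict when any inner lower bound, upper bound, or step is one of the outer loop's induction variables, because the bounds of a merged loop could no longer refer to them.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy description of a parallel loop; values are identified by name.
struct ToyParallelLoop {
  std::vector<std::string> inductionVars;
  std::vector<std::string> lowerBounds;
  std::vector<std::string> upperBounds;
  std::vector<std::string> steps;
};

// Returns true if any bound or step of `inner` is an induction variable of
// `outer`. In that case the two loops should not be merged.
inline bool innerBoundsUseOuterIVs(const ToyParallelLoop &outer,
                                   const ToyParallelLoop &inner) {
  auto usesOuterIV = [&](const std::vector<std::string> &range) {
    return std::any_of(range.begin(), range.end(),
                       [&](const std::string &value) {
                         return std::find(outer.inductionVars.begin(),
                                          outer.inductionVars.end(),
                                          value) != outer.inductionVars.end();
                       });
  };
  return usesOuterIV(inner.lowerBounds) || usesOuterIV(inner.upperBounds) ||
         usesOuterIV(inner.steps);
}
```

A triangular nest, where the inner upper bound is the outer induction variable, is the kind of input this guard is meant to reject.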

check outer iter vars not used as bounds/steps for inner loop

Harbormaster completed remote builds in B105473: Diff 346804. May 20 2021, 11:56 AM

Closed by commit rG4184018253e7: [mlir][SCF] Canonicalize nested ParallelOp's (authored by Hardcode84). May 22 2021, 4:00 AM

This revision was automatically updated to reflect the committed changes.

Hardcode84 added a commit: rG4184018253e7: [mlir][SCF] Canonicalize nested ParallelOp's.

What was the motivation behind this canonicalization? In particular, why would a combined parallel loop be considered the canonical form?

We found this pattern because it undoes the tiling we do in some cases. I agree the two forms are semantically equivalent, so the transformation is correct. It is inconvenient for our use case, though.

Herald added a project: Restricted Project. May 25 2022, 7:15 AM

Herald added subscribers: bzcheeseman, sdasgup3, wenzhicui and 2 others.

Isn't the whole point of canonicalization to convert semantically equivalent code to a single form? So we should either always merge nested loops or always split them (which makes nested loop support in scf.parallel useless).

In D102799#3537347, @Hardcode84 wrote:

Isn't the whole point of canonicalization to convert semantically equivalent code to a single form? So we should either always merge nested loops or always split them (which makes nested loop support in scf.parallel useless).

I agree with this. As a single scf.parallel op can represent multi-dimensional nests, the canonical form would be one where successive (1-d or n-d) scf.parallel ops are combined into a larger n-d scf.parallel op. However, a few issues with this revision:

It's missing a commit summary; please don't leave it empty in such cases.
Test cases are missing nests where one or more of the input scf.parallel were already multi-dimensional. This is important to exercise.
The rewrite pattern is missing a doc comment and code comments at places.
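
As a rough illustration of the canonical form described above, including the case where an input nest is already multi-dimensional, the sketch below merges a 2-d outer nest with a 1-d inner nest by concatenating bounds and steps. `ToyNest`, `mergeNests`, `points`, and `nestedPoints` are hypothetical helpers that assume constant integer bounds and positive steps; the revision itself operates on SSA values, so this is only a model of the iteration space.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy n-d parallel nest: one lower bound, upper bound, and step per dimension.
struct ToyNest {
  std::vector<long> lower, upper, step;
};

// Concatenates the dimensions of `outer` and `inner` into one n-d nest.
inline ToyNest mergeNests(const ToyNest &outer, const ToyNest &inner) {
  ToyNest merged = outer;
  merged.lower.insert(merged.lower.end(), inner.lower.begin(), inner.lower.end());
  merged.upper.insert(merged.upper.end(), inner.upper.begin(), inner.upper.end());
  merged.step.insert(merged.step.end(), inner.step.begin(), inner.step.end());
  return merged;
}

// Enumerates the iteration points of a nest, last dimension varying fastest.
// Assumes every step is positive.
inline std::vector<std::vector<long>> points(const ToyNest &nest) {
  std::vector<std::vector<long>> result(1); // start with one empty point
  for (std::size_t d = 0; d < nest.lower.size(); ++d) {
    std::vector<std::vector<long>> next;
    for (const std::vector<long> &prefix : result) {
      for (long i = nest.lower[d]; i < nest.upper[d]; i += nest.step[d]) {
        std::vector<long> point = prefix;
        point.push_back(i);
        next.push_back(std::move(point));
      }
    }
    result = std::move(next);
  }
  return result;
}

// Enumerates the points visited by running `inner` inside each point of `outer`.
inline std::vector<std::vector<long>> nestedPoints(const ToyNest &outer,
                                                   const ToyNest &inner) {
  std::vector<std::vector<long>> result;
  for (const std::vector<long> &o : points(outer)) {
    for (const std::vector<long> &i : points(inner)) {
      std::vector<long> point = o;
      point.insert(point.end(), i.begin(), i.end());
      result.push_back(std::move(point));
    }
  }
  return result;
}
```

Under these assumptions `points(mergeNests(outer, inner))` and `nestedPoints(outer, inner)` visit the same index tuples, which is the equivalence a test with multi-dimensional inputs would exercise.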

mlir/lib/Dialect/SCF/SCF.cpp
1722	Can you just use `llvm::is_contained`? (In that case, this lambda isn't adding anything.)
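
For reference, the simplification suggested here replaces a hand-written find-and-compare lambda with a membership helper. `llvm::is_contained` comes from `llvm/ADT/STLExtras.h`; the standalone sketch below uses only the standard library, and `isContained` is a hypothetical stand-in name chosen so that the snippet compiles without LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Standard-library stand-in for a range membership helper: returns true if
// `value` compares equal to some element of `range`.
template <typename Range, typename T>
bool isContained(const Range &range, const T &value) {
  return std::find(std::begin(range), std::end(range), value) !=
         std::end(range);
}
```

With such a helper, a call like `hasVal(innerOp.lowerBound(), val)` in the patch would become a direct membership query and the local lambda could be dropped.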

Revision Contents

Path    Size
mlir/lib/Dialect/SCF/SCF.cpp    61 lines
mlir/test/Dialect/SCF/canonicalize.mlir    39 lines

Diff 347196

mlir/lib/Dialect/SCF/SCF.cpp

     for (auto dim : llvm::zip(op.lowerBound(), op.upperBound())) {
 [...]
         rewriter.replaceOp(op, op.initVals());
         return success();
       }
     }
     return failure();
   }
 };
 
+struct MergeNestedParallelLoops : public OpRewritePattern<ParallelOp> {
+  using OpRewritePattern<ParallelOp>::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(ParallelOp op,
+                                PatternRewriter &rewriter) const override {
+    Block &outerBody = op.getLoopBody().front();
+    if (!llvm::hasSingleElement(outerBody.without_terminator()))
+      return failure();
+
+    auto innerOp = dyn_cast<ParallelOp>(outerBody.front());
+    if (!innerOp)
+      return failure();
+
+    auto hasVal = [](const auto &range, Value val) {
+      return llvm::find(range, val) != range.end();
+    };
+
+    for (auto val : outerBody.getArguments())
+      if (hasVal(innerOp.lowerBound(), val) ||
+          hasVal(innerOp.upperBound(), val) || hasVal(innerOp.step(), val))
+        return failure();
+
+    // Reductions are not supported yet.
+    if (!op.initVals().empty() || !innerOp.initVals().empty())
+      return failure();
+
+    auto bodyBuilder = [&](OpBuilder &builder, Location /*loc*/,
+                           ValueRange iterVals, ValueRange) {
+      Block &innerBody = innerOp.getLoopBody().front();
+      assert(iterVals.size() ==
+             (outerBody.getNumArguments() + innerBody.getNumArguments()));
+      BlockAndValueMapping mapping;
+      mapping.map(outerBody.getArguments(),
+                  iterVals.take_front(outerBody.getNumArguments()));
+      mapping.map(innerBody.getArguments(),
+                  iterVals.take_back(innerBody.getNumArguments()));
+      for (Operation &op : innerBody.without_terminator())
+        builder.clone(op, mapping);
+    };
+
+    auto concatValues = [](const auto &first, const auto &second) {
+      SmallVector<Value> ret;
+      ret.reserve(first.size() + second.size());
+      ret.assign(first.begin(), first.end());
+      ret.append(second.begin(), second.end());
+      return ret;
+    };
+
+    auto newLowerBounds = concatValues(op.lowerBound(), innerOp.lowerBound());
+    auto newUpperBounds = concatValues(op.upperBound(), innerOp.upperBound());
+    auto newSteps = concatValues(op.step(), innerOp.step());
+
+    rewriter.replaceOpWithNewOp<ParallelOp>(op, newLowerBounds, newUpperBounds,
+                                            newSteps, llvm::None, bodyBuilder);
+    return success();
+  }
+};
+
 } // namespace
 
 void ParallelOp::getCanonicalizationPatterns(RewritePatternSet &results,
                                              MLIRContext *context) {
-  results.add<CollapseSingleIterationLoops, RemoveEmptyParallelLoops>(context);
+  results.add<CollapseSingleIterationLoops, RemoveEmptyParallelLoops,
+              MergeNestedParallelLoops>(context);
 }
 
 //===----------------------------------------------------------------------===//
 // ReduceOp
 //===----------------------------------------------------------------------===//
 
 void ReduceOp::build(
     OpBuilder &builder, OperationState &result, Value operand,
 [...]

mlir/test/Dialect/SCF/canonicalize.mlir

 [...]
 // CHECK-NOT: scf.reduce.return
 // CHECK-NOT: scf.yield
 // CHECK: [[V0:%.*]] = addi [[ARG0]], [[C1]]
 // CHECK: [[V1:%.*]] = muli [[ARG1]], [[C3]]
 // CHECK: return [[V0]], [[V1]]
 
 // -----
 
+func @nested_parallel(%0: memref<?x?x?xf64>) -> memref<?x?x?xf64> {
+  %c0 = constant 0 : index
+  %c1 = constant 1 : index
+  %c2 = constant 2 : index
+  %1 = memref.dim %0, %c0 : memref<?x?x?xf64>
+  %2 = memref.dim %0, %c1 : memref<?x?x?xf64>
+  %3 = memref.dim %0, %c2 : memref<?x?x?xf64>
+  %4 = memref.alloc(%1, %2, %3) : memref<?x?x?xf64>
+  scf.parallel (%arg1) = (%c0) to (%1) step (%c1) {
+    scf.parallel (%arg2) = (%c0) to (%2) step (%c1) {
+      scf.parallel (%arg3) = (%c0) to (%3) step (%c1) {
+        %5 = memref.load %0[%arg1, %arg2, %arg3] : memref<?x?x?xf64>
+        memref.store %5, %4[%arg1, %arg2, %arg3] : memref<?x?x?xf64>
+        scf.yield
+      }
+      scf.yield
+    }
+    scf.yield
+  }
+  return %4 : memref<?x?x?xf64>
+}
+
+// CHECK-LABEL: func @nested_parallel(
+// CHECK: [[C0:%.*]] = constant 0 : index
+// CHECK: [[C1:%.*]] = constant 1 : index
+// CHECK: [[C2:%.*]] = constant 2 : index
+// CHECK: [[B0:%.*]] = memref.dim {{.*}}, [[C0]]
+// CHECK: [[B1:%.*]] = memref.dim {{.*}}, [[C1]]
+// CHECK: [[B2:%.*]] = memref.dim {{.*}}, [[C2]]
+// CHECK: scf.parallel ([[V0:%.*]], [[V1:%.*]], [[V2:%.*]]) = ([[C0]], [[C0]], [[C0]]) to ([[B0]], [[B1]], [[B2]]) step ([[C1]], [[C1]], [[C1]])
+// CHECK: memref.load {{.*}}{{\[}}[[V0]], [[V1]], [[V2]]]
+// CHECK: memref.store {{.*}}{{\[}}[[V0]], [[V1]], [[V2]]]
+
+// -----
+
 func private @side_effect()
 func @one_unused(%cond: i1) -> (index) {
   %c0 = constant 0 : index
   %c1 = constant 1 : index
   %c2 = constant 2 : index
   %c3 = constant 3 : index
   %0, %1 = scf.if %cond -> (index, index) {
     call @side_effect() : () -> ()
 [...]
 func @cond_prop(%arg0 : i1) -> index {
 [...]
   %c4 = constant 4 : index
   %res = scf.if %arg0 -> index {
     %res1 = scf.if %arg0 -> index {
       %v1 = "test.get_some_value"() : () -> i32
       scf.yield %c1 : index
     } else {
       %v2 = "test.get_some_value"() : () -> i32
       scf.yield %c2 : index
     }
     scf.yield %res1 : index
   } else {
     %res2 = scf.if %arg0 -> index {
       %v3 = "test.get_some_value"() : () -> i32
       scf.yield %c3 : index
     } else {
       %v4 = "test.get_some_value"() : () -> i32
       scf.yield %c4 : index
     }
     scf.yield %res2 : index
   }
   return %res : index
 }
 // CHECK-DAG: %[[c1:.+]] = constant 1 : index
 // CHECK-DAG: %[[c4:.+]] = constant 4 : index
 // CHECK-NEXT: %[[if:.+]] = scf.if %arg0 -> (index) {
 // CHECK-NEXT: %{{.+}} = "test.get_some_value"() : () -> i32
 [...]

[mlir][SCF] Canonicalize nested ParallelOps's (Closed, Public)