This is an archive of the discontinued LLVM Phabricator instance.


Differential D126310

Do not destroy attrs in MergeNestedParallelLoops
AbandonedPublic

Authored by gflegar on May 24 2022, 10:58 AM.


Details

Reviewers

csigg
herhut

Summary

We can only merge nested loops if that does not destroy
attribute information. While it might be possible to merge
loops with attributes in some cases, for now we just disallow
merging if either of the two loops has additional attributes.

This change prevents the canonicalizer from destroying
information added with greedilyMapParallelSCFToGPU.
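The rule in the summary can be sketched without MLIR. The C++ below is a minimal model, not the patch itself: `ParallelLoop`, `mergeNested`, and the attribute spelling `operand_segment_sizes` are assumptions standing in for `scf.parallel`, the `MergeNestedParallelLoops` rewrite, and the name returned by `ParallelOp::getOperandSegmentSizeAttr()`. In this model the fused loop is rebuilt from scratch and keeps only the structural attribute, so the guard refuses to merge when either loop carries anything else. It omits the pattern's other precondition (the inner bounds must not use the outer induction variables).

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an scf.parallel op: per-dimension bounds and
// steps, plus the names of the attributes attached to the op.
struct ParallelLoop {
  std::vector<long> lowerBounds;
  std::vector<long> upperBounds;
  std::vector<long> steps;
  std::vector<std::string> attrNames;
};

// Assumed spelling of the one attribute the guard tolerates (in MLIR, the
// name returned by ParallelOp::getOperandSegmentSizeAttr()).
const std::string kOperandSegmentSizes = "operand_segment_sizes";

// Any attribute other than the structural one blocks merging.
bool isUnexpectedAttr(const std::string &name) {
  return name != kOperandSegmentSizes;
}

bool hasUnexpectedAttr(const ParallelLoop &loop) {
  return std::any_of(loop.attrNames.begin(), loop.attrNames.end(),
                     isUnexpectedAttr);
}

// Fuses an outer loop with its single nested inner loop by concatenating
// their dimensions. The fused loop carries only the structural attribute,
// so bail out whenever either input has something (e.g. a GPU `mapping`)
// that would otherwise be dropped.
std::optional<ParallelLoop> mergeNested(const ParallelLoop &outer,
                                        const ParallelLoop &inner) {
  if (hasUnexpectedAttr(outer) || hasUnexpectedAttr(inner))
    return std::nullopt;

  auto concat = [](std::vector<long> a, const std::vector<long> &b) {
    a.insert(a.end(), b.begin(), b.end());
    return a;
  };
  return ParallelLoop{concat(outer.lowerBounds, inner.lowerBounds),
                      concat(outer.upperBounds, inner.upperBounds),
                      concat(outer.steps, inner.steps),
                      {kOperandSegmentSizes}};
}
```

Two plain 1-D loops fuse into one 2-D loop; adding a `mapping` name to either loop makes `mergeNested` return `std::nullopt` and leaves the nest untouched.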

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

gflegar created this revision. May 24 2022, 10:58 AM

Herald added a project: Restricted Project. May 24 2022, 10:58 AM

Herald added subscribers: bzcheeseman, sdasgup3, wenzhicui and 20 others.

gflegar requested review of this revision. May 24 2022, 10:58 AM

Herald added a project: Restricted Project. May 24 2022, 10:58 AM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

gflegar added a reviewer: csigg. May 24 2022, 10:59 AM

Harbormaster completed remote builds in B166091: Diff 431728. May 24 2022, 12:36 PM

I think the intended way is to call addWithLabel() instead. But I'm not sure that's what we want to do: it does seem clunky to exclude this pattern from canonicalization until we get to greedilyMapParallelSCFToGPU.

Would it be possible to instead apply the mapping to gpu early (when we tile the loops) and carry the attributes forward during canonicalization?

Sure, I can change to addWithLabel instead.

I tried delaying the canonicalizer pass until after the MapParallelLoops pass, but that does not help: the canonicalizer still merges multiple loops (and destroys the attributes created by MapParallelLoops). So to make it work without removing the pattern, we would need to delay it all the way until after the ConvertParallelLoopToGpu pass. However, that pass fails unless the canonicalizer has run before it; I haven't investigated exactly why yet. (Here is the evolution of the IR in the working pipeline that just disables this pattern, while here is what happens if we try to delay the canonicalizer until after ConvertParallelLoopToGpu.)

There's a bigger design question here about the current solution, where we implicitly encode how a parallel loop maps to specific hardware by generating a specific nest of loops. Such a nest should be semantically equivalent to a single loop, and this is where things break, since the canonicalizer rightfully makes this assumption. Arguably, we should explicitly encode the mapping in the future, so I consider this CL to be only a quick workaround while we figure out how best to do that.

However, there might be a nicer workaround: I would say it's wrong for the canonicalizer to merge loops with different attributes, as that leads to the loss of some of them, so I can try to fix the pattern to not do that. That would work around our problem, so we can get the pipeline working (assuming all goes well). I would still consider it a workaround, as we're still being implicit about what the loop nest means, but it's at least not as much of a hack as the current approach.

Prevent merging loops with attrs instead

gflegar retitled this revision from "Label MergeNestedParallelLoops patterns" to "Do not destroy attrs in MergeNestedParallelLoops". May 30 2022, 4:42 AM

gflegar edited the summary of this revision.

Worked like a charm.

Now I'm also thinking that encoding hardware mapping information in scf.parallel's attributes is probably explicit enough, as long as various passes do not destroy this information (and if they do, that should be treated as a bug).

Harbormaster completed remote builds in B166890: Diff 432888. May 30 2022, 4:48 AM

Looks good, thanks!

Could you please get another approval from, e.g., Stephan or Nicolas? I'm not confident enough to put the official stamp on it.

mlir/lib/Dialect/SCF/SCF.cpp
2187	Checking whether `attr` is not in `ParallelOp::getAttributeNames()` might be slightly better.

gflegar added a reviewer: herhut. May 30 2022, 9:40 AM

gflegar added inline comments. May 30 2022, 9:50 AM

mlir/lib/Dialect/SCF/SCF.cpp
2187	Is it? Do you know exactly what the semantics of `getAttributeNames()` are? Are all of the values it returns mandatory arguments of `ParallelOp`? My thinking here was that we should err on the side of not merging loops if merging could lead to information loss. We obviously have to ignore mandatory attributes (otherwise we would never merge anything), but for everything else we should not merge unless someone explicitly states in which situations it is OK to merge a loop with a specific attribute.

csigg added inline comments. May 30 2022, 10:28 AM

mlir/lib/Dialect/SCF/SCF.cpp
2187	Yes, `getAttributeNames()` returns all known op attributes. At the moment, it contains only `ParallelOp::getOperandSegmentSizeAttr()`. My thinking is that if the op knows about a certain attribute, the canonicalizer should handle it.

gflegar added inline comments. May 30 2022, 10:58 AM

mlir/lib/Dialect/SCF/SCF.cpp
2187	What I'm trying to avoid is a situation like the following. You add a new known attribute `is_awesome` (an optional boolean) that signifies you can do awesome optimizations on your parallel loop. When you add it, you're not aware of this loop-merging canonicalization pattern. You write a test that checks your loop is being awesomely optimized, e.g.:

    scf.parallel (%arg0) = (%c0) to (%c16) step (%c1) {
      <some operations>
      scf.yield
    } {is_awesome = 1}

and that works like a charm. You build your pipeline, everything works, and you're happy and blissfully unaware of the loop-merging canonicalization. However, there's a potential problem. At some point your compiler gets the following IR:

    scf.parallel (%arg0) = (%c0) to (%c16) step (%c1) {
      scf.parallel (%arg1) = (%c0) to (%c16) step (%c1) {
        <some operations>
        scf.yield
      } {is_awesome = 1}
      scf.yield
    } {is_awesome = 1}

If we use `getAttributeNames` here, the canonicalizer will happily replace this loop nest with:

    scf.parallel (%arg0, %arg1) = (%c0, %c0) to (%c16, %c16) step (%c1, %c1) {
      <some operations>
      scf.yield
    }

and now your loop is no longer as awesome. If we explicitly list only the attributes we want to ignore while merging (as this CL does), the loops will not be merged. Better yet, the `canonicalize.mlir` test (below) will likely fail when you add a new attribute, since loop merging will be broken, letting you know that you need to decide how your new attribute should be handled when loops are merged.
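The two policies in this thread can be compared with a small, MLIR-free C++ sketch. Both reduce to "merge only if every attribute on both loops is in a permitted set"; they differ only in which set is used. All names here (`mayMerge`, `kExplicitAllowlist`, `kKnownAttrsAfterChange`) and the attribute spellings are hypothetical; `kKnownAttrsAfterChange` models what `getAttributeNames()` might return after someone adds `is_awesome` to the op definition.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using AttrNames = std::vector<std::string>;

// Both policies share this shape: merge only if every attribute on both
// loops is in `permitted`. They differ only in what `permitted` contains.
bool mayMerge(const AttrNames &outer, const AttrNames &inner,
              const AttrNames &permitted) {
  auto isPermitted = [&permitted](const std::string &name) {
    return std::find(permitted.begin(), permitted.end(), name) !=
           permitted.end();
  };
  return std::all_of(outer.begin(), outer.end(), isPermitted) &&
         std::all_of(inner.begin(), inner.end(), isPermitted);
}

// Policy in this revision: a fixed list that has to be extended on purpose,
// after deciding how a new attribute should survive merging.
const AttrNames kExplicitAllowlist = {"operand_segment_sizes"};

// Suggested alternative: whatever the op declares via getAttributeNames().
// Hypothetical contents after `is_awesome` is added to the op definition.
const AttrNames kKnownAttrsAfterChange = {"operand_segment_sizes",
                                          "is_awesome"};
```

With the explicit allowlist, a nest carrying `is_awesome` is left alone until someone decides how that attribute should survive merging; with the known-attribute list, the merge goes ahead and, in the scenario above, the attribute is silently dropped.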

csigg added inline comments. May 30 2022, 11:32 AM

mlir/lib/Dialect/SCF/SCF.cpp
2187	Good point, you convinced me.

gflegar marked 3 inline comments as done. May 31 2022, 3:31 AM

I don't think there is any guarantee that MLIR preserves unknown attributes during canonicalization, so this would be the first precedent in that direction.

Also, this only hides the problem, as the loops would still be combined if we ran canonicalize twice before attaching the mapping attribute. In essence, the issue is that tiling, mapping to gpu and the transformation to gpu dialect have to run after each other without being disturbed by other passes. So the best fix would be to ensure this is the case.
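The ordering concern can be illustrated with a toy C++ model. Every name here (`Loop`, `Nest`, `canonicalize`, `attachMapping`) is hypothetical and does not correspond to an MLIR API; canonicalization is simplified to fuse only from the outermost loop inward. With this revision's guard, merging is blocked only if the mapping attribute is already present; if canonicalization runs first, the loops are fused before any mapping exists.

```cpp
#include <cassert>
#include <vector>

// Toy model: a loop nest listed from outermost to innermost. Each entry has
// a rank (number of parallel dimensions) and a flag for a mapping attribute.
struct Loop {
  int rank;
  bool hasMapping;
};
using Nest = std::vector<Loop>;

// Canonicalization with this revision's guard, simplified to fuse only from
// the outermost loop inward: merge while neither loop carries a mapping.
void canonicalize(Nest &nest) {
  while (nest.size() >= 2 && !nest[0].hasMapping && !nest[1].hasMapping) {
    nest[0].rank += nest[1].rank;
    nest.erase(nest.begin() + 1);
  }
}

// Mapping step: annotate every loop that is still in the nest.
void attachMapping(Nest &nest) {
  for (Loop &loop : nest)
    loop.hasMapping = true;
}
```

Mapping first keeps a two-level nest; canonicalizing first collapses it into a single 2-D loop before the mapping is attached, so in this model the guard never gets a chance to fire.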

> In essence, the issue is that tiling, mapping to gpu and the transformation to gpu dialect have to run after each other without being disturbed by other passes.

That wouldn't work either, as transformation to gpu relies on canonicalizing the output from tiling (specifically, folding the loop steps into constants).

We can achieve what we need by adding LoopInvariantCodeMotionPass. Abandoning this.

Revision Contents

Path                                        Size
mlir/lib/Dialect/SCF/SCF.cpp                10 lines
mlir/test/Dialect/SCF/canonicalize.mlir     30 lines

Diff 432888

mlir/lib/Dialect/SCF/SCF.cpp

[2,174 lines not shown] LogicalResult matchAndRewrite(ParallelOp op,
     Block &outerBody = op.getLoopBody().front();
     if (!llvm::hasSingleElement(outerBody.without_terminator()))
       return failure();

     auto innerOp = dyn_cast<ParallelOp>(outerBody.front());
     if (!innerOp)
       return failure();

+    // Merging loops with attributes is not supported yet.
+    // We need to figure out how to properly merge those attributes without
+    // losing information.
+    auto isUnexpectedAttr = [](mlir::NamedAttribute attr) {
+      return attr.getName() != ParallelOp::getOperandSegmentSizeAttr();
+    };
+    if (llvm::any_of(op->getAttrs(), isUnexpectedAttr) ||
+        llvm::any_of(innerOp->getAttrs(), isUnexpectedAttr))
+      return failure();

     auto hasVal = [](const auto &range, Value val) {
       return llvm::find(range, val) != range.end();
     };

     for (auto val : outerBody.getArguments())
       if (hasVal(innerOp.getLowerBound(), val) ||
           hasVal(innerOp.getUpperBound(), val) ||
           hasVal(innerOp.getStep(), val))
[850 lines not shown]

mlir/test/Dialect/SCF/canonicalize.mlir

[128 lines not shown]
 // CHECK: [[B1:%.*]] = memref.dim {{.*}}, [[C1]]
 // CHECK: [[B2:%.*]] = memref.dim {{.*}}, [[C2]]
 // CHECK: scf.parallel ([[V0:%.*]], [[V1:%.*]], [[V2:%.*]]) = ([[C0]], [[C0]], [[C0]]) to ([[B0]], [[B1]], [[B2]]) step ([[C1]], [[C1]], [[C1]])
 // CHECK: memref.load {{.*}}{{\[}}[[V0]], [[V1]], [[V2]]]
 // CHECK: memref.store {{.*}}{{\[}}[[V0]], [[V1]], [[V2]]]

 // -----

+func.func @nested_parallel_attr(%0: memref<?x?xf64>) -> memref<?x?xf64> {
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
+  %1 = memref.dim %0, %c0 : memref<?x?xf64>
+  %2 = memref.dim %0, %c1 : memref<?x?xf64>
+  %3 = memref.alloc(%1, %2) : memref<?x?xf64>
+  scf.parallel (%arg1) = (%c0) to (%1) step (%c1) {
+    scf.parallel (%arg2) = (%c0) to (%2) step (%c1) {
+      %4 = memref.load %0[%arg1, %arg2] : memref<?x?xf64>
+      memref.store %4, %3[%arg1, %arg2] : memref<?x?xf64>
+      scf.yield
+    } {mapping = [{bound = affine_map<(d0) -> (d0)>, map = affine_map<(d0) -> (d0)>, processor = 3 : i64}]}
+    scf.yield
+  } {mapping = [{bound = affine_map<(d0) -> (d0)>, map = affine_map<(d0) -> (d0)>, processor = 0 : i64}]}
+  return %3 : memref<?x?xf64>
+}
+
+// CHECK-LABEL: func @nested_parallel_attr(
+// CHECK-DAG: [[C0:%.*]] = arith.constant 0 : index
+// CHECK-DAG: [[C1:%.*]] = arith.constant 1 : index
+// CHECK-DAG: [[B0:%.*]] = memref.dim {{.*}}, [[C0]]
+// CHECK-DAG: [[B1:%.*]] = memref.dim {{.*}}, [[C1]]
+// CHECK: scf.parallel ([[V0:%.*]]) = ([[C0]]) to ([[B0]]) step ([[C1]])
+// CHECK: scf.parallel ([[V1:%.*]]) = ([[C0]]) to ([[B1]]) step ([[C1]])
+// CHECK: memref.load {{.*}}{{\[}}[[V0]], [[V1]]]
+// CHECK: memref.store {{.*}}{{\[}}[[V0]], [[V1]]]
+// CHECK: {mapping = {{.*}} processor = 3
+// CHECK: {mapping = {{.*}} processor = 0
+// -----

 func.func private @side_effect()
 func.func @one_unused(%cond: i1) -> (index) {
   %0, %1 = scf.if %cond -> (index, index) {
     func.call @side_effect() : () -> ()
     %c0 = "test.value0"() : () -> (index)
     %c1 = "test.value1"() : () -> (index)
     scf.yield %c0, %c1 : index, index
   } else {
[1,316 lines not shown]