This is an archive of the discontinued LLVM Phabricator instance.


Differential D157069

[SCF][Transform] Add transform.loop.fuse_sibling
ClosedPublic

Authored by Groverkss on Aug 3 2023, 10:46 PM.

Details

Reviewers

nicolasvasilache
ftynse
dcaballe
springerm

Commits

rG9aaf007a982a: [SCF][Transform] Add transform.loop.fuse_sibling

Summary

This patch adds a new transform operation, transform.loop.fuse_sibling,
which, given two loops, fuses them, assuming that they are independent.
The transform operation itself performs only very basic checks to ensure
IR legality and leaves the responsibility of ensuring independence to the user.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Groverkss created this revision. Aug 3 2023, 10:46 PM

Herald added a project: Restricted Project. Aug 3 2023, 10:46 PM

Herald added subscribers: bviyer, Moerafaat, bzcheeseman and 21 others.

Groverkss requested review of this revision. Aug 3 2023, 10:46 PM

Herald added a reviewer: nicolasvasilache. Aug 3 2023, 10:46 PM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Groverkss added reviewers: ftynse, dcaballe. Aug 3 2023, 10:46 PM

Groverkss added a reviewer: springerm. Aug 3 2023, 10:53 PM

Harbormaster completed remote builds in B250253: Diff 547105. Aug 3 2023, 11:01 PM

springerm added inline comments. Aug 4 2023, 1:05 AM

mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td
319–320	Mention which kind of "loop" ops are supported. Also mention that the bounds must match and that a silenceable failure is produced otherwise. Also mention that both input handles are mapped to exactly one op and what happens if they are not.
324–325	Add the following: "This operation consumes the `loop` and `sibling` handles and produces the `fused_loop` handle."
328–329	I would rename these to indicate which loop is fused into which one. Maybe something like `target` and `source`.
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp
378–380	What happens if the result of one loop is used as a "shared_outs" operand of the other loop? Also, could there be issues where the resulting IR violates dominance because some operands of ops inside the loop are not defined after the other loop? I think it should be possible to write a relatively simple check with `getUsedValuesDefinedAbove`.
383–384	I would make this and the `isOpSibling` check a silenceable error. When there is an obvious error like "op type does not match", indicating there's something inherently wrong with the transform dialect script, I usually make it a definite error.
mlir/lib/Dialect/SCF/Utils/Utils.cpp
974–975	Can you give this function a `RewriterBase &` and use that rewriter to modify the IR. That would make it easier to reuse this helper function.
1025–1033	These should then also go through the rewriter.

nicolasvasilache added inline comments. Aug 4 2023, 1:11 AM

mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td
320	The legality of loop fusion is a tricky topic, we can't just approximate with "neither is an ancestor of another". You should at least restrict the transform to only work in the case of loops with no recursive side effects (i.e. on tensors only and without function calls like print). And this should also be well documented.

ftynse requested changes to this revision. Aug 4 2023, 1:22 AM

ftynse added inline comments.

mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td
319–320	Can this description be more extensive? It should mention that only `forall` loops can be fused, and which preconditions should hold for the loops. Also mention that there is no real legality check performed (see below).
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp
332–333	The bounds being the same is not a sufficient criterion for the fusion to be legal. Consider
	scf.forall %i in %bounds
	  store %i, %mem[%i]

	scf.forall %i in %bounds
	  load %mem[%bounds - %i]
	that would trivially be incorrect if the two loops are fused. I'm not necessarily opposed to having a "dangerous" transform exposed as an op, but it must be very clearly documented as such.
336–341	`isa` followed by `cast`, even at a distance, is an anti-pattern. Call `dyn_cast` and check the resulting variables for being non-null instead.
370–373	This should be documented in the op description.
380	Nit: don't start error messages with a capital letter.
390	Nit: it's possible to construct an `ArrayRef` temporary from a single element, no need to use a vector here.
mlir/lib/Dialect/SCF/Utils/Utils.cpp
977	Pass in `RewriterBase &` as an argument to this function and use the `rewriter` available in the `apply` method.
980	Nit: expand auto unless the type is obvious from context (e.g., there's a cast on the RHS).
982	Nit: don't specify the number of vector stack elements unless there is a very good reason to.
983–985	Something like `llvm::to_vector(llvm::concat(range1, range2))` would probably work here.
1025–1033	This must use proper rewriting API, e.g. `replaceOp` and `eraseOp`. Low-level `Operation` API will result in catastrophic memory problems when used with proper rewriters.
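
The store/load example in the comment on lines 332–333 can be checked by hand with a small simulation. The sketch below is a hypothetical stand-in in plain C++, not MLIR: `runSequential` and `runFused` are illustrative names, the loops run in ascending order (one possible schedule of a parallel `scf.forall`), and the load index is `n - 1 - i` rather than `%bounds - %i` so that it stays in bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Runs the two loops one after the other, as in the unfused program:
// loop 1 stores i into mem[i], loop 2 loads mem[n - 1 - i].
std::vector<int> runSequential(std::size_t n) {
  std::vector<int> mem(n, -1), loaded(n);
  for (std::size_t i = 0; i < n; ++i)
    mem[i] = static_cast<int>(i);
  for (std::size_t i = 0; i < n; ++i)
    loaded[i] = mem[n - 1 - i];
  return loaded;
}

// Runs both bodies in a single loop, as a naive fusion would.
// Early iterations load slots that have not been stored yet (-1 here).
std::vector<int> runFused(std::size_t n) {
  std::vector<int> mem(n, -1), loaded(n);
  for (std::size_t i = 0; i < n; ++i) {
    mem[i] = static_cast<int>(i);
    loaded[i] = mem[n - 1 - i];
  }
  return loaded;
}
```

For `n = 4`, the sequential version loads `3, 2, 1, 0`, while the fused version loads `-1, -1, 1, 0`: the bounds match, yet the fused loop reads two slots before they are written.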

This revision now requires changes to proceed. Aug 4 2023, 1:22 AM

ftynse added inline comments. Aug 4 2023, 1:24 AM

mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp
328–330	This will accept loops from, e.g., different functions. A better, and faster, condition is to check that both operations belong to the same block. Also, depending on the understanding of "sibling", this may accept as siblings two loops that have other operations (including other loops!) between them, which may not be legal.

Groverkss edited the summary of this revision. Aug 4 2023, 7:19 AM

Address comments

mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td
320	Right. I don't really want to make this operation check for the legality of loop fusion other than something that may produce invalid IR as that is not easy, but I still want this operation to be useable in those cases. I have changed the description to reflect this better.
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp
328–330	I updated the documentation to describe what this operation is meant to do better.
332–333	Added documentation to make it clear that it is the responsibility of the user to consider these.
378–380	If the shared_outs use happens, then it's a dependence. This is what I wanted to check with this operation but wrongly assumed that isAncestor does this. What I want to check is if any result produced by the first loop is used by the second loop. For the dominance problem, I added a check that all values used by `target` must be defined before `source`.
mlir/lib/Dialect/SCF/Utils/Utils.cpp
983–985	For some reason, to_vector refuses to work with this. Tried: llvm::to_vector(llvm::concat<Value>(range1, range2)) But that fails with: error: taking the address of a temporary object of type 'mlir::Value' [-Waddress-of-temporary] Using the old implementation for now.
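
The "old implementation" mentioned for lines 983–985 is not shown in this part of the review. One way to avoid the range adaptor entirely is to append both ranges explicitly; the sketch below shows that idea in standard C++, with `std::vector` standing in for `SmallVector<Value>` and `concatToVector` being a hypothetical helper name, not code from the patch.

```cpp
#include <cassert>
#include <vector>

// Builds one vector holding all elements of `first` followed by all
// elements of `second`, copying each element exactly once.
template <typename T>
std::vector<T> concatToVector(const std::vector<T> &first,
                              const std::vector<T> &second) {
  std::vector<T> result;
  result.reserve(first.size() + second.size());
  result.insert(result.end(), first.begin(), first.end());
  result.insert(result.end(), second.begin(), second.end());
  assert(result.size() == first.size() + second.size());
  return result;
}
```

Because the elements are copied into owned storage, no reference to a temporary element is ever taken, which is the situation the quoted compiler error complains about.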

Harbormaster completed remote builds in B250315: Diff 547203. Aug 4 2023, 7:48 AM

ftynse requested changes to this revision. Aug 7 2023, 2:18 AM

ftynse added inline comments.

mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td
316	Nit: I'd just say "assuming the fusion is legal". Consumer/producer fusion is also fine here IMO.
326	s/matching/mapping
mlir/include/mlir/Dialect/SCF/Utils/Utils.h
189–190	Nit: in my understanding, "siblings" doesn't imply independence and vice versa, these are two separate conditions that must be met. Two adjacent loops can depend on each other, e.g. in producer/consumer way or have the first loop compute the trip count of the second loop.
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp
329
336
341	Not all operands necessarily have a defining op; they may be block arguments, at which point this will dereference a null pointer and likely crash. Furthermore, I suspect one of the conditions here is either incorrect or unnecessary. I'm confused by the use of "source" and "target" here, so I'll just use my own example:
	%results = scf.forall {  // first_loop
	}
	// something in-between
	scf.forall %operands {  // second_loop
	}
	If the body of `first_loop` is merged into `second_loop`, then the condition that has to be ensured is that all uses of %results are after `second_loop`. Since `second_loop` already follows `first_loop`, all operands of `first_loop` are already visible at `second_loop` if the input IR is valid, no need to check for that. If the body of the `second_loop` is merged into the `first_loop`, then the condition is that operands of `second_loop` are visible at `first_loop`, but the results are already okay. Also note that dominance is more complex than just order of operations in blocks. Operands may be defined in surrounding regions, or in dominating blocks that transfer control to the given block. Results can be used in nested regions, or dominated blocks to which the control is transferred. It is possible to assume (and actually check) that the current block belongs to a single-block region, which is the most common case in presence of structured control flow. When done carefully, this entire test will conservatively return failure. However, I am encouraging you to consider at least nested regions, otherwise the utility of this transform is quite limited.
359	Nit: I would rename this function to make it better correspond to what it actually does, e.g., `isForallWithIdenticalConfiguration`. Otherwise somebody will eventually try using it for actual fusion legality tests.
360–361	Nit: consider https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code. Here and below.
392	Nit: please follow https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements here and below.
404	I may sound obsessed with the term, but "not siblings" doesn't describe the actual check that has taken place, which also includes dominance checks. Couldn't you just return `DiagnosedSilenceableFailure` from the helper function and put a more descriptive error message there?
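
The ordering argument in the comment on line 341 can be modelled for the simplest case, a single block with no nested regions. The sketch below is a toy model under that assumption, not the MLIR implementation: `ToyOp` and `fusionKeepsDominance` are hypothetical names, each op is identified only by its position in the block, and "properly dominates" reduces to "appears strictly earlier".

```cpp
#include <cassert>
#include <vector>

// Toy model of an operation inside one block.
struct ToyOp {
  int position;                   // index of the op within the block
  std::vector<int> operandDefs;   // positions of ops defining its operands,
                                  // -1 marks a block argument
  std::vector<int> userPositions; // positions of ops using its results
};

// Returns true if moving the body of `target` into `source` keeps every
// use after its definition in this single-block model.
bool fusionKeepsDominance(const ToyOp &target, const ToyOp &source) {
  assert(target.position != source.position && "loops must be distinct");
  if (target.position < source.position) {
    // `target` moves down: every user of its results must come strictly
    // after `source` (a use by `source` itself is rejected).
    for (int user : target.userPositions)
      if (user <= source.position)
        return false;
    return true;
  }
  // `target` moves up: every operand definition must come strictly
  // before `source`. Block arguments are always visible.
  for (int def : target.operandDefs) {
    if (def == -1)
      continue;
    if (def >= source.position)
      return false;
  }
  return true;
}
```

Only one of the two conditions is checked for a given ordering, which matches the point that the other condition already holds in valid input IR. Nested regions and multi-block regions, which the comment asks to be considered, are deliberately left out of this model.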

This revision now requires changes to proceed. Aug 7 2023, 2:18 AM

Add better dominance checking strategy, add more tests, address comments

Groverkss added inline comments. Aug 8 2023, 11:54 PM

mlir/include/mlir/Dialect/SCF/Utils/Utils.h
189–190	My understanding of siblings is that they are two loops that do not behave in a producer/consumer fashion. A result/side-effect produced by the first loop, should not be consumed by the other loop. Isn't this the same as being independent? Or am I missing something? I'm happy to change the name if you feel this is not the right word.
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp
341	Regarding the unnecessary conditions, I now check the ordering and only do the required checks. Regarding dominance, I switched to using DominanceInfo, will that be enough? Also, I added some more test cases which check for dominance failures. While adding these tests, I also noticed that not only do the operands need to be checked, but we also need to check if any value used by the second_loop should be visible by the first loop. So, I added another check for this.

Add comment to explain context of isOpSibling

Harbormaster completed remote builds in B251297: Diff 548483. Aug 9 2023, 2:05 AM

ftynse accepted this revision. Aug 16 2023, 5:59 AM

ftynse added inline comments.

mlir/include/mlir/Dialect/SCF/Utils/Utils.h
189–190	In my understanding, sibling operations are merely operations that have a common parent. Nothing more. This is a natural extension of the parent/child terminology used extensively for operations nested in regions of other operations: two "child" operations that have a common parent are siblings. It doesn't imply anything about data flow between them. In non-polyhedral loop transformation terminology, "sibling fusion" is sometimes used in opposition to "producer/consumer fusion", but this is a very niche usage compared to the basic IR structure usage that is more pervasive. I could live with something like `doIndependentSiblingFusion` so it is clearer that "sibling" qualifies the kind of fusion and not the relation between operations, but a more verbose and less ambiguous name is welcome.

This revision is now accepted and ready to land. Aug 16 2023, 5:59 AM

Change method name

Groverkss added inline comments. Aug 19 2023, 2:49 AM

mlir/include/mlir/Dialect/SCF/Utils/Utils.h
189–190	Changed it to fuseIndependentSiblingForallLoops.

This revision was landed with ongoing or failed builds. Aug 19 2023, 3:03 AM

Closed by commit rG9aaf007a982a: [SCF][Transform] Add transform.loop.fuse_sibling (authored by Groverkss).

This revision was automatically updated to reflect the committed changes.

Groverkss added a commit: rG9aaf007a982a: [SCF][Transform] Add transform.loop.fuse_sibling.

Harbormaster completed remote builds in B253637: Diff 551729. Aug 19 2023, 3:55 AM

Revision Contents

Path	Size
mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td	35 lines
mlir/include/mlir/Dialect/SCF/Utils/Utils.h	11 lines
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp	141 lines
mlir/lib/Dialect/SCF/Utils/Utils.cpp	65 lines
mlir/test/Dialect/SCF/transform-loop-fuse-sibling.mlir	113 lines

Diff 551731

mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td

[304 lines not shown]

  let extraClassDeclaration = [{
    ::mlir::DiagnosedSilenceableFailure applyToOne(
        ::mlir::transform::TransformRewriter &rewriter,
        ::mlir::scf::IfOp ifOp,
        ::mlir::transform::ApplyToEachResultList &results,
        ::mlir::transform::TransformState &state);
  }];
}

def LoopFuseSibling : Op<Transform_Dialect, "loop.fuse_sibling",
  [FunctionalStyleTransformOpTrait, MemoryEffectsOpInterface,
   DeclareOpInterfaceMethods<TransformOpInterface>]> {
  let summary = "Fuse a loop into another loop, assuming the fusion is legal.";

  let description = [{
    Fuses the `target` loop into the `source` loop assuming they are
    independent of each other. It is the responsibility of the user to ensure
    that the given two loops are independent of each other, this operation will
    not perform any legality checks and will simply fuse the two given loops.

    Currently, the only fusion supported is when both `target` and `source`
    are `scf.forall` operations. For `scf.forall` fusion, the bounds and the
    mapping must match, otherwise a silenceable failure is produced.

    The input handles `target` and `source` must map to exactly one operation,
    a definite failure is produced otherwise.

    #### Return modes

    This operation consumes the `target` and `source` handles and produces the
    `fused_loop` handle, which points to the fused loop.
  }];

  let arguments = (ins TransformHandleTypeInterface:$target,
                       TransformHandleTypeInterface:$source);
  let results = (outs TransformHandleTypeInterface:$fused_loop);
  let assemblyFormat = "$target `into` $source attr-dict "
                       " `:` functional-type(operands, results)";

  let builders = [
    OpBuilder<(ins "Value":$loop, "Value":$fused_loop)>
  ];
}

#endif // SCF_TRANSFORM_OPS

mlir/include/mlir/Dialect/SCF/Utils/Utils.h

[179 lines not shown]

/// Get perfectly nested sequence of loops starting at root of loop nest
/// (the first op being another AffineFor, and the second op - a terminator).
/// A loop is perfectly nested iff: the first op in the loop's body is another
/// AffineForOp, and the second op is a terminator).
void getPerfectlyNestedLoops(SmallVectorImpl<scf::ForOp> &nestedLoops,
                             scf::ForOp root);

/// Given two scf.forall loops, `target` and `source`, fuses `target` into
/// `source`. Assumes that the given loops are siblings and are independent of
/// each other.
///
/// This function does not perform any legality checks and simply fuses the
/// loops. The caller is responsible for ensuring that the loops are legal to
/// fuse.
scf::ForallOp fuseIndependentSiblingForallLoops(scf::ForallOp target,
                                                scf::ForallOp source,
                                                RewriterBase &rewriter);

} // namespace mlir

#endif // MLIR_DIALECT_SCF_UTILS_UTILS_H_

mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp

[12 lines not shown]

#include "mlir/Dialect/SCF/IR/SCF.h"
#include "mlir/Dialect/SCF/Transforms/Patterns.h"
#include "mlir/Dialect/SCF/Transforms/Transforms.h"
#include "mlir/Dialect/SCF/Utils/Utils.h"
#include "mlir/Dialect/Transform/IR/TransformDialect.h"
#include "mlir/Dialect/Transform/IR/TransformInterfaces.h"
#include "mlir/Dialect/Transform/IR/TransformOps.h"
#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/IR/Dominance.h"

using namespace mlir;
using namespace mlir::affine;

//===----------------------------------------------------------------------===//
// Apply...PatternsOp
//===----------------------------------------------------------------------===//

[285 lines not shown]

void transform::TakeAssumedBranchOp::getEffects(
    SmallVectorImpl<MemoryEffects::EffectInstance> &effects) {
  onlyReadsHandle(getTarget(), effects);
  modifiesPayload(effects);
}

//===----------------------------------------------------------------------===//
// LoopFuseSibling
//===----------------------------------------------------------------------===//

/// Check if `target` and `source` are siblings, in the context that `target`

/// is being fused into `source`.

///

/// This is a simple check that just checks if both operations are in the same

ftynseUnsubmitted

Done

/// This is a simple check that just checks if both operations are in the same

- /// block and some checks to ensure that the fused IR does not voilate

+ /// block and some checks to ensure that the fused IR does not violate

/// dominance.

ftynse:

/// block and some checks to ensure that the fused IR does not violate

ftynseUnsubmitted

Done

This will accept loops from, e.g., different functions. A better, and faster, condition is to check that both operations belong to the same block.

Also, depending on the understanding of "sibling", this may accept as siblings two loops that have other operations (including other loops!) between them, which may not be legal.

ftynse: This will accept loops from, e.g., different functions. A better, and faster, condition is to…

GroverkssAuthorUnsubmitted

Done

I updated the documentation to describe what this operation is meant to do better.

Groverkss: I updated the documentation to describe what this operation is meant to do better.

/// dominance.

static DiagnosedSilenceableFailure isOpSibling(Operation *target,

Operation *source) {

ftynseUnsubmitted

Done

The bounds being the same is not a sufficient criterion for the fusion to be legal. Consider

scf.forall %i in %bounds
  store %i, %mem[%i]

scf.forall %i in %bounds
  load %mem[%bounds - %i]

that would trivially be incorrect if two loops are fused.

I'm not necessary opposed to having a "dangerous" transform exposed as an op, but it must be very clearly documented as such.

ftynse: The bounds being the same is not a sufficient criterion for the fusion to be legal. Consider…

GroverkssAuthorUnsubmitted

Done

Added documentation to make it clear that it is the responsibility of the user to consider these.

Groverkss: Added documentation to make it clear that it is the responsibility of the user to consider…

// Check if both operations are same.

if (target == source)

return emitSilenceableFailure(source)

ftynseUnsubmitted

Done

return false;

- // Check if fusion will voilate dominance. We check that every operand of

+ // Check if fusion will violate dominance. We check that every operand of

// `target` dominates `source` and every result of `target` is dominated by

ftynse:

<< "target and source need to be different loops";

// Check if both operations are in the same block.

if (target->getBlock() != source->getBlock())

return emitSilenceableFailure(source)

ftynseUnsubmitted

Done

isa followed by cast, even at a distance, is an anti-pattern. Call dyn_cast and check the resulting variables for being non-null instead.

ftynse: `isa` followed by `cast`, even at a distance, is an anti-pattern. Call `dyn_cast` and check the…

ftynseUnsubmitted

Done

Not all operands necessarily have a defining op, they may be block arguments, at which point this will be dereferencing a null pointer and likely crash.

Furthermore, I suspect this condition one of the conditions here is either incorrect or unnecessary. I'm confused by the use of "source" and "target" here, so I'll just my own example:

%results = scf.forall {  // first_loop
}
// something in-between
scf.forall %operands {  // second_loop
}

If the body of first_loop is merged into second_loop, then the condition that has to be ensured is that all uses of %results are after second_loop. Since second_loop already follows first_loop, all operands of first_loop are already visible at second_loop if the input IR is valid, no need to check for that.
If the body of the second_loop is merged into the first_loop, then the condition is that operands of second_loop are visible at first_loop, but the results are already okay.

Also note that dominance is more complex than just order of operations in blocks. Operands may be defined in surrounding regions, or in dominating blocks that transfer control to the given block. Results can be used in nested regions, or dominated blocks to which the control is transferred. It is possible to assume (and actually check) that the current block belongs to a single-block region, which is the most common case in presence of structured control flow. When done carefully, this entire test will conservatively return failure. However, I am encouraging you to consider at least nested regions, otherwise the utility of this transform is quite limited.

ftynse: Not all operands necessarily have a defining op, they may be block arguments, at which point…

GroverkssAuthorUnsubmitted

Done

Regarding the unnecessary conditions, I now check the ordering and only do the required checks.

Regarding dominance, I switched to using DominanceInfo, will that be enough? Also, I added some more test cases which check for dominance failures.

While adding these tests, I also noticed that not only do the operands need to be checked, but we also need to check if any value used by the second_loop should be visible by the first loop. So, I added another check for this.

Groverkss: Regarding the unnecessary conditions, I now check the ordering and only do the required checks.

<< "target and source are not in the same block";

// Check if fusion will violate dominance.

DominanceInfo domInfo(source);

if (target->isBeforeInBlock(source)) {

// Since, `target` is before `source`, all users of results of `target`

// need to be dominated by `source`.

for (Operation *user : target->getUsers()) {

if (!domInfo.properlyDominates(source, user, /*enclosingOpOk=*/false)) {

return emitSilenceableFailure(target)

<< "user of results of target should be properly dominated by "

"source";

}

} else {

// Since `target` is after `source`, all values used by `target` need

// to dominate `source`.

ftynseUnsubmitted

Done

Nit: I would rename this function to make it better correspond to what it actually does, e.g., isForallWithIdenticalConfiguration. Otherwise somebody will eventually try using it for actual fusion legality tests.

ftynse: Nit: I would rename this function to make it better correspond to what it actually does, e.g.

// Check if operands of `target` are dominated by `source`.

for (Value operand : target->getOperands()) {

ftynseUnsubmitted

Done

Nit: consider https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code. Here and below.

ftynse: Nit: consider https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to…

Operation *operandOp = operand.getDefiningOp();

// If operand does not have a defining operation, it is a block arguement,

// which will always dominate `source`, since `target` and `source` are in

// the same block and the operand dominated `source` before.

if (!operandOp)

continue;

// Operand's defining operation should properly dominate `source`.

if (!domInfo.properlyDominates(operandOp, source,

/*enclosingOpOk=*/false))

return emitSilenceableFailure(target)

<< "operands of target should be properly dominated by source";

ftynseUnsubmitted

Done

This should be documented in the op description.

ftynse: This should be documented in the op description.

}

// Check if values used by `target` are dominated by `source`.

bool failed = false;

OpOperand *failedValue = nullptr;

visitUsedValuesDefinedAbove(target->getRegions(), [&](OpOperand *operand) {

if (!domInfo.properlyDominates(operand->getOwner(), source,

springermUnsubmitted

Done

What happens if the result of one loop is used as an "shared_outs" operand of the other loop? Also, could there be issues where the resulting IR violates dominance because some operands of ops inside the loop are not defined after the other loop. I think it should be possible to write a relatively simple check with getUsedValuesDefinedAbove.

springerm: What happens if the result of one loop is used as an "shared_outs" operand of the other loop?

Groverkss (author, inline comment, Done):

If the shared_outs use happens, then it's a dependence. This is what I wanted to check with this operation, but I wrongly assumed that isAncestor does this.

What I want to check is whether any result produced by the first loop is used by the second loop.

For the dominance problem, I added a check that all values used by target must be defined before source.

ftynse (inline comment, Done):

Nit: don't start error messages with a capital letter.

/*enclosingOpOk=*/false)) {

failed = true;

failedValue = operand;

}

springerm (inline comment, Done):

I would make this and the `isOpSibling` check a silenceable error. When there is an obvious error like "op type does not match", indicating there's something inherently wrong with the transform dialect script, I usually make it a definite error.

});

if (failed)

return emitSilenceableFailure(failedValue->getOwner())

<< "values used inside regions of target should be properly "

"dominated by source";

ftynse (inline comment, Done):

Nit: it's possible to construct an `ArrayRef` temporary from a single element, no need to use a vector here.

}

ftynse (inline comment, Done):

Nit: please follow https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements here and below.

return DiagnosedSilenceableFailure::success();

}
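The operand check above can be illustrated outside MLIR. What follows is a minimal standalone sketch, not the patch's code: it assumes all operations live in one block, so "properly dominates" reduces to "appears strictly earlier", and `DefPosition` and `operandsDefinedBefore` are hypothetical names introduced only for illustration. It also uses the early-`continue` style requested in the review.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model: each operand records the position of its defining
// operation within the block, or std::nullopt for a block argument.
using DefPosition = std::optional<int>;

// Within a single block, an operation properly dominates another when it
// appears strictly earlier. Returns true if every operand that has a
// defining operation is defined strictly before `sourcePosition`.
bool operandsDefinedBefore(const std::vector<DefPosition> &operands,
                           int sourcePosition) {
  assert(sourcePosition >= 0 && "positions are indices into the block");
  for (const DefPosition &def : operands) {
    // Block arguments have no defining operation; skip them early.
    if (!def)
      continue;
    // An operand defined at or after the source breaks dominance.
    if (*def >= sourcePosition)
      return false;
  }
  return true;
}
```

In this model, an operand defined at the same position as the source is rejected, which mirrors the use of a proper (strict) dominance query in the listing.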

/// Check if `target` can be fused into `source`.

///

/// This is a simple check that just checks if both loops have same

/// bounds, steps and mapping. This check does not ensure that the side effects

/// of `target` are independent of `source` or vice-versa. It is the

/// responsibility of the caller to ensure that.

static bool isForallWithIdenticalConfiguration(Operation *target,

Operation *source) {

auto targetOp = dyn_cast<scf::ForallOp>(target);

ftynse (inline comment, Done):

I may sound obsessed with the term, but "not siblings" doesn't describe the actual check that has taken place, which also includes dominance checks. Couldn't you just return `DiagnosedSilenceableFailure` from the helper function and put a more descriptive error message there?

auto sourceOp = dyn_cast<scf::ForallOp>(source);

if (!targetOp || !sourceOp)

return false;

return targetOp.getMixedLowerBound() == sourceOp.getMixedLowerBound() &&

targetOp.getMixedUpperBound() == sourceOp.getMixedUpperBound() &&

targetOp.getMixedStep() == sourceOp.getMixedStep() &&

targetOp.getMapping() == sourceOp.getMapping();

}

/// Fuse `target` into `source` assuming they are siblings and independent.

/// TODO: Add fusion for more operations. Currently, we handle only scf.forall.

static Operation *fuseSiblings(Operation *target, Operation *source,

RewriterBase &rewriter) {

auto targetOp = dyn_cast<scf::ForallOp>(target);

auto sourceOp = dyn_cast<scf::ForallOp>(source);

if (!targetOp || !sourceOp)

return nullptr;

return fuseIndependentSiblingForallLoops(targetOp, sourceOp, rewriter);

}

DiagnosedSilenceableFailure

transform::LoopFuseSibling::apply(transform::TransformRewriter &rewriter,

transform::TransformResults &results,

transform::TransformState &state) {

auto targetOps = state.getPayloadOps(getTarget());

auto sourceOps = state.getPayloadOps(getSource());

if (!llvm::hasSingleElement(targetOps) ||

!llvm::hasSingleElement(sourceOps)) {

return emitDefiniteFailure()

<< "requires exactly one target handle (got "

<< llvm::range_size(targetOps) << ") and exactly one "

<< "source handle (got " << llvm::range_size(sourceOps) << ")";

}

Operation *target = *targetOps.begin();

Operation *source = *sourceOps.begin();

// Check if the target and source are siblings.

DiagnosedSilenceableFailure diag = isOpSibling(target, source);

if (!diag.succeeded())

return diag;

// Check if the target can be fused into source.

if (!isForallWithIdenticalConfiguration(target, source)) {

return emitSilenceableFailure(target->getLoc())

<< "operations cannot be fused";

}

Operation *fusedLoop = fuseSiblings(target, source, rewriter);

assert(fusedLoop && "failed to fuse operations");

results.set(cast<OpResult>(getFusedLoop()), {fusedLoop});

return DiagnosedSilenceableFailure::success();

}

//===----------------------------------------------------------------------===//

// Transform op registration

//===----------------------------------------------------------------------===//

namespace {

class SCFTransformDialectExtension

: public transform::TransformDialectExtension<

SCFTransformDialectExtension> {

public:

(20 lines not shown)

mlir/lib/Dialect/SCF/Utils/Utils.cpp

(964 lines not shown)	TileLoops mlir::extractFixedOuterLoops(scf::ForOp rootForOp,

// TODO: for now we just ignore the result of band isolation.
// In the future, mapping decisions may be impacted by the ability to
// isolate perfectly nested bands.
(void)tryIsolateBands(tileLoops);

return tileLoops;
}

		scf::ForallOp mlir::fuseIndependentSiblingForallLoops(scf::ForallOp target,
		scf::ForallOp source,
		springerm (inline comment, Done): Can you give this function a `RewriterBase &` and use that rewriter to modify the IR? That would make it easier to reuse this helper function.
		RewriterBase &rewriter) {
		unsigned numTargetOuts = target.getNumResults();
		ftynse (inline comment, Done): Pass in `RewriterBase &` as an argument to this function and use the `rewriter` available in the `apply` method.
		unsigned numSourceOuts = source.getNumResults();

		OperandRange targetOuts = target.getOutputs();
		ftynse (inline comment, Done): Nit: expand auto unless the type is obvious from context (e.g., there's a cast on the RHS).
		OperandRange sourceOuts = source.getOutputs();

		ftynse (inline comment, Done): Nit: don't specify the number of vector stack elements unless there is a very good reason to.
		// Create fused shared_outs.
		SmallVector<Value> fusedOuts;
		fusedOuts.reserve(numTargetOuts + numSourceOuts);
		ftynse (inline comment, Done): Something like `llvm::to_vector(llvm::concat(range1, range2))` would probably work here.
		Groverkss (author, inline comment, Done): For some reason, `to_vector` refuses to work with this. I tried `llvm::to_vector(llvm::concat<Value>(range1, range2))`, but that fails with `error: taking the address of a temporary object of type 'mlir::Value' [-Waddress-of-temporary]`. Using the old implementation for now.
		fusedOuts.append(targetOuts.begin(), targetOuts.end());
		fusedOuts.append(sourceOuts.begin(), sourceOuts.end());

		// Create a new scf::forall op after the source loop.
		rewriter.setInsertionPointAfter(source);
		scf::ForallOp fusedLoop = rewriter.create<scf::ForallOp>(
		source.getLoc(), source.getMixedLowerBound(), source.getMixedUpperBound(),
		source.getMixedStep(), fusedOuts, source.getMapping());

		// Map control operands.
		IRMapping fusedMapping;
		fusedMapping.map(target.getInductionVars(), fusedLoop.getInductionVars());
		fusedMapping.map(source.getInductionVars(), fusedLoop.getInductionVars());

		// Map shared outs.
		fusedMapping.map(target.getOutputBlockArguments(),
		fusedLoop.getOutputBlockArguments().slice(0, numTargetOuts));
		fusedMapping.map(
		source.getOutputBlockArguments(),
		fusedLoop.getOutputBlockArguments().slice(numTargetOuts, numSourceOuts));

		// Append everything except the terminator into the fused operation.
		rewriter.setInsertionPointToStart(fusedLoop.getBody());
		for (Operation &op : target.getLoopBody().begin()->without_terminator())
		rewriter.clone(op, fusedMapping);
		for (Operation &op : source.getLoopBody().begin()->without_terminator())
		rewriter.clone(op, fusedMapping);

		// Fuse the old terminator in_parallel ops into the new one.
		scf::InParallelOp targetTerm = target.getTerminator();
		scf::InParallelOp sourceTerm = source.getTerminator();
		scf::InParallelOp fusedTerm = fusedLoop.getTerminator();

		rewriter.setInsertionPointToStart(fusedTerm.getBody());
		for (Operation &op : targetTerm.getYieldingOps())
		rewriter.clone(op, fusedMapping);
		for (Operation &op : sourceTerm.getYieldingOps())
		rewriter.clone(op, fusedMapping);

		// Replace all uses of the old loops with the fused loop.
		rewriter.replaceAllUsesWith(target.getResults(),
		fusedLoop.getResults().slice(0, numTargetOuts));
		rewriter.replaceAllUsesWith(
		source.getResults(),
		fusedLoop.getResults().slice(numTargetOuts, numSourceOuts));

		// Erase the old loops.
		rewriter.eraseOp(target);
		springerm (inline comment, Done): These should then also go through the rewriter.
		ftynse (inline comment, Done): This must use proper rewriting API, e.g. `replaceOp` and `eraseOp`. Low-level `Operation` API will result in catastrophic memory problems when used with proper rewriters.
		rewriter.eraseOp(source);

		return fusedLoop;
		}
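The result bookkeeping in `fuseIndependentSiblingForallLoops` above can be illustrated without MLIR. The following is a minimal standalone sketch, not the patch's code: `Values`, `concatOuts`, and `sliceOf` are hypothetical names, and a plain `std::vector<std::string>` stands in for the ranges of SSA values. It shows the layout the two `slice` calls rely on: target outputs occupy the first `numTargetOuts` slots of the fused list, and source outputs occupy the remaining `numSourceOuts` slots.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a list of SSA values (shared_outs or results).
using Values = std::vector<std::string>;

// Concatenate target outputs followed by source outputs, mirroring the
// order in which the listing appends to `fusedOuts`.
Values concatOuts(const Values &targetOuts, const Values &sourceOuts) {
  Values fused;
  fused.reserve(targetOuts.size() + sourceOuts.size());
  fused.insert(fused.end(), targetOuts.begin(), targetOuts.end());
  fused.insert(fused.end(), sourceOuts.begin(), sourceOuts.end());
  return fused;
}

// Return `count` elements starting at `offset`, analogous to
// `slice(offset, count)` on the fused loop's results.
Values sliceOf(const Values &values, std::size_t offset, std::size_t count) {
  assert(offset + count <= values.size() && "slice out of bounds");
  return Values(values.begin() + offset, values.begin() + offset + count);
}
```

With two target outputs and one source output, `sliceOf(fused, 0, 2)` recovers the target's values and `sliceOf(fused, 2, 1)` recovers the source's, which is the same pairing used when the old loops' results are replaced by the fused loop's results.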

mlir/test/Dialect/SCF/transform-loop-fuse-sibling.mlir

This file was added.

				// RUN: mlir-opt %s -test-transform-dialect-interpreter --cse --canonicalize -split-input-file -verify-diagnostics | FileCheck %s

				func.func @test(%A : tensor<128x128xf32>, %B1 : tensor<128x128xf32>, %B2 : tensor<128x128xf32>) -> (tensor<128x128xf32>, tensor<128x128xf32>) {
				%zero = arith.constant 0.0 : f32
				%out_alloc = tensor.empty() : tensor<128x128xf32>
				%out = linalg.fill ins(%zero : f32) outs(%out_alloc : tensor<128x128xf32>) -> tensor<128x128xf32>

				// CHECK: scf.forall ([[I:%.*]]) in (4) shared_outs([[S1:%.*]] = [[IN1:%.*]], [[S2:%.*]] = [[IN2:%.*]]) -> (tensor<128x128xf32>, tensor<128x128xf32>) {
				// CHECK: [[T:%.*]] = affine.apply
				// CHECK: tensor.extract_slice [[S1]][[[T]], 0] [32, 128] [1, 1]
				// CHECK: [[OUT1:%.*]] = linalg.matmul
				// CHECK: tensor.extract_slice [[S2]][[[T]], 0] [32, 128] [1, 1]
				// CHECK: [[OUT2:%.*]] = linalg.matmul
				// CHECK: scf.forall.in_parallel {
				// CHECK: tensor.parallel_insert_slice [[OUT1]] into [[S1]][[[T]], 0] [32, 128] [1, 1]
				// CHECK: tensor.parallel_insert_slice [[OUT2]] into [[S2]][[[T]], 0] [32, 128] [1, 1]
				// CHECK: }
				// CHECK: }
				%out1 = linalg.matmul ins(%A, %B1 : tensor<128x128xf32>, tensor<128x128xf32>) outs(%out : tensor<128x128xf32>) -> tensor<128x128xf32>
				%out2 = linalg.matmul ins(%A, %B2 : tensor<128x128xf32>, tensor<128x128xf32>) outs(%out : tensor<128x128xf32>) -> tensor<128x128xf32>

				func.return %out1, %out2 : tensor<128x128xf32>, tensor<128x128xf32>
				}

				transform.sequence failures(propagate) {
				^bb0(%variant_op : !transform.any_op):
				%matched = transform.structured.match ops{["linalg.matmul"]} in %variant_op : (!transform.any_op) -> (!transform.any_op)

				%mm1, %mm2 = transform.split_handle %matched : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

				%loop1, %tiled_mm1 = transform.structured.tile_to_forall_op %mm1 tile_sizes [32] : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
				%loop2, %tiled_mm2 = transform.structured.tile_to_forall_op %mm2 tile_sizes [32] : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

				%fused_loop = transform.loop.fuse_sibling %loop1 into %loop2 : (!transform.any_op, !transform.any_op) -> !transform.any_op
				}

				// -----

				func.func @test(%A : tensor<128x128xf32>, %B1 : tensor<128x128xf32>, %B2 : tensor<128x128xf32>) -> (tensor<128x128xf32>, tensor<128x128xf32>) {
				%zero = arith.constant 0.0 : f32
				%out_alloc = tensor.empty() : tensor<128x128xf32>
				%out = linalg.fill ins(%zero : f32) outs(%out_alloc : tensor<128x128xf32>) -> tensor<128x128xf32>

				// expected-error @below {{user of results of target should be properly dominated by source}}
				%out1 = linalg.matmul ins(%A, %B1 : tensor<128x128xf32>, tensor<128x128xf32>) outs(%out : tensor<128x128xf32>) -> tensor<128x128xf32>
				%out2 = linalg.matmul ins(%A, %out1 : tensor<128x128xf32>, tensor<128x128xf32>) outs(%out : tensor<128x128xf32>) -> tensor<128x128xf32>

				func.return %out1, %out2 : tensor<128x128xf32>, tensor<128x128xf32>
				}

				transform.sequence failures(propagate) {
				^bb0(%variant_op : !transform.any_op):
				%matched = transform.structured.match ops{["linalg.matmul"]} in %variant_op : (!transform.any_op) -> (!transform.any_op)

				%mm1, %mm2 = transform.split_handle %matched : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

				%loop1, %tiled_mm1 = transform.structured.tile_to_forall_op %mm1 tile_sizes [32] : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
				%loop2, %tiled_mm2 = transform.structured.tile_to_forall_op %mm2 tile_sizes [32] : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

				%fused_loop = transform.loop.fuse_sibling %loop1 into %loop2 : (!transform.any_op, !transform.any_op) -> !transform.any_op
				}

				// -----

				func.func @test(%A : tensor<128x128xf32>, %B1 : tensor<128x128xf32>, %B2 : tensor<128x128xf32>) -> (tensor<128x128xf32>, tensor<128x128xf32>) {
				%zero = arith.constant 0.0 : f32
				%out_alloc = tensor.empty() : tensor<128x128xf32>
				%out = linalg.fill ins(%zero : f32) outs(%out_alloc : tensor<128x128xf32>) -> tensor<128x128xf32>

				%out1 = linalg.matmul ins(%A, %B1 : tensor<128x128xf32>, tensor<128x128xf32>) outs(%out : tensor<128x128xf32>) -> tensor<128x128xf32>
				// expected-error @below {{values used inside regions of target should be properly dominated by source}}
				%out2 = linalg.matmul ins(%A, %out1 : tensor<128x128xf32>, tensor<128x128xf32>) outs(%out : tensor<128x128xf32>) -> tensor<128x128xf32>

				func.return %out1, %out2 : tensor<128x128xf32>, tensor<128x128xf32>
				}

				transform.sequence failures(propagate) {
				^bb0(%variant_op : !transform.any_op):
				%matched = transform.structured.match ops{["linalg.matmul"]} in %variant_op : (!transform.any_op) -> (!transform.any_op)

				%mm1, %mm2 = transform.split_handle %matched : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

				%loop1, %tiled_mm1 = transform.structured.tile_to_forall_op %mm1 tile_sizes [32] : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
				%loop2, %tiled_mm2 = transform.structured.tile_to_forall_op %mm2 tile_sizes [32] : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

				%fused_loop = transform.loop.fuse_sibling %loop2 into %loop1 : (!transform.any_op, !transform.any_op) -> !transform.any_op
				}

				// -----

				func.func @test(%A : tensor<128x128xf32>, %B1 : tensor<128x128xf32>, %B2 : tensor<128x128xf32>) -> (tensor<128x128xf32>, tensor<128x128xf32>) {
				%zero = arith.constant 0.0 : f32
				%out_alloc = tensor.empty() : tensor<128x128xf32>
				%out = linalg.fill ins(%zero : f32) outs(%out_alloc : tensor<128x128xf32>) -> tensor<128x128xf32>

				%out1 = linalg.matmul ins(%A, %B1 : tensor<128x128xf32>, tensor<128x128xf32>) outs(%out : tensor<128x128xf32>) -> tensor<128x128xf32>
				// expected-error @below {{operands of target should be properly dominated by source}}
				%out2 = linalg.matmul ins(%A, %B2 : tensor<128x128xf32>, tensor<128x128xf32>) outs(%out1 : tensor<128x128xf32>) -> tensor<128x128xf32>

				func.return %out1, %out2 : tensor<128x128xf32>, tensor<128x128xf32>
				}

				transform.sequence failures(propagate) {
				^bb0(%variant_op : !transform.any_op):
				%matched = transform.structured.match ops{["linalg.matmul"]} in %variant_op : (!transform.any_op) -> (!transform.any_op)

				%mm1, %mm2 = transform.split_handle %matched : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

				%loop1, %tiled_mm1 = transform.structured.tile_to_forall_op %mm1 tile_sizes [32] : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
				%loop2, %tiled_mm2 = transform.structured.tile_to_forall_op %mm2 tile_sizes [32] : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

				%fused_loop = transform.loop.fuse_sibling %loop2 into %loop1 : (!transform.any_op, !transform.any_op) -> !transform.any_op
				}

Revision Contents

Diff 551731
