This is an archive of the discontinued LLVM Phabricator instance.

Differential D147830

[mlir][SCF][Transform] Add loop.coalesce_parallel Op in transform dialect
Needs Revision · Public

Authored by tavakkoliamirmohammad on Apr 7 2023, 6:17 PM.

Details

Reviewers

ftynse
nicolasvasilache

Summary

loop.coalesce_parallel enables coalescing arbitrary index variables of an scf.forall. The original index variables are recomputed from the new, coalesced index variables.
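
As a minimal sketch of that index recovery, assume every original dimension has already been normalized to run from 0 to its trip count with unit step. The original indices of one coalesced group can then be recovered with a remainder/division chain in which the last dimension of the group varies fastest. The name `delinearize` and its plain-integer signature are illustrative only and are not part of the patch, which emits the corresponding `arith.remsi` and `arith.divsi` ops instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: recover the original indices of one coalesced group.
// `extents[k]` is the trip count of the k-th original dimension in the group.
std::vector<int64_t> delinearize(int64_t coalesced,
                                 const std::vector<int64_t> &extents) {
  std::vector<int64_t> indices(extents.size(), 0);
  int64_t previous = coalesced;
  // Walk from the last dimension down to the second one.
  for (std::size_t j = extents.size(); j-- > 1;) {
    indices[j] = previous % extents[j]; // position within this dimension
    previous /= extents[j];             // strip this dimension's contribution
  }
  // The first dimension of the group is whatever remains.
  if (!extents.empty())
    indices[0] = previous;
  return indices;
}
```

For a group built from extents 16 and 114, coalesced index 115 maps back to the original pair (1, 1).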

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tavakkoliamirmohammad created this revision. (Apr 7 2023, 6:17 PM)

Herald added subscribers: bviyer, Moerafaat, bzcheeseman and 22 others. (Apr 7 2023, 6:17 PM)

tavakkoliamirmohammad requested review of this revision. (Apr 7 2023, 6:17 PM)

Herald added a subscriber: stephenneuendorffer. (Apr 7 2023, 6:18 PM)

Harbormaster completed remote builds in B224317: Diff 511834. (Apr 7 2023, 6:27 PM)

Fix missing import after applying patch

Harbormaster completed remote builds in B224327: Diff 511844. (Apr 7 2023, 9:57 PM)

Hi @ftynse @nicolasvasilache. Could you kindly review my submitted patch? Thanks!

ftynse requested changes to this revision. (Apr 14 2023, 3:05 PM)

ftynse added inline comments.

mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td
221	Can't we rather extend `loop.coalesce` to dispatch to one of the two functions based on the kind of op it sees?
226	Please provide more documentation than one line.
234–236	What are these? Please document. It's unclear to me why we need to hardcode exactly three of these.
mlir/test/Dialect/SCF/transform-op-coalesce.mlir
99	Coalescing normally requires two nested loops. I suppose this attempts to collapse iterators of a single op, but this is not documented, and not really exercised in the test, which doesn't test for the trip count of the new dimensions or for how the access indices are updated.

This revision now requires changes to proceed. (Apr 14 2023, 3:05 PM)

Updated the op documentation and addressed the comments.

mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td
221	We need an additional collapsed-dimensions input that specifies the position of each newly coalesced dimension. I also added an input for the mapping attribute, which is not necessary for an scf.for loop.
226	Added
234–236	collapsed_dim0 means that whatever is specified in that array will be dim0 in the new scf.forall. I added an example to the op documentation. The number three comes from the mapping attribute having at most three dimensions. Also, I don't know if we can have a two-dimensional array in MLIR. If this is supported, we can change this code to use a two-dimensional array, collapsed_dims.
mlir/test/Dialect/SCF/transform-op-coalesce.mlir
99	Could you please help me understand your comment better? I apologize if I didn't fully comprehend what you were trying to convey. The access indices won't change inside the computation. The access indices are computed based on `arith.remsi` and `arith.divsi`. These checks are in the test file.

Harbormaster completed remote builds in B225754: Diff 513781. (Apr 14 2023, 5:14 PM)

Apologies for the delay. Some more comments, but we are getting there.

mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td
221	Okay, then let's at least rename this to `coalesce_forall` because "scf.parallel" is a different construct that is not being handled here.
226	Nit: let's make sure it fits into 80 cols. Also: "when using" is not clear in this context, say something like "`scf.forall` ops associated with the operand handle have their dimensions coalesced, other ops are not modified".
227	Nit: numbner/number
232	This can use markdown-style three backticks to delimit the code here (it will also render properly on the website).
234–236	> Also, I don't know if we can have a two-dimensional array in MLIR. If this is supported, we can change this code to be a two-dimensional array, collapsed_dims.
	We can have an ArrayAttr of DenseArrayAttr (or any other attribute as a matter of fact). Let's do this instead. Hardcoding assumptions about hardware isn't desirable at this level.
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp
257	Ultra-nit: no need to prefix `SmallVector` with `llvm::`, it is re-exported in the `mlir` namespace. Also, you may want to consider `RaggedArray` from `mlir/Dialect/Transform/Utils/`.
259	Nit: please expand `auto` unless the type is obvious from context (e.g., there's a cast on the RHS).
279	It would be nice to add a diagnostic note pointing to the specific loop that failed to coalesce.
mlir/lib/Dialect/SCF/Utils/Utils.cpp
713–714	Let's rather make this function accept a `RewriterBase` so we can connect that properly from elsewhere in the codebase.
797–799	If this uses `RewriterBase`, all of this becomes much simpler and cheaper. The body block of the original loop can be inlined, which also handles the update of block arguments. Using RAUW makes this function incompatible with rewrites and generally difficult to debug.
mlir/test/Dialect/SCF/transform-op-coalesce.mlir
99	Now that there's more documentation, I suppose the only missing thing here is the check that the new loop iterates `in (1824, 3648)`.
126	Nit: please add the newline.
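
The bounds requested in the comment on line 99 follow directly from the test input: each new dimension iterates over the product of the trip counts it absorbs. The sketch below works through that arithmetic on plain integers. The function name `coalescedUpperBounds` is hypothetical and is not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: upper bound of each coalesced dimension, computed as
// the product of the trip counts of the original dimensions in its group.
std::vector<int64_t>
coalescedUpperBounds(const std::vector<int64_t> &extents,
                     const std::vector<std::vector<unsigned>> &groups) {
  std::vector<int64_t> bounds;
  for (const std::vector<unsigned> &group : groups) {
    int64_t product = 1;
    for (unsigned dim : group) {
      assert(dim < extents.size() && "group refers to a missing dimension");
      product *= extents[dim];
    }
    bounds.push_back(product);
  }
  return bounds;
}
```

With the test's `scf.forall ... in (16, 114, 114, 32)`, the groups `[0, 1]` and `[2, 3]` give 16 * 114 = 1824 and 114 * 32 = 3648. The groups `[0, 3]` and `[1, 2]` from the op documentation give 16 * 32 = 512 and 114 * 114 = 12996.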

This revision now requires changes to proceed. (Apr 27 2023, 11:32 AM)

Revision Contents

Path	Size
mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td	54 lines
mlir/include/mlir/Dialect/SCF/Utils/Utils.h	8 lines
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp	36 lines
mlir/lib/Dialect/SCF/Utils/Utils.cpp	104 lines
mlir/test/Dialect/SCF/transform-op-coalesce.mlir	33 lines

Diff 513781

mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td

//===- SCFTransformOps.td - SCF (loop) transformation ops --- tablegen --===//		//===- SCFTransformOps.td - SCF (loop) transformation ops --- tablegen --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef SCF_TRANSFORM_OPS		#ifndef SCF_TRANSFORM_OPS
#define SCF_TRANSFORM_OPS		#define SCF_TRANSFORM_OPS

include "mlir/Dialect/Transform/IR/TransformDialect.td"		include "mlir/Dialect/Transform/IR/TransformDialect.td"
include "mlir/Dialect/Transform/IR/TransformInterfaces.td"		include "mlir/Dialect/Transform/IR/TransformInterfaces.td"
include "mlir/Dialect/Transform/IR/TransformTypes.td"		include "mlir/Dialect/Transform/IR/TransformTypes.td"
		include "mlir/Dialect/SCF/IR/DeviceMappingInterface.td"
include "mlir/Interfaces/SideEffectInterfaces.td"		include "mlir/Interfaces/SideEffectInterfaces.td"
include "mlir/IR/OpBase.td"		include "mlir/IR/OpBase.td"


def Transform_ScfForOp : Transform_ConcreteOpType<"scf.for">;		def Transform_ScfForOp : Transform_ConcreteOpType<"scf.for">;

def GetParentForOp : Op<Transform_Dialect, "loop.get_parent_for",		def GetParentForOp : Op<Transform_Dialect, "loop.get_parent_for",
[NavigationTransformOpTrait, MemoryEffectsOpInterface,		[NavigationTransformOpTrait, MemoryEffectsOpInterface,
DeclareOpInterfaceMethods<TransformOpInterface>]> {		DeclareOpInterfaceMethods<TransformOpInterface>]> {
let summary = "Gets a handle to the parent 'for' loop of the given operation";		let summary = "Gets a handle to the parent 'for' loop of the given operation";
let description = [{		let description = [{
Produces a handle to the n-th (default 1) parent `scf.for` or `affine.for`		Produces a handle to the n-th (default 1) parent `scf.for` or `affine.for`
[...]	def LoopCoalesceOp : Op<Transform_Dialect, "loop.coalesce", [
let extraClassDeclaration = [{		let extraClassDeclaration = [{
::mlir::DiagnosedSilenceableFailure applyToOne(		::mlir::DiagnosedSilenceableFailure applyToOne(
::mlir::Operation *target,		::mlir::Operation *target,
::mlir::transform::ApplyToEachResultList &results,		::mlir::transform::ApplyToEachResultList &results,
::mlir::transform::TransformState &state);		::mlir::transform::TransformState &state);
}];		}];
}		}


		def LoopCoalesceParallelOp : Op<Transform_Dialect, "loop.coalesce_parallel", [
		FunctionalStyleTransformOpTrait, MemoryEffectsOpInterface,
		TransformOpInterface, TransformEachOpTrait]> {
		let summary = "Coalesces scf.forall loop";
		let description = [{
		When using `scf.forall`, the dimensions are coalesced. By specifying the integer array collapsed_dim{i}, all dimensions specified in this array are coalesced in the {i}th dimension of the new `scf.forall`.
		Up to three dimensions can be hard-coded because of an additional mapping parameter that this operation receives. The magic numbner three is reflecting the three-dimensional mapping on devices grid.
		The new dimensions of scf.forall have this `mapping` attribute attached.

		For Example:

		scf.forall (%arg0, %arg1, %arg2, %arg3) in (16, 114, 114, 32)
		and the following transform dialect op
		transform.loop.coalesce_parallel %0 {collapsed_dim0 = array<i64: 0, 3>, collapsed_dim1 = array<i64: 1, 2>, mapping = [#gpu.block<x>, #gpu.block<y>]}

		This code will coalesce the parallel dimensions 0 and 3 into the 0th dimension, and dimensions 1 and 2 into the first dimension of the new `scf.forall`.
		Additionally, the mapping of dimensions to block x and y is specified.

		scf.forall (%arg0, %arg1) in (512, 12996) {
		...
		} {[#gpu.block<x>, #gpu.block<y>]}

		Note that collapsed_dim{i} is not restricted to have more than one dimension. If a collapsed_dim{i} has only a single element, that dimension will remain the same in the new `scf.forall` and will move to the ith dimension.
		However, if mapping is specified, the number of elements in mapping should be the same as the number of specified collapsed dimensions.

		#### Return modes

		The return handle points to the coalesced loop if coalescing happens, or
		the given input loop if coalescing does not happen.
		}];
		let arguments = (ins TransformHandleTypeInterface:$target,
		DefaultValuedAttr<DenseI64ArrayAttr, "{}">:$collapsed_dim0,
		DefaultValuedAttr<DenseI64ArrayAttr, "{}">:$collapsed_dim1,
		DefaultValuedAttr<DenseI64ArrayAttr, "{}">:$collapsed_dim2,
		OptionalAttr<DeviceMappingArrayAttr>:$mapping
		);

		let results = (outs TransformHandleTypeInterface:$transformed);

		let assemblyFormat = [{
		$target attr-dict `:` functional-type($target, $transformed)
		}];

		let extraClassDeclaration = [{
		::mlir::DiagnosedSilenceableFailure applyToOne(
		::mlir::Operation *target,
		::mlir::transform::ApplyToEachResultList &results,
		::mlir::transform::TransformState &state);
		}];
		}

#endif // SCF_TRANSFORM_OPS		#endif // SCF_TRANSFORM_OPS

mlir/include/mlir/Dialect/SCF/Utils/Utils.h

	[...]
	LogicalResult coalesceLoops(MutableArrayRef<scf::ForOp> loops);			LogicalResult coalesceLoops(MutableArrayRef<scf::ForOp> loops);

	/// Take the ParallelLoop and for each set of dimension indices, combine them			/// Take the ParallelLoop and for each set of dimension indices, combine them
	/// into a single dimension. combinedDimensions must contain each index into			/// into a single dimension. combinedDimensions must contain each index into
	/// loops exactly once.			/// loops exactly once.
	void collapseParallelLoops(scf::ParallelOp loops,			void collapseParallelLoops(scf::ParallelOp loops,
	ArrayRef<std::vector<unsigned>> combinedDimensions);			ArrayRef<std::vector<unsigned>> combinedDimensions);

				/// Take the Forall and for each set of dimension indices, combine them
				/// into a single dimension. combinedDimensions must contain each index into
				/// loops exactly once.
				LogicalResult
				collapseForallLoops(scf::ForallOp loops,
				ArrayRef<std::vector<unsigned>> combinedDimensions,
				std::optional<ArrayAttr> mapping);

	/// Promotes the loop body of a scf::ForOp to its containing block if the loop			/// Promotes the loop body of a scf::ForOp to its containing block if the loop
	/// was known to have a single iteration.			/// was known to have a single iteration.
	LogicalResult promoteIfSingleIteration(scf::ForOp forOp);			LogicalResult promoteIfSingleIteration(scf::ForOp forOp);

	/// Unrolls this for operation by the specified unroll factor. Returns failure			/// Unrolls this for operation by the specified unroll factor. Returns failure
	/// if the loop cannot be unrolled either due to restrictions or due to invalid			/// if the loop cannot be unrolled either due to restrictions or due to invalid
	/// unroll factors. Requires positive loop bounds and step. If specified,			/// unroll factors. Requires positive loop bounds and step. If specified,
	/// annotates the Ops in each unrolled iteration by applying `annotateFn`.			/// annotates the Ops in each unrolled iteration by applying `annotateFn`.
	[...]
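
The documented precondition on `combinedDimensions` (each loop index appears exactly once) can be checked before any IR is touched. The following is a minimal sketch on plain containers. The function name `coversEachDimensionOnce` is hypothetical, and the patch as shown does not include such a check.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical check: true if every dimension in [0, rank) appears in
// exactly one group, and no group names a dimension outside that range.
bool coversEachDimensionOnce(
    unsigned rank, const std::vector<std::vector<unsigned>> &groups) {
  std::vector<bool> seen(rank, false);
  for (const std::vector<unsigned> &group : groups) {
    for (unsigned dim : group) {
      if (dim >= rank || seen[dim])
        return false; // out of range, or listed twice
      seen[dim] = true;
    }
  }
  // Every dimension must have been claimed by some group.
  return std::all_of(seen.begin(), seen.end(), [](bool b) { return b; });
}
```

For a rank-4 loop, the groups `[0, 3]` and `[1, 2]` pass, while `[0, 1]` and `[1, 2]` fail because dimension 1 is repeated and dimension 3 is missing.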

mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp

[...]	if (failed(result)) {
DiagnosedSilenceableFailure diag = emitSilenceableError()		DiagnosedSilenceableFailure diag = emitSilenceableError()
<< "failed to coalesce";		<< "failed to coalesce";
return diag;		return diag;
}		}
return DiagnosedSilenceableFailure::success();		return DiagnosedSilenceableFailure::success();
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// LoopCoalesceParallelOp
		//===----------------------------------------------------------------------===//

		DiagnosedSilenceableFailure transform::LoopCoalesceParallelOp::applyToOne(
		Operation *op, transform::ApplyToEachResultList &results,
		transform::TransformState &state) {
		LogicalResult result(failure());
		if (scf::ForallOp scfForallOp = dyn_cast<scf::ForallOp>(op)) {
		llvm::SmallVector<std::vector<unsigned>, 3> combinedLoops;
		if (!getCollapsedDim0().empty()) {
		auto vec = getCollapsedDim0().vec();
		combinedLoops.push_back(
		std::vector<unsigned int>(vec.begin(), vec.end()));
		}
		if (!getCollapsedDim1().empty()) {
		auto vec = getCollapsedDim1().vec();
		combinedLoops.push_back(
		std::vector<unsigned int>(vec.begin(), vec.end()));
		}
		if (!getCollapsedDim2().empty()) {
		auto vec = getCollapsedDim2().vec();
		combinedLoops.push_back(
		std::vector<unsigned int>(vec.begin(), vec.end()));
		}
		result = collapseForallLoops(scfForallOp, combinedLoops, getMapping());
		}

		results.push_back(op);
		if (failed(result)) {
		DiagnosedSilenceableFailure diag = emitSilenceableError()
		<< "failed to coalesce";
		return diag;
		}
		return DiagnosedSilenceableFailure::success();
		}
		//===----------------------------------------------------------------------===//
// Transform op registration		// Transform op registration
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

namespace {		namespace {
class SCFTransformDialectExtension		class SCFTransformDialectExtension
: public transform::TransformDialectExtension<		: public transform::TransformDialectExtension<
SCFTransformDialectExtension> {		SCFTransformDialectExtension> {
public:		public:
[...]
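
If the collapsed dimensions were passed as one nested list, as suggested in the review, the three repeated `if` blocks in `applyToOne` could become a single loop. The sketch below shows only that gathering step on plain containers. The name `gatherNonEmptyGroups` is hypothetical, and the real op would read the groups from an attribute rather than from `std::vector`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: convert a nested list of dimension indices into the
// `unsigned` groups the collapsing utility expects, skipping empty entries.
// Skipping shifts later groups forward, as the three `if` blocks do.
std::vector<std::vector<unsigned>>
gatherNonEmptyGroups(const std::vector<std::vector<int64_t>> &collapsedDims) {
  std::vector<std::vector<unsigned>> groups;
  for (const std::vector<int64_t> &dims : collapsedDims) {
    if (dims.empty())
      continue;
    std::vector<unsigned> group;
    for (int64_t dim : dims)
      group.push_back(static_cast<unsigned>(dim)); // assumes dim >= 0
    groups.push_back(group);
  }
  return groups;
}
```

This form places no limit on the number of groups, which avoids hardcoding three of them.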

mlir/lib/Dialect/SCF/Utils/Utils.cpp

[...]	LogicalResult mlir::coalesceLoops(MutableArrayRef<scf::ForOp> loops) {
innermost.getBody()->back().erase();		innermost.getBody()->back().erase();
outermost.getBody()->getOperations().splice(		outermost.getBody()->getOperations().splice(
Block::iterator(second.getOperation()),		Block::iterator(second.getOperation()),
innermost.getBody()->getOperations());		innermost.getBody()->getOperations());
second.erase();		second.erase();
return success();		return success();
}		}

		LogicalResult
		mlir::collapseForallLoops(scf::ForallOp loops,
		ArrayRef<std::vector<unsigned>> combinedDimensions,
		std::optional<ArrayAttr> mapping) {
		OpBuilder outsideBuilder(loops);
		Location loc = loops.getLoc();

		// Presort combined dimensions.
		auto sortedDimensions = llvm::to_vector<3>(combinedDimensions);
		for (auto &dims : sortedDimensions)
		llvm::sort(dims);

		// Normalize ForallOp's iteration pattern.
		SmallVector<Value, 3> normalizedLowerBounds, normalizedSteps,
		normalizedUpperBounds;
		for (unsigned i = 0, e = loops.getRank(); i < e; ++i) {
		OpBuilder insideLoopBuilder = OpBuilder::atBlockBegin(loops.getBody());
		auto resultBounds = normalizeLoop(outsideBuilder, insideLoopBuilder, loc,
		loops.getLowerBound(outsideBuilder)[i],
		loops.getUpperBound(outsideBuilder)[i],
		loops.getStep(outsideBuilder)[i],
		loops.getBody()->getArgument(i));

		normalizedLowerBounds.push_back(resultBounds.lowerBound);
		normalizedUpperBounds.push_back(resultBounds.upperBound);
		normalizedSteps.push_back(resultBounds.step);
		}

		// Combine iteration spaces.
		SmallVector<Value, 3> lowerBounds, upperBounds, steps;
		auto cst0 = outsideBuilder.create<arith::ConstantIndexOp>(loc, 0);
		auto cst1 = outsideBuilder.create<arith::ConstantIndexOp>(loc, 1);
		for (unsigned i = 0, e = sortedDimensions.size(); i < e; ++i) {
		Value newUpperBound =
		outsideBuilder.createOrFold<arith::ConstantIndexOp>(loc, 1);
		for (auto idx : sortedDimensions[i]) {
		newUpperBound = outsideBuilder.createOrFold<arith::MulIOp>(
		loc, newUpperBound, normalizedUpperBounds[idx]);
		}
		lowerBounds.push_back(cst0);
		steps.push_back(cst1);
		upperBounds.push_back(newUpperBound);
		}

		// Create new Forall with conversions to the original induction values.
		// The loop below uses divisions to get the relevant range of values in the
		// new induction value that represent each range of the original induction
		// value. The remainders then determine based on that range, which iteration
		// of the original induction value this represents. This is a normalized value
		// that is un-normalized already by the previous logic.
		auto newPloop = outsideBuilder.create<scf::ForallOp>(
		loc, getAsOpFoldResult(upperBounds), loops.getOutputs(), mapping,
		[&](OpBuilder &insideBuilder, Location, ValueRange ploopIVs) {
		for (unsigned i = 0, e = combinedDimensions.size(); i < e; ++i) {
		Value previous = ploopIVs[i];
		unsigned numberCombinedDimensions = combinedDimensions[i].size();
		// Iterate over all except the last induction value.
		for (unsigned j = numberCombinedDimensions - 1; j > 0; --j) {
		unsigned idx = combinedDimensions[i][j];

		// Determine the current induction value's current loop iteration
		Value iv = insideBuilder.createOrFold<arith::RemSIOp>(
		loc, previous, normalizedUpperBounds[idx]);
		replaceAllUsesInRegionWith(loops.getInductionVar(idx), iv,
		loops.getRegion());

		// Remove the effect of the current induction value to prepare for
		// the next value.
		previous = insideBuilder.createOrFold<arith::DivSIOp>(
		loc, previous, normalizedUpperBounds[idx]);
		}

		// The final induction value is just the remaining value.
		unsigned idx = combinedDimensions[i][0];
		replaceAllUsesInRegionWith(loops.getBody()->getArgument(idx),
		previous, loops.getRegion());
		}
		// Create empty in_parallel section
		insideBuilder.create<scf::InParallelOp>(loc);
		});

		// Map the old values to new values when cloning the code
		IRMapping irMapping;
		irMapping.map(loops.getOutputBlockArguments(),
		newPloop.getOutputBlockArguments());

		// Clone the body of forall
		outsideBuilder.setInsertionPoint(newPloop.getTerminator());
		for (auto &op : loops.getBody()->without_terminator())
		outsideBuilder.clone(op, irMapping);

		// Clone the body of forall terminator
		outsideBuilder.setInsertionPointToStart(newPloop.getTerminator().getBody());
		auto forallTerminator = loops.getTerminator();
		for (auto &bodyOp : forallTerminator.getYieldingOps()) {
		outsideBuilder.clone(bodyOp, irMapping);
		}

		// Replace the old loop with the new loop.
		loops.replaceAllUsesWith(newPloop);
		loops.erase();
		return success();
		}

void mlir::collapseParallelLoops(		void mlir::collapseParallelLoops(
scf::ParallelOp loops, ArrayRef<std::vector<unsigned>> combinedDimensions) {		scf::ParallelOp loops, ArrayRef<std::vector<unsigned>> combinedDimensions) {
OpBuilder outsideBuilder(loops);		OpBuilder outsideBuilder(loops);
Location loc = loops.getLoc();		Location loc = loops.getLoc();

// Presort combined dimensions.		// Presort combined dimensions.
auto sortedDimensions = llvm::to_vector<3>(combinedDimensions);		auto sortedDimensions = llvm::to_vector<3>(combinedDimensions);
for (auto &dims : sortedDimensions)		for (auto &dims : sortedDimensions)
[...]
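
`collapseForallLoops` normalizes every dimension before combining iteration spaces, so the later remainder/division arithmetic can assume a lower bound of 0 and a step of 1. The sketch below shows the usual normalization arithmetic on plain integers. It is an assumption about what `normalizeLoop` computes, not a copy of it, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: trip count of `for (iv = lb; iv < ub; iv += step)`,
// which becomes the upper bound of the normalized 0-based, unit-step loop.
int64_t normalizedUpperBound(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && "normalization assumes a strictly positive step");
  if (ub <= lb)
    return 0; // empty iteration space
  return (ub - lb + step - 1) / step; // ceil((ub - lb) / step)
}

// Hypothetical helper: map a normalized induction value back to the value
// the original loop body expects.
int64_t originalInductionValue(int64_t normalizedIv, int64_t lb, int64_t step) {
  return lb + normalizedIv * step;
}
```

A loop from 2 to 11 with step 3 visits 2, 5 and 8, so its normalized upper bound is 3, and normalized index 2 maps back to 8. A dimension written as `in (16)` is already normalized, so both functions leave it unchanged.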

mlir/test/Dialect/SCF/transform-op-coalesce.mlir

	[...]

	transform.sequence failures(propagate) {			transform.sequence failures(propagate) {
	^bb1(%arg1: !pdl.operation):			^bb1(%arg1: !pdl.operation):
	%0 = transform.structured.match ops{["scf.for"]} attributes {coalesce} in %arg1 : (!pdl.operation) -> !pdl.operation			%0 = transform.structured.match ops{["scf.for"]} attributes {coalesce} in %arg1 : (!pdl.operation) -> !pdl.operation
	%1 = transform.cast %0 : !pdl.operation to !transform.op<"scf.for">			%1 = transform.cast %0 : !pdl.operation to !transform.op<"scf.for">
	%2 = transform.loop.coalesce %1 : (!transform.op<"scf.for">) -> (!transform.op<"scf.for">)			%2 = transform.loop.coalesce %1 : (!transform.op<"scf.for">) -> (!transform.op<"scf.for">)
	transform.loop.unroll %2 {factor = 3} : !transform.op<"scf.for">			transform.loop.unroll %2 {factor = 3} : !transform.op<"scf.for">
	}			}


				// -----
				#map = affine_map<(d0) -> (d0 * 2)>
				func.func @conv(%0: tensor<32x230x230x32xf32>, %1: tensor<3x3x32x64xf32>, %2: tensor<32x228x228x64xf32>) -> tensor<32x228x228x64xf32> {
				%c0 = arith.constant 0 : index
				%6 = scf.forall (%arg0, %arg1, %arg2, %arg3) in (16, 114, 114, 32) shared_outs(%arg4 = %2) -> (tensor<32x228x228x64xf32>) {
				// CHECK: scf.forall (%[[IV1:.+]], %[[IV2:.+]]) in
				// CHECK: %[[IDX0:.+]] = arith.remsi %[[IV1]]
				// CHECK: %[[IDX1:.+]] = arith.divsi %[[IV1]]
				// CHECK: %[[IDX2:.+]] = arith.remsi %[[IV2]]
				// CHECK: %[[IDX3:.+]] = arith.divsi %[[IV2]]
				%7 = affine.apply #map(%arg0)
				%8 = affine.apply #map(%arg1)
				%9 = affine.apply #map(%arg2)
				%10 = affine.apply #map(%arg3)
				%extracted_slice = tensor.extract_slice %0[%7, %8, %9, 0] [2, 4, 4, 32] [1, 1, 1, 1] : tensor<32x230x230x32xf32> to tensor<2x4x4x32xf32>
				%extracted_slice_0 = tensor.extract_slice %1[0, 0, 0, %10] [3, 3, 32, 2] [1, 1, 1, 1] : tensor<3x3x32x64xf32> to tensor<3x3x32x2xf32>
				%extracted_slice_1 = tensor.extract_slice %arg4[%7, %8, %9, %10] [2, 2, 2, 2] [1, 1, 1, 1] : tensor<32x228x228x64xf32> to tensor<2x2x2x2xf32>
				%11 = linalg.conv_2d_nhwc_hwcf {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} ins(%extracted_slice, %extracted_slice_0 : tensor<2x4x4x32xf32>, tensor<3x3x32x2xf32>) outs(%extracted_slice_1 : tensor<2x2x2x2xf32>) -> tensor<2x2x2x2xf32>
				// CHECK: scf.forall.in_parallel
				scf.forall.in_parallel {
				tensor.parallel_insert_slice %11 into %arg4[%7, %8, %9, %10] [2, 2, 2, 2] [1, 1, 1, 1] : tensor<2x2x2x2xf32> into tensor<32x228x228x64xf32>
				}
				}
				return %6 : tensor<32x228x228x64xf32>
				}

				transform.sequence failures(propagate) {
				^bb1(%arg1: !pdl.operation):
				%0 = transform.structured.match ops{["scf.forall"]} in %arg1 : (!pdl.operation) -> !pdl.operation
				%2 = transform.loop.coalesce_parallel %0 {collapsed_dim0 = array<i64: 0, 1>, collapsed_dim1 = array<i64: 2, 3>, mapping = [#gpu.block<x>, #gpu.block<y>]} : (!pdl.operation) -> !pdl.operation
				}
				No newline at end of file
